NeuralMT / News

New publications, developments and events

Other sources


  • CNN Is All You Need (Qiming Chen, Ren Wu)
    Claims an incredible improvement in BLEU scores - is it for real? Check the discussion and see the reason ...
  • Attention Is All You Need (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin)
    The Transformer model by Google, without convolutions or recurrent network layers
  • Convolutional Sequence to Sequence Learning (Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin)
    Facebook's convolutional NMT system; translation accuracy comparable to Google's system, but much faster.
  • Google’s Neural Machine Translation System (Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi)
    Various details about Google's NMT model
  • SYSTRAN's Pure NMT
    A system based on Torch and the Harvard NMT implementation
  • Context Gates for Neural Machine Translation (Zhaopeng Tu, Yang Liu, Zhengdong Lu, Xiaohua Liu, Hang Li)
    Context gates that control the influence of source and target context when generating words. Intuition: Content words should rely more on source language context whereas function words should look more at target language context. (code available here)
  • Modeling Coverage for Neural Machine Translation (Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li)
    Add a coverage vector to keep track of the attention history to avoid under- and over-translation. (code available here and the older version here)
  • Neural Machine Translation with Reconstruction (Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li)
    Add a reconstruction layer to improve adequacy of the model. The system needs to reconstruct the source sentence after decoding.
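The context-gate idea above (content words lean on source context, function words on target context) can be sketched in a few lines. This is a simplified NumPy illustration under assumed names and shapes, not the authors' released code; the weight matrices `Wz`, `Uz`, `Cz` and the exact interpolation are illustrative.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def context_gate(s_prev, y_prev, c, Wz, Uz, Cz):
    """Elementwise gate z in (0, 1) weighting source vs. target context.

    s_prev: previous decoder state, y_prev: previous target word embedding,
    c: attention-derived source context. When z is high the decoder relies
    more on the source context c; when z is low, more on target-side history.
    (Simplified sketch of Tu et al.'s context gates, not their implementation.)
    """
    z = sigmoid(Wz @ s_prev + Uz @ y_prev + Cz @ c)
    gated_source = z * c             # source context, scaled up for content words
    gated_target = (1.0 - z) * s_prev  # target context, scaled up for function words
    return gated_source, gated_target
```

In the paper the gated contexts then feed the decoder's state update; here they are simply returned so the gating itself is visible.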

Multilingual Models

Subword and character based methods

Discourse-level NMT / wider context

Unsupervised / semi-supervised models

Improved alignment models

Hybrid models (in whatever sense)

Supervision at different layers

Optimization and regularization methods

  • A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (Yarin Gal, 2015)
    Dropout has been very successful for regularizing many types of networks, but it has been difficult to apply to RNNs. Gal presents a method that actually works, has a theoretical foundation in variational Bayesian methods (so it is sometimes referred to as "variational dropout"), and has already been adopted by several groups. It drastically reduces overfitting, but comes at the cost of somewhat slower convergence. Implemented in BNAS.
  • Layer Normalization (Jimmy Lei Ba, Jamie Ryan Kiros and Geoffrey E. Hinton, 2016)
    Similar to Batch Normalization, but normalizing over the nodes in a layer rather than over the same node across a minibatch. It is easy to apply to recurrent networks, and our experiments show that their first LSTM variant (equations 20--22) works better than the second one (equations 29--31), although there are issues with numerical stability.
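The key trick in Gal's variational dropout is to sample one dropout mask per sequence and reuse it at every timestep, instead of resampling per step as naive dropout would. A minimal NumPy sketch of that idea (the toy RNN and all names are illustrative, not the paper's or BNAS's code):

```python
import numpy as np

def variational_dropout_mask(size, p_drop, rng):
    """Sample ONE inverted-dropout mask to be reused at every timestep.

    Reusing the mask across time is what distinguishes variational dropout
    from naive per-step dropout on RNNs. Surviving units are scaled by
    1 / (1 - p_drop) so expected activations are unchanged.
    """
    keep = 1.0 - p_drop
    return rng.binomial(1, keep, size=size) / keep

def rnn_forward(xs, W, U, p_drop, rng):
    """Toy vanilla-RNN pass: h' = tanh(W x + U h), with fixed dropout masks.

    xs: (T, d_in) input sequence; W: (d_h, d_in); U: (d_h, d_h).
    The same input mask and recurrent mask are applied at every step.
    """
    d_h = W.shape[0]
    mask_x = variational_dropout_mask(xs.shape[1], p_drop, rng)  # input mask
    mask_h = variational_dropout_mask(d_h, p_drop, rng)          # recurrent mask
    h = np.zeros(d_h)
    for x in xs:  # note: masks are NOT resampled inside the loop
        h = np.tanh(W @ (x * mask_x) + U @ (h * mask_h))
    return h
```

A real implementation would apply the same scheme to each gate of an LSTM/GRU; the point here is only where the randomness is sampled.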

Domain adaptation

Cool stuff, possibly useful