Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers

21 March 2020

Papers citing "Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers"

Title | Authors | Counts | Date
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives | Elena Voita, Rico Sennrich, Ivan Titov | 247 185 0 | 03 Sep 2019
Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel | Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov | 91 251 0 | 30 Aug 2019
Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation | Chenze Shao, Yang Feng, Jinchao Zhang, Fandong Meng, Xilin Chen, Jie Zhou | 46 42 0 | 22 Jun 2019
Assessing the Ability of Self-Attention Networks to Learn Word Order | Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu | 25 32 0 | 03 Jun 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov | 76 1,120 0 | 23 May 2019
BERT Rediscovers the Classical NLP Pipeline | Ian Tenney, Dipanjan Das, Ellie Pavlick [MILM SSeg] | 100 1,458 0 | 15 May 2019
What you can cram into a single vector: Probing sentence embeddings for linguistic properties | Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni | 272 888 0 | 03 May 2018
Accelerating Neural Transformer via an Average Attention Network | Biao Zhang, Deyi Xiong, Jinsong Su | 45 120 0 | 02 May 2018
Neural Machine Translation of Rare Words with Subword Units | Rico Sennrich, Barry Haddow, Alexandra Birch | 151 7,683 0 | 31 Aug 2015