An alternative formulation of attention pooling function in translation
Eddie Conti
23 August 2024 · arXiv 2409.00068 (abs · PDF · HTML)
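For readers landing on this citation page without the paper itself: the "attention pooling function" in the title is the softmax-weighted pooling of values popularized for translation by Bahdanau et al. and Luong et al., both listed among the citing/cited works below. The following is a minimal NumPy sketch of that conventional scaled dot-product formulation, for orientation only; the function name and shapes are illustrative, and the paper's alternative formulation is not reproduced here.

```python
import numpy as np

def attention_pooling(Q, K, V):
    """Standard scaled dot-product attention pooling:
    softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns (n_queries, d_v); each output row is a convex
    combination of the rows of V.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # pool values by attention weights

# Example: 2 queries attending over 4 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 16))
print(attention_pooling(Q, K, V).shape)  # (2, 16)
```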

Papers citing "An alternative formulation of attention pooling function in translation" (10 of 10 shown):
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler
08 Nov 2020 · 730 citations

Linear Attention Mechanism: An Efficient Attention for Semantic Segmentation
Rui Li, Jianlin Su, Chenxi Duan, Shunyi Zheng
29 Jul 2020 · 39 citations · 3DV

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020 · 2,105 citations · VLM

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
29 Jun 2020 · 1,793 citations

Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
08 Jun 2020 · 1,719 citations

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
10 Apr 2020 · 4,105 citations · RALM, VLM

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
23 May 2019 · 1,149 citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018 · 95,324 citations · VLM, SSL, SSeg

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning
17 Aug 2015 · 7,971 citations

Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
01 Sep 2014 · 27,345 citations · AIMat