Assessing the Ability of Self-Attention Networks to Learn Word Order
arXiv:1906.00592, 3 June 2019
Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu

Papers citing "Assessing the Ability of Self-Attention Networks to Learn Word Order"

12 of 12 citing papers shown.

1. AST-MHSA: Code Summarization using Multi-Head Self-Attention. Y. Nagaraj, U. Gupta. 10 Aug 2023.
2. Position Information in Transformers: An Overview. Philipp Dufter, Martin Schmitt, Hinrich Schütze. 22 Feb 2021.
3. Mitigating the Position Bias of Transformer Models in Passage Re-Ranking. Sebastian Hofstätter, Aldo Lipani, Sophia Althammer, Markus Zlabinger, Allan Hanbury. 18 Jan 2021.
4. Rethinking the Value of Transformer Components. Wenxuan Wang, Zhaopeng Tu. 07 Nov 2020.
5. On the Sub-Layer Functionalities of Transformer Decoder. Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, Zhaopeng Tu. 06 Oct 2020.
6. On the Computational Power of Transformers and its Implications in Sequence Modeling. S. Bhattamishra, Arkil Patel, Navin Goyal. 16 Jun 2020.
7. How Does Selective Mechanism Improve Self-Attention Networks? Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu. 03 May 2020. [AAML]
8. Self-Attention with Cross-Lingual Position Representation. Liang Ding, Longyue Wang, Dacheng Tao. 28 Apr 2020. [MILM]
9. Towards Understanding Neural Machine Translation with Word Importance. Shilin He, Zhaopeng Tu, Xing Wang, Longyue Wang, Michael R. Lyu, Shuming Shi. 01 Sep 2019. [AAML]
10. What you can cram into a single vector: Probing sentence embeddings for linguistic properties. Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni. 03 May 2018.
11. A Decomposable Attention Model for Natural Language Inference. Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit. 06 Jun 2016.
12. Effective Approaches to Attention-based Neural Machine Translation. Thang Luong, Hieu H. Pham, Christopher D. Manning. 17 Aug 2015.