The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Elena Voita, Rico Sennrich, Ivan Titov
arXiv:1909.01380, 3 September 2019
Papers citing "The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives" (showing 50 of 133):
Coupling Artificial Neurons in BERT and Biological Neurons in the Human Brain
Xu Liu, Mengyue Zhou, Gaosheng Shi, Yu Du, Lin Zhao, Zihao Wu, David Liu, Tianming Liu, Xintao Hu. 27 Mar 2023.

Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva. 16 Mar 2023.

Topics in Contextualised Attention Embeddings
Mozhgan Talebpour, A. G. S. D. Herrera, Shoaib Jameel. 11 Jan 2023.

What Makes for Good Tokenizers in Vision Transformer? [ViT]
Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia. 21 Dec 2022.

On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning [OffRL]
S. Takagi. 17 Nov 2022.

Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token
Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz. 09 Nov 2022.

Revisiting Attention Weights as Explanations from an Information Theoretic Perspective [FAtt]
Bingyang Wen, K. P. Subbalakshmi, Fan Yang. 31 Oct 2022.

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei. 18 Oct 2022.

Transparency Helps Reveal When Language Models Learn Meaning
Zhaofeng Wu, William Merrill, Hao Peng, Iz Beltagy, Noah A. Smith. 14 Oct 2022.

Analyzing Transformers in Embedding Space
Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant. 06 Sep 2022.

An Interpretability Evaluation Benchmark for Pre-trained Language Models
Ya-Ming Shen, Lijie Wang, Ying-Cong Chen, Xinyan Xiao, Jing Liu, Hua Wu. 28 Jul 2022.

How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
Timothee Mickus, Denis Paperno, Mathieu Constant. 07 Jun 2022.

Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation
Verna Dankers, Christopher G. Lucas, Ivan Titov. 30 May 2022.

Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer
Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos Escolano, Marta R. Costa-jussà. 23 May 2022.

The Geometry of Multilingual Language Model Representations
Tyler A. Chang, Z. Tu, Benjamin Bergen. 22 May 2022.

Self-Supervised Speech Representation Learning: A Review [SSL, AI4TS]
Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe. 21 May 2022.

A Study on Transformer Configuration and Training Objective
Fuzhao Xue, Jianghai Chen, Aixin Sun, Xiaozhe Ren, Zangwei Zheng, Xiaoxin He, Yongming Chen, Xin Jiang, Yang You. 21 May 2022.

Visualizing and Explaining Language Models [MILM, VLM]
Adrian M. P. Braşoveanu, Razvan Andonie. 30 Apr 2022.

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel. 12 Apr 2022.

Transformer Language Models without Positional Encodings Still Learn Positional Information
Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, Omer Levy. 30 Mar 2022.

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space [KELM]
Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg. 28 Mar 2022.

Integrating Vectorized Lexical Constraints for Neural Machine Translation
Shuo Wang, Zhixing Tan, Yang Liu. 23 Mar 2022.

On Robust Prefix-Tuning for Text Classification [VLM]
Zonghan Yang, Yang Liu. 19 Mar 2022.

Contrastive Visual Semantic Pretraining Magnifies the Semantics of Natural Language Representations [VLM]
Robert Wolfe, Aylin Caliskan. 14 Mar 2022.

VAST: The Valence-Assessing Semantics Test for Contextualizing Language Models
Robert Wolfe, Aylin Caliskan. 14 Mar 2022.

Should You Mask 15% in Masked Language Modeling? [CVBM]
Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen. 16 Feb 2022.

Representation Topology Divergence: A Method for Comparing Neural Network Representations [3DPC]
S. Barannikov, I. Trofimov, Nikita Balabin, Evgeny Burnaev. 31 Dec 2021.

Consistency and Coherence from Points of Contextual Similarity [HILM]
Oleg V. Vasilyev, John Bohannon. 22 Dec 2021.

Measuring Context-Word Biases in Lexical Semantic Datasets
Qianchu Liu, Diana McCarthy, Anna Korhonen. 13 Dec 2021.

Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models
Robert Wolfe, Aylin Caliskan. 01 Oct 2021.

On the Prunability of Attention Heads in Multilingual BERT
Aakriti Budhraja, Madhura Pande, Pratyush Kumar, Mitesh M. Khapra. 26 Sep 2021.

Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers
Jason Phang, Haokun Liu, Samuel R. Bowman. 17 Sep 2021.

Differentiable Physics: A Position Piece [PINN, AI4CE]
Bharath Ramsundar, Dilip Krishnamurthy, V. Viswanathan. 14 Sep 2021.

Not All Models Localize Linguistic Knowledge in the Same Place: A Layer-wise Probing on BERToids' Representations
Mohsen Fayyaz, Ehsan Aghazadeh, Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar. 13 Sep 2021.

Interactively Providing Explanations for Transformer Language Models [LRM]
Felix Friedrich, P. Schramowski, Christopher Tauchmann, Kristian Kersting. 02 Sep 2021.

Automatic Text Evaluation through the Lens of Wasserstein Barycenters
Pierre Colombo, Guillaume Staerman, Chloé Clavel, Pablo Piantanida. 27 Aug 2021.

Translation Error Detection as Rationale Extraction
M. Fomicheva, Lucia Specia, Nikolaos Aletras. 27 Aug 2021.

Do Vision Transformers See Like Convolutional Neural Networks? [ViT]
M. Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy. 19 Aug 2021.

CoBERL: Contrastive BERT for Reinforcement Learning [OffRL]
Andrea Banino, Adrià Puigdomènech Badia, Jacob Walker, Tim Scholtes, Jovana Mitrović, Charles Blundell. 12 Jul 2021.

Layer-wise Analysis of a Self-supervised Speech Representation Model [SSL]
Ankita Pasad, Ju-Chieh Chou, Karen Livescu. 10 Jul 2021.

Variational Information Bottleneck for Effective Low-Resource Fine-Tuning [DRL]
Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson. 10 Jun 2021.

On Compositional Generalization of Neural Machine Translation
Yafu Li, Yongjing Yin, Yulong Chen, Yue Zhang. 31 May 2021.

Inspecting the concept knowledge graph encoded by modern language models
Carlos Aspillaga, Marcelo Mendoza, Alvaro Soto. 27 May 2021.

LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and Beyond
Daniel Loureiro, A. Jorge, Jose Camacho-Collados. 26 May 2021.

DirectQE: Direct Pretraining for Machine Translation Quality Estimation
Qu Cui, Shujian Huang, Jiahuan Li, Xiang Geng, Zaixiang Zheng, Guoping Huang, Jiajun Chen. 15 May 2021.

The Low-Dimensional Linear Geometry of Contextualized Word Representations [MILM]
Evan Hernandez, Jacob Andreas. 15 May 2021.

Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses [MILM]
Aina Garí Soler, Marianna Apidianaki. 29 Apr 2021.

Editing Factual Knowledge in Language Models [KELM]
Nicola De Cao, Wilker Aziz, Ivan Titov. 16 Apr 2021.

Semantic maps and metrics for science using deep transformer encoders [MedIm]
Brendan Chambers, James A. Evans. 13 Apr 2021.

Discourse Probing of Pretrained Language Models
Fajri Koto, Jey Han Lau, Tim Baldwin. 13 Apr 2021.