The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Elena Voita, Rico Sennrich, Ivan Titov
arXiv:1909.01380, 3 September 2019
Papers citing "The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives" (showing 50 of 133):
Coupling Artificial Neurons in BERT and Biological Neurons in the Human Brain
Xu Liu, Mengyue Zhou, Gaosheng Shi, Yu Du, Lin Zhao, Zihao Wu, David Liu, Tianming Liu, Xintao Hu. 27 Mar 2023.

Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva. 16 Mar 2023.

Topics in Contextualised Attention Embeddings
Mozhgan Talebpour, A. G. S. D. Herrera, Shoaib Jameel. 11 Jan 2023.

What Makes for Good Tokenizers in Vision Transformer? [ViT]
Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia. 21 Dec 2022.

On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning [OffRL]
S. Takagi. 17 Nov 2022.

Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token
Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz. 09 Nov 2022.

Revisiting Attention Weights as Explanations from an Information Theoretic Perspective [FAtt]
Bingyang Wen, K. P. Subbalakshmi, Fan Yang. 31 Oct 2022.

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei. 18 Oct 2022.

Transparency Helps Reveal When Language Models Learn Meaning
Zhaofeng Wu, William Merrill, Hao Peng, Iz Beltagy, Noah A. Smith. 14 Oct 2022.

Analyzing Transformers in Embedding Space
Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant. 06 Sep 2022.

An Interpretability Evaluation Benchmark for Pre-trained Language Models
Ya-Ming Shen, Lijie Wang, Ying-Cong Chen, Xinyan Xiao, Jing Liu, Hua Wu. 28 Jul 2022.

How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
Timothee Mickus, Denis Paperno, Mathieu Constant. 07 Jun 2022.

Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation
Verna Dankers, Christopher G. Lucas, Ivan Titov. 30 May 2022.

Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer
Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos Escolano, Marta R. Costa-jussà. 23 May 2022.

The Geometry of Multilingual Language Model Representations
Tyler A. Chang, Z. Tu, Benjamin Bergen. 22 May 2022.

Self-Supervised Speech Representation Learning: A Review [SSL, AI4TS]
Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe. 21 May 2022.

A Study on Transformer Configuration and Training Objective
Fuzhao Xue, Jianghai Chen, Aixin Sun, Xiaozhe Ren, Zangwei Zheng, Xiaoxin He, Yongming Chen, Xin Jiang, Yang You. 21 May 2022.

Visualizing and Explaining Language Models [MILM, VLM]
Adrian M. P. Braşoveanu, Razvan Andonie. 30 Apr 2022.

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel. 12 Apr 2022.

Transformer Language Models without Positional Encodings Still Learn Positional Information
Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, Omer Levy. 30 Mar 2022.

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space [KELM]
Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg. 28 Mar 2022.

Integrating Vectorized Lexical Constraints for Neural Machine Translation
Shuo Wang, Zhixing Tan, Yang Liu. 23 Mar 2022.

On Robust Prefix-Tuning for Text Classification [VLM]
Zonghan Yang, Yang Liu. 19 Mar 2022.

Contrastive Visual Semantic Pretraining Magnifies the Semantics of Natural Language Representations [VLM]
Robert Wolfe, Aylin Caliskan. 14 Mar 2022.

VAST: The Valence-Assessing Semantics Test for Contextualizing Language Models
Robert Wolfe, Aylin Caliskan. 14 Mar 2022.

Should You Mask 15% in Masked Language Modeling? [CVBM]
Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen. 16 Feb 2022.

Representation Topology Divergence: A Method for Comparing Neural Network Representations [3DPC]
S. Barannikov, I. Trofimov, Nikita Balabin, Evgeny Burnaev. 31 Dec 2021.

Consistency and Coherence from Points of Contextual Similarity [HILM]
Oleg V. Vasilyev, John Bohannon. 22 Dec 2021.

Measuring Context-Word Biases in Lexical Semantic Datasets
Qianchu Liu, Diana McCarthy, Anna Korhonen. 13 Dec 2021.

Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models
Robert Wolfe, Aylin Caliskan. 01 Oct 2021.

On the Prunability of Attention Heads in Multilingual BERT
Aakriti Budhraja, Madhura Pande, Pratyush Kumar, Mitesh M. Khapra. 26 Sep 2021.

Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers
Jason Phang, Haokun Liu, Samuel R. Bowman. 17 Sep 2021.

Differentiable Physics: A Position Piece [PINN, AI4CE]
Bharath Ramsundar, Dilip Krishnamurthy, V. Viswanathan. 14 Sep 2021.

Not All Models Localize Linguistic Knowledge in the Same Place: A Layer-wise Probing on BERToids' Representations
Mohsen Fayyaz, Ehsan Aghazadeh, Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar. 13 Sep 2021.

Interactively Providing Explanations for Transformer Language Models [LRM]
Felix Friedrich, P. Schramowski, Christopher Tauchmann, Kristian Kersting. 02 Sep 2021.

Automatic Text Evaluation through the Lens of Wasserstein Barycenters
Pierre Colombo, Guillaume Staerman, Chloé Clavel, Pablo Piantanida. 27 Aug 2021.

Translation Error Detection as Rationale Extraction
M. Fomicheva, Lucia Specia, Nikolaos Aletras. 27 Aug 2021.

Do Vision Transformers See Like Convolutional Neural Networks? [ViT]
M. Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy. 19 Aug 2021.

CoBERL: Contrastive BERT for Reinforcement Learning [OffRL]
Andrea Banino, Adrià Puigdomènech Badia, Jacob Walker, Tim Scholtes, Jovana Mitrović, Charles Blundell. 12 Jul 2021.

Layer-wise Analysis of a Self-supervised Speech Representation Model [SSL]
Ankita Pasad, Ju-Chieh Chou, Karen Livescu. 10 Jul 2021.

Variational Information Bottleneck for Effective Low-Resource Fine-Tuning [DRL]
Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson. 10 Jun 2021.

On Compositional Generalization of Neural Machine Translation
Yafu Li, Yongjing Yin, Yulong Chen, Yue Zhang. 31 May 2021.

Inspecting the concept knowledge graph encoded by modern language models
Carlos Aspillaga, Marcelo Mendoza, Alvaro Soto. 27 May 2021.

LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and Beyond
Daniel Loureiro, A. Jorge, Jose Camacho-Collados. 26 May 2021.

DirectQE: Direct Pretraining for Machine Translation Quality Estimation
Qu Cui, Shujian Huang, Jiahuan Li, Xiang Geng, Zaixiang Zheng, Guoping Huang, Jiajun Chen. 15 May 2021.

The Low-Dimensional Linear Geometry of Contextualized Word Representations [MILM]
Evan Hernandez, Jacob Andreas. 15 May 2021.

Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses [MILM]
Aina Garí Soler, Marianna Apidianaki. 29 Apr 2021.

Editing Factual Knowledge in Language Models [KELM]
Nicola De Cao, Wilker Aziz, Ivan Titov. 16 Apr 2021.

Semantic maps and metrics for science using deep transformer encoders [MedIm]
Brendan Chambers, James A. Evans. 13 Apr 2021.

Discourse Probing of Pretrained Language Models
Fajri Koto, Jey Han Lau, Tim Baldwin. 13 Apr 2021.