What Does BERT Look At? An Analysis of BERT's Attention

11 June 2019
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
    MILM
arXiv:1906.04341

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"

50 / 885 papers shown
Understanding and Mitigating Spurious Correlations in Text Classification with Neighborhood Analysis
Oscar Chew
Hsuan-Tien Lin
Kai-Wei Chang
Kuan-Hao Huang
34
5
0
23 May 2023
GATology for Linguistics: What Syntactic Dependencies It Knows
Yuqian Dai
S. Sharoff
M. Kamps
28
0
0
22 May 2023
Teaching Probabilistic Logical Reasoning to Transformers
Aliakbar Nafar
K. Venable
Parisa Kordjamshidi
ReLM
LRM
24
3
0
22 May 2023
LMs: Understanding Code Syntax and Semantics for Code Analysis
Wei Ma
Shangqing Liu
Zhihao Lin
Wenhan Wang
Q. Hu
Ye Liu
Cen Zhang
Liming Nie
Li Li
Yang Liu
37
16
0
20 May 2023
Constructing Word-Context-Coupled Space Aligned with Associative Knowledge Relations for Interpretable Language Modeling
Fanyu Wang
Zhenping Xie
24
0
0
19 May 2023
Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement
Samuel Mensah
Kai Sun
Nikolaos Aletras
24
1
0
18 May 2023
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings
Qian Chen
Wen Wang
Qinglin Zhang
Siqi Zheng
Chong Deng
Hai Yu
Jiaqing Liu
Yukun Ma
Chong Zhang
32
3
0
18 May 2023
Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch
Jindřich Libovický
24
8
0
17 May 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Zhengxuan Wu
Atticus Geiger
Thomas Icard
Christopher Potts
Noah D. Goodman
MILM
41
82
0
15 May 2023
Continual Multimodal Knowledge Graph Construction
Xiang Chen
Jintian Zhang
Xiaohan Wang
Ningyu Zhang
Tongtong Wu
Luo Si
Yongheng Wang
Huajun Chen
KELM
CLL
30
14
0
15 May 2023
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Ronen Eldan
Yuanzhi Li
SyDa
LRM
31
239
0
12 May 2023
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models
Lukáš Mikula
Michal Štefánik
Marek Petrovič
Petr Sojka
41
3
0
11 May 2023
HiFi: High-Information Attention Heads Hold for Parameter-Efficient Model Adaptation
Anchun Gui
Han Xiao
21
4
0
08 May 2023
Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
Ta-Chung Chi
Ting-Han Fan
Alexander I. Rudnicky
Peter J. Ramadge
LRM
14
13
0
05 May 2023
AttentionViz: A Global View of Transformer Attention
Catherine Yeh
Yida Chen
Aoyu Wu
Cynthia Chen
Fernanda Viégas
Martin Wattenberg
ViT
33
52
0
04 May 2023
Entity Tracking in Language Models
Najoung Kim
Sebastian Schuster
55
16
0
03 May 2023
Causality-aware Concept Extraction based on Knowledge-guided Prompting
Siyu Yuan
Deqing Yang
Jinxi Liu
Shuyu Tian
Jiaqing Liang
Yanghua Xiao
R. Xie
69
13
0
03 May 2023
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Amanda Bertsch
Uri Alon
Graham Neubig
Matthew R. Gormley
RALM
108
122
0
02 May 2023
Logion: Machine Learning for Greek Philology
Charlie Cowen-Breen
Creston Brooks
J. Haubold
B. Graziosi
27
4
0
01 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
193
121
0
30 Apr 2023
What does BERT learn about prosody?
Sofoklis Kakouros
Johannah O'Mahony
MILM
22
5
0
25 Apr 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli
Lizhong Chen
27
1
0
17 Apr 2023
Computational modeling of semantic change
Nina Tahmasebi
Haim Dubossarsky
34
6
0
13 Apr 2023
Can BERT eat RuCoLA? Topological Data Analysis to Explain
Irina Proskurina
Irina Piontkovskaya
Ekaterina Artemova
85
3
0
04 Apr 2023
Coupling Artificial Neurons in BERT and Biological Neurons in the Human Brain
Xu Liu
Mengyue Zhou
Gaosheng Shi
Yu Du
Lin Zhao
Zihao Wu
David Liu
Tianming Liu
Xintao Hu
39
10
0
27 Mar 2023
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
27
103
0
20 Mar 2023
Attention-likelihood relationship in transformers
Valeria Ruscio
Valentino Maiorca
Fabrizio Silvestri
21
1
0
15 Mar 2023
Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers
Kamil Bujel
Andrew Caines
H. Yannakoudakis
Marek Rei
AI4TS
19
1
0
14 Mar 2023
The Life Cycle of Knowledge in Big Language Models: A Survey
Boxi Cao
Hongyu Lin
Xianpei Han
Le Sun
KELM
33
27
0
14 Mar 2023
Input-length-shortening and text generation via attention values
Neşet Özkan Tan
A. Peng
Joshua Bensemann
Qiming Bao
Tim Hartill
M. Gahegan
Michael Witbrock
24
1
0
14 Mar 2023
LUKE-Graph: A Transformer-based Approach with Gated Relational Graph Attention for Cloze-style Reading Comprehension
Shima Foolad
Kourosh Kiani
19
3
0
12 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuanzhi Li
Andrej Risteski
120
61
0
07 Mar 2023
Ultra-High-Resolution Detector Simulation with Intra-Event Aware GAN and Self-Supervised Relational Reasoning
H. Hashemi
Nikolai Hartmann
Sahand Sharifzadeh
James Kahn
T. Kuhr
26
4
0
07 Mar 2023
Spelling convention sensitivity in neural language models
Elizabeth Nielsen
Christo Kirov
Brian Roark
22
1
0
06 Mar 2023
A Survey on Long Text Modeling with Transformers
Zican Dong
Tianyi Tang
Lunyi Li
Wayne Xin Zhao
VLM
21
54
0
28 Feb 2023
Inseq: An Interpretability Toolkit for Sequence Generation Models
Gabriele Sarti
Nils Feldhus
Ludwig Sickert
Oskar van der Wal
Malvina Nissim
Arianna Bisazza
32
64
0
27 Feb 2023
SynGen: A Syntactic Plug-and-play Module for Generative Aspect-based Sentiment Analysis
Chengze Yu
Taiqiang Wu
Jiayi Li
Xingyu Bai
Yujiu Yang
28
10
0
25 Feb 2023
Mask-guided BERT for Few Shot Text Classification
Wenxiong Liao
Zheng Liu
Haixing Dai
Zihao Wu
Yiyang Zhang
...
Dajiang Zhu
Tianming Liu
Sheng Li
Xiang Li
Hongmin Cai
VLM
47
39
0
21 Feb 2023
Evaluating the Effectiveness of Pre-trained Language Models in Predicting the Helpfulness of Online Product Reviews
Ali Boluki
Javad Pourmostafa Roshan Sharami
D. Shterionov
22
1
0
19 Feb 2023
Representation Deficiency in Masked Language Modeling
Yu Meng
Jitin Krishnan
Sinong Wang
Qifan Wang
Yuning Mao
Han Fang
Marjan Ghazvininejad
Jiawei Han
Luke Zettlemoyer
90
7
0
04 Feb 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
30
14
0
01 Feb 2023
Quantifying Context Mixing in Transformers
Hosein Mohebbi
Willem H. Zuidema
Grzegorz Chrupała
A. Alishahi
168
24
0
30 Jan 2023
Can We Use Probing to Better Understand Fine-tuning and Knowledge Distillation of the BERT NLU?
Jakub Hościłowicz
Marcin Sowanski
Piotr Czubowski
Artur Janicki
25
2
0
27 Jan 2023
Interpretability in Activation Space Analysis of Transformers: A Focused Survey
Soniya Vijayakumar
AI4CE
35
3
0
22 Jan 2023
Deep Learning Models to Study Sentence Comprehension in the Human Brain
S. Arana
Jacques Pesnot Lerousseau
P. Hagoort
23
10
0
16 Jan 2023
Topics in Contextualised Attention Embeddings
Mozhgan Talebpour
A. G. S. D. Herrera
Shoaib Jameel
34
2
0
11 Jan 2023
Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
AAML
40
8
0
05 Jan 2023
Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation
Tomer Wullach
Shlomo E. Chazan
30
1
0
27 Dec 2022
EIT: Enhanced Interactive Transformer
Tong Zheng
Bei Li
Huiwen Bao
Tong Xiao
Jingbo Zhu
32
2
0
20 Dec 2022
Attention as a Guide for Simultaneous Speech Translation
Sara Papi
Matteo Negri
Marco Turchi
26
30
0
15 Dec 2022