ResearchTrend.AI
arXiv: 1906.04341
What Does BERT Look At? An Analysis of BERT's Attention


11 June 2019
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
    MILM

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"

50 / 885 papers shown
Title
Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals
Y. Gat
Nitay Calderon
Amir Feder
Alexander Chapanin
Amit Sharma
Roi Reichart
38
29
0
01 Oct 2023
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul
Varun Chandrasekaran
Erik Jones
Suriya Gunasekar
Ranjita Naik
Hamid Palangi
Ece Kamar
Besmira Nushi
HILM
23
40
0
26 Sep 2023
AMPLIFY:Attention-based Mixup for Performance Improvement and Label Smoothing in Transformer
Leixin Yang
Yu Xiang
28
0
0
22 Sep 2023
AttentionMix: Data augmentation method that relies on BERT attention mechanism
Dominik Lewy
Jacek Mańdziuk
22
3
0
20 Sep 2023
Weakly Supervised Reasoning by Neuro-Symbolic Approaches
Xianggen Liu
Zhengdong Lu
Lili Mou
LRM
NAI
40
4
0
19 Sep 2023
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen
Ravid Schwartz-Ziv
Kyunghyun Cho
Matthew L. Leavitt
Naomi Saphra
29
62
0
13 Sep 2023
Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation
Shuai Wang
Harrisen Scells
Martin Potthast
Bevan Koopman
Guido Zuccon
27
10
0
11 Sep 2023
DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices
Guanyu Xu
Zhiwei Hao
Yong Luo
Han Hu
J. An
Shiwen Mao
ViT
37
14
0
10 Sep 2023
Neurons in Large Language Models: Dead, N-gram, Positional
Elena Voita
Javier Ferrando
Christoforos Nalmpantis
MILM
38
47
0
09 Sep 2023
One Wide Feedforward is All You Need
Telmo Pires
António V. Lopes
Yannick Assogba
Hendra Setiawan
48
12
0
04 Sep 2023
A Visual Interpretation-Based Self-Improved Classification System Using Virtual Adversarial Training
Shuai Jiang
Sayaka Kamei
Chen Li
Shengzhe Hou
Yasuhiko Morimoto
SSL
16
1
0
03 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jundong Li
LRM
29
411
0
02 Sep 2023
Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
Shunjie Wang
Shane Steinert-Threlkeld
27
4
0
02 Sep 2023
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
Varshini Subhash
Anna Bialas
Weiwei Pan
Finale Doshi-Velez
AAML
22
10
0
01 Sep 2023
Consensus of state of the art mortality prediction models: From all-cause mortality to sudden death prediction
Yola Jones
F. Deligianni
Jeffrey Stephen Dalton
P. Pellicori
John G. F. Cleland
OOD
26
0
0
30 Aug 2023
Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices
Elizaveta Kostenok
D. Cherniavskii
Alexey Zaytsev
56
5
0
22 Aug 2023
Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals
Running Zhao
Jiang-Tao Luca Yu
Haiying Zhao
Edith C.H. Ngai
32
4
0
16 Aug 2023
Task Conditioned BERT for Joint Intent Detection and Slot-filling
Diogo Tavares
Pedro Azevedo
David Semedo
R. Sousa
João Magalhães
24
4
0
11 Aug 2023
Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning
Hoang Nguyen
Chenwei Zhang
Ye Liu
Philip S. Yu
39
5
0
09 Aug 2023
Trusting Language Models in Education
J. Neto
Li-Ming Deng
Thejaswi Raya
Reza Shahbazi
Nick Liu
Adhitya Venkatesh
Miral Shah
Neeru Khosla
Rodrigo Guido
24
0
0
07 Aug 2023
Prompt Guided Copy Mechanism for Conversational Question Answering
Yong Zhang
Zhitao Li
Jianzong Wang
Yiming Gao
Ning Cheng
Fengying Yu
Jing Xiao
19
0
0
07 Aug 2023
Explaining Relation Classification Models with Semantic Extents
Lars Klöser
André Büsgen
Philipp Kohl
Bodo Kraft
Albert Zündorf
19
0
0
04 Aug 2023
MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets
Siyi Du
Nourhan Bayasi
Ghassan Hamarneh
Rafeef Garbi
ViT
30
18
0
05 Jul 2023
The Inner Sentiments of a Thought
Christian Gagné
Peter Dayan
33
4
0
04 Jul 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm
AI4CE
21
25
0
30 Jun 2023
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference
Junyan Li
Li Zhang
Jiahang Xu
Yujing Wang
Shaoguang Yan
...
Ting Cao
Hao Sun
Weiwei Deng
Qi Zhang
Mao Yang
38
10
0
26 Jun 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
23
87
0
22 Jun 2023
Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks
Mohamad Ballout
U. Krumnack
Gunther Heidemann
Kai-Uwe Kühnberger
34
2
0
21 Jun 2023
Explicit Syntactic Guidance for Neural Text Generation
Yafu Li
Leyang Cui
Jianhao Yan
Yongjing Yin
Wei Bi
Shuming Shi
Yue Zhang
21
9
0
20 Jun 2023
Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction
Haotian Chen
Bingsheng Chen
Xiangdong Zhou
45
6
0
20 Jun 2023
PEACE: Cross-Platform Hate Speech Detection- A Causality-guided Framework
Paras Sheth
Tharindu Kumarage
Raha Moraffah
Amanat Chadha
Huan Liu
31
7
0
15 Jun 2023
Is Anisotropy Inherent to Transformers?
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
19
3
0
13 Jun 2023
Actively Supervised Clustering for Open Relation Extraction
Jun Zhao
Yongxin Zhang
Qi Zhang
Tao Gui
Zhongyu Wei
Minlong Peng
Mingming Sun
24
5
0
08 Jun 2023
InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding
Junda Wu
Tong Yu
Rui Wang
Zhao Song
Ruiyi Zhang
Handong Zhao
Chaochao Lu
Shuai Li
Ricardo Henao
VLM
39
23
0
08 Jun 2023
Causal interventions expose implicit situation models for commonsense language understanding
Takateru Yamakoshi
James L. McClelland
A. Goldberg
Robert D. Hawkins
25
6
0
06 Jun 2023
CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models
Jiazheng Li
Zhaoyue Sun
Bin Liang
Lin Gui
Yulan He
18
2
0
06 Jun 2023
Representational Strengths and Limitations of Transformers
Clayton Sanford
Daniel J. Hsu
Matus Telgarsky
22
81
0
05 Jun 2023
DecompX: Explaining Transformers Decisions by Propagating Token Decomposition
Ali Modarressi
Mohsen Fayyaz
Ehsan Aghazadeh
Yadollah Yaghoobzadeh
Mohammad Taher Pilehvar
25
26
0
05 Jun 2023
Span Identification of Epistemic Stance-Taking in Academic Written English
Masaki Eguchi
K. Kyle
11
5
0
03 Jun 2023
Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code?
Bonan Kou
Shengmai Chen
Zhijie Wang
Lei Ma
Tianyi Zhang
ALM
11
13
0
02 Jun 2023
Learning Transformer Programs
Dan Friedman
Alexander Wettig
Danqi Chen
28
32
0
01 Jun 2023
ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER
Sreyan Ghosh
Utkarsh Tyagi
Manan Suri
Sonal Kumar
S. Ramaneswaran
Dinesh Manocha
28
15
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
28
7
0
01 Jun 2023
Assessing Word Importance Using Models Trained for Semantic Tasks
Dávid Javorský
Ondrej Bojar
François Yvon
33
2
0
31 May 2023
Emergent Modularity in Pre-trained Transformers
Zhengyan Zhang
Zhiyuan Zeng
Yankai Lin
Chaojun Xiao
Xiaozhi Wang
Xu Han
Zhiyuan Liu
Ruobing Xie
Maosong Sun
Jie Zhou
MoE
47
23
0
28 May 2023
Robust Natural Language Understanding with Residual Attention Debiasing
Fei Wang
James Y. Huang
Tianyi Yan
Wenxuan Zhou
Muhao Chen
37
10
0
28 May 2023
Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning
Zhen-Ru Zhang
Chuanqi Tan
Haiyang Xu
Chengyu Wang
Jun Huang
Songfang Huang
33
29
0
24 May 2023
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Xinpeng Wang
Leonie Weissweiler
Hinrich Schütze
Barbara Plank
28
8
0
24 May 2023
SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations
Victoria Lin
Louis-Philippe Morency
MILM
20
1
0
24 May 2023
All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations
Yuxin Ren
Qipeng Guo
Zhijing Jin
Shauli Ravfogel
Mrinmaya Sachan
Bernhard Schölkopf
Ryan Cotterell
30
4
0
23 May 2023