What Does BERT Look At? An Analysis of BERT's Attention

11 June 2019
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
    MILM

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"

50 / 886 papers shown
Attention as a Guide for Simultaneous Speech Translation
Sara Papi
Matteo Negri
Marco Turchi
26
30
0
15 Dec 2022
Explainability of Text Processing and Retrieval Methods: A Critical Survey
Sourav Saha
Debapriyo Majumdar
Mandar Mitra
18
5
0
14 Dec 2022
Mortality Prediction Models with Clinical Notes Using Sparse Attention at the Word and Sentence Levels
Miguel Rios
A. Abu-Hanna
16
0
0
12 Dec 2022
On the Importance of Clinical Notes in Multi-modal Learning for EHR Data
Severin Husmann
Hugo Yèche
Gunnar Rätsch
Rita Kuznetsova
HAI
16
10
0
06 Dec 2022
Syntactic Substitutability as Unsupervised Dependency Syntax
Jasper Jian
Siva Reddy
27
3
0
29 Nov 2022
Explanation on Pretraining Bias of Finetuned Vision Transformer
Bumjin Park
Jaesik Choi
ViT
36
1
0
18 Nov 2022
Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations
Linlin Liu
Xingxuan Li
Megh Thakkar
Xin Li
Chenyu You
Luo Si
Lidong Bing
27
2
0
16 Nov 2022
Introducing Semantics into Speech Encoders
Derek Xu
Shuyan Dong
Changhan Wang
Suyoun Kim
Zhaojiang Lin
...
Alexei Baevski
Guan-Ting Lin
Hung-yi Lee
Yizhou Sun
Wei Wang
SSL
36
3
0
15 Nov 2022
Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Juan Pisula
Katarzyna Bozek
VLM
MedIm
36
3
0
14 Nov 2022
Finding Skill Neurons in Pre-trained Transformer-based Language Models
Xiaozhi Wang
Kaiyue Wen
Zhengyan Zhang
Lei Hou
Zhiyuan Liu
Juanzi Li
MILM
MoE
27
50
0
14 Nov 2022
Demystify Self-Attention in Vision Transformers from a Semantic Perspective: Analysis and Application
Leijie Wu
Song Guo
Yaohong Ding
Junxiao Wang
Wenchao Xu
Richard Yi Da Xu
Jiewei Zhang
41
2
0
13 Nov 2022
FPT: Improving Prompt Tuning Efficiency via Progressive Training
Yufei Huang
Yujia Qin
Huadong Wang
Yichun Yin
Maosong Sun
Zhiyuan Liu
Qun Liu
VLM
LRM
35
6
0
13 Nov 2022
The Architectural Bottleneck Principle
Tiago Pimentel
Josef Valvoda
Niklas Stoehr
Ryan Cotterell
25
5
0
11 Nov 2022
Improving word mover's distance by leveraging self-attention matrix
Hiroaki Yamagiwa
Sho Yokoi
Hidetoshi Shimodaira
OT
32
4
0
11 Nov 2022
Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions
Michele Cafagna
Kees van Deemter
Albert Gatt
CoGe
16
3
0
09 Nov 2022
miCSE: Mutual Information Contrastive Learning for Low-shot Sentence Embeddings
T. Klein
Moin Nabi
SSL
26
15
0
09 Nov 2022
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
Michael Hassid
Hao Peng
Daniel Rotem
Jungo Kasai
Ivan Montero
Noah A. Smith
Roy Schwartz
32
24
0
07 Nov 2022
MPCFormer: fast, performant and private Transformer inference with MPC
Dacheng Li
Rulin Shao
Hongyi Wang
Han Guo
Eric P. Xing
Haotong Zhang
18
79
0
02 Nov 2022
Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks
Rochelle Choenni
Dan Garrette
Ekaterina Shutova
24
2
0
31 Oct 2022
Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
Sungjun Cho
Seonwoo Min
Jinwoo Kim
Moontae Lee
Honglak Lee
Seunghoon Hong
40
3
0
27 Oct 2022
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
Bowen Shen
Zheng Lin
Yuanxin Liu
Zhengxiao Liu
Lei Wang
Weiping Wang
VLM
47
4
0
27 Oct 2022
Benchmarking Language Models for Code Syntax Understanding
Da Shen
Xinyun Chen
Chenguang Wang
Koushik Sen
Dawn Song
ELM
22
16
0
26 Oct 2022
Influence Functions for Sequence Tagging Models
Sarthak Jain
Varun Manjunatha
Byron C. Wallace
A. Nenkova
TDI
35
8
0
25 Oct 2022
IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models
Chenguang Wang
Xiao Liu
Dawn Song
VLM
24
2
0
25 Oct 2022
Exploring Self-Attention for Crop-type Classification Explainability
Ivica Obadic
R. Roscher
Dario Augusto Borges Oliveira
Xiao Xiang Zhu
30
7
0
24 Oct 2022
A BERT-based Deep Learning Approach for Reputation Analysis in Social Media
Mohammad Wali Ur Rahman
Sicong Shao
Pratik Satam
Salim Hariri
Chris Padilla
Zoe Taylor
C. Nevarez
25
5
0
23 Oct 2022
SLING: Sino Linguistic Evaluation of Large Language Models
Yixiao Song
Kalpesh Krishna
R. Bhatt
Mohit Iyyer
24
8
0
21 Oct 2022
Enhancing Out-of-Distribution Detection in Natural Language Understanding via Implicit Layer Ensemble
Hyunsoo Cho
Choonghyun Park
Jaewoo Kang
Kang Min Yoo
Taeuk Kim
Sang-goo Lee
OODD
30
8
0
20 Oct 2022
Automatic Document Selection for Efficient Encoder Pretraining
Yukun Feng
Patrick Xia
Benjamin Van Durme
João Sedoc
58
7
0
20 Oct 2022
Transformers Learn Shortcuts to Automata
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
OffRL
LRM
48
156
0
19 Oct 2022
Hierarchical Multi-Interest Co-Network For Coarse-Grained Ranking
Xu Yuan
Chengjun Xu
Qiwei Chen
Tao Zhuang
Hongjie Chen
Chong Li
Junfeng Ge
AI4TS
25
0
0
19 Oct 2022
Explainable Slot Type Attentions to Improve Joint Intent Detection and Slot Filling
Kalpa Gunaratna
Vijay Srinivasan
Akhila Yerukola
Hongxia Jin
29
6
0
19 Oct 2022
A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning
Kunbo Ding
Weijie Liu
Yuejian Fang
Weiquan Mao
Zhe Zhao
Tao Zhu
Haoyan Liu
Rong Tian
Yiren Chen
43
8
0
18 Oct 2022
Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion
Jian Song
Di Liang
Rumei Li
Yun Li
Sirui Wang
Minlong Peng
Wei Wu
Yongxin Yu
35
12
0
16 Oct 2022
RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer
Mogan Gim
Donghee Choi
Kana Maruyama
Jihun Choi
Hajung Kim
Donghyeon Park
Jaewoo Kang
48
5
0
14 Oct 2022
LSG Attention: Extrapolation of pretrained Transformers to long sequences
Charles Condevaux
S. Harispe
38
24
0
13 Oct 2022
On the Explainability of Natural Language Processing Deep Models
Julia El Zini
M. Awad
29
82
0
13 Oct 2022
AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
Tao Yang
Jinghao Deng
Xiaojun Quan
Qifan Wang
Shaoliang Nie
32
3
0
12 Oct 2022
Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers
William B. Held
Diyi Yang
VLM
45
5
0
11 Oct 2022
Towards Structure-aware Paraphrase Identification with Phrase Alignment Using Sentence Encoders
Qiwei Peng
David J. Weir
Julie Weeds
23
3
0
11 Oct 2022
Characterization of anomalous diffusion through convolutional transformers
Nicolás Firbas
Òscar Garibo i Orts
M. Garcia-March
J. A. Conejero
38
18
0
10 Oct 2022
What the DAAM: Interpreting Stable Diffusion Using Cross Attention
Raphael Tang
Linqing Liu
Akshat Pandey
Zhiying Jiang
Gefei Yang
K. Kumar
Pontus Stenetorp
Jimmy J. Lin
Ferhan Ture
29
167
0
10 Oct 2022
Metaphorical Paraphrase Generation: Feeding Metaphorical Language Models with Literal Texts
Giorgio Ottolina
John Pavlopoulos
26
1
0
10 Oct 2022
Parameter-Efficient Tuning with Special Token Adaptation
Xiaocong Yang
James Y. Huang
Wenxuan Zhou
Muhao Chen
34
12
0
10 Oct 2022
Better Pre-Training by Reducing Representation Confusion
Haojie Zhang
Mingfei Liang
Ruobing Xie
Zhen Sun
Bo Zhang
Leyu Lin
33
2
0
09 Oct 2022
Breaking BERT: Evaluating and Optimizing Sparsified Attention
Siddhartha Brahma
Polina Zablotskaia
David M. Mimno
32
1
0
07 Oct 2022
CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure
Nuo Chen
Qiushi Sun
Renyu Zhu
Xiang Li
Xuesong Lu
Ming Gao
44
10
0
07 Oct 2022
Every word counts: A multilingual analysis of individual human alignment with model attention
Stephanie Brandl
Nora Hollenstein
40
11
0
05 Oct 2022
Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing
L. Nie
Jiu Sun
Yanlin Wang
Lun Du
Lei Hou
Juanzi Li
Shi Han
Dongmei Zhang
Jidong Zhai
34
6
0
04 Oct 2022
Causal Proxy Models for Concept-Based Model Explanations
Zhengxuan Wu
Karel D'Oosterlinck
Atticus Geiger
Amir Zur
Christopher Potts
MILM
83
35
0
28 Sep 2022