ResearchTrend.AI
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer

arXiv:2004.11207 · 23 April 2020
Authors: Y. Hao, Li Dong, Furu Wei, Ke Xu
Topics: ViT

Papers citing "Self-Attention Attribution: Interpreting Information Interactions Inside Transformer" (34 papers shown)

  1. ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers
     Gavin Hull, Alex Bihlo · 13 May 2025
  2. Hierarchical Attention Network for Interpretable ECG-based Heart Disease Classification
     Mario Padilla Rodriguez, Mohamed Nafea · 25 Mar 2025
  3. Revealing and Mitigating Over-Attention in Knowledge Editing
     Pinzheng Wang, Zecheng Tang, Keyan Zhou, J. Li, Qiaoming Zhu, Hao Fei · KELM · 21 Feb 2025
  4. Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers
     Tobias Leemann, Alina Fastowski, Felix Pfeiffer, Gjergji Kasneci · 10 Jan 2025
  5. Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
     Yanwen Huang, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao · 02 Jan 2025
  6. Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks
     Samuele Poppi, Zheng-Xin Yong, Yifei He, Bobbie Chern, Han Zhao, Aobo Yang, Jianfeng Chi · AAML · 23 Oct 2024
  7. Repurposing Foundation Model for Generalizable Medical Time Series Classification
     Nan Huang, Haishuai Wang, Zihuai He, Marinka Zitnik, Xiang Zhang · AI4TS, OOD · 03 Oct 2024
  8. SRViT: Vision Transformers for Estimating Radar Reflectivity from Satellite Observations at Scale
     Jason Stock, Kyle Hilburn, Imme Ebert-Uphoff, Charles Anderson · 20 Jun 2024
  9. Are Human Conversations Special? A Large Language Model Perspective
     Toshish Jawale, Chaitanya Animesh, Sekhar Vallath, Kartik Talamadupula, Larry Heck · 08 Mar 2024
  10. Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning
      Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong · 16 Oct 2023
  11. Concise and Organized Perception Facilitates Reasoning in Large Language Models
      Junjie Liu, Shaotian Yan, Chen Shen, Zhengdong Xiao, Wenxiao Wang, Jieping Ye · LRM · 05 Oct 2023
  12. Interpretability-Aware Vision Transformer
      Yao Qiang, Chengyin Li, Prashant Khanduri, D. Zhu · ViT · 14 Sep 2023
  13. Instruction Position Matters in Sequence Generation with Large Language Models
      Yanjun Liu, Xianfeng Zeng, Fandong Meng, Jie Zhou · LRM · 23 Aug 2023
  14. Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes
      Yosuke Miyanishi, Minh Le Nguyen · 19 Aug 2023
  15. PMET: Precise Model Editing in a Transformer
      Xiaopeng Li, Shasha Li, Shezheng Song, Jing Yang, Jun Ma, Jie Yu · KELM · 17 Aug 2023
  16. LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias
      Mario Almagro, Emilio Almazán, Diego Ortego, David Jiménez · 06 Jul 2023
  17. Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
      Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann · 25 May 2023
  18. How Does Attention Work in Vision Transformers? A Visual Analytics Attempt
      Yiran Li, Junpeng Wang, Xin Dai, Liang Wang, Chin-Chia Michael Yeh, Yan Zheng, Wei Zhang, Kwan-Liu Ma · ViT · 24 Mar 2023
  19. Interpretability in Activation Space Analysis of Transformers: A Focused Survey
      Soniya Vijayakumar · AI4CE · 22 Jan 2023
  20. Spatio-Temporal Attention in Multi-Granular Brain Chronnectomes for Detection of Autism Spectrum Disorder
      James Orme-Rogers, Ajitesh Srivastava · 30 Oct 2022
  21. Explainable Slot Type Attentions to Improve Joint Intent Detection and Slot Filling
      Kalpa Gunaratna, Vijay Srinivasan, Akhila Yerukola, Hongxia Jin · 19 Oct 2022
  22. AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
      Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie · 12 Oct 2022
  23. TransPolymer: a Transformer-based language model for polymer property predictions
      Changwen Xu, Yuyang Wang, A. Farimani · 03 Sep 2022
  24. FedMCSA: Personalized Federated Learning via Model Components Self-Attention
      Qianling Guo, Yong Qi, Saiyu Qi, Di Wu, Qian Li · FedML · 23 Aug 2022
  25. What does Transformer learn about source code?
      Kechi Zhang, Ge Li, Zhi Jin · ViT · 18 Jul 2022
  26. EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
      Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong Liu, Dacheng Tao · ViT · 19 Jun 2022
  27. Kformer: Knowledge Injection in Transformer Feed-Forward Layers
      Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang · KELM, MedIm · 15 Jan 2022
  28. Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models
      Tianlu Wang, Rohit Sridhar, Diyi Yang, Xuezhi Wang · AAML · 14 Oct 2021
  29. Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
      Philipp Benz, Soomin Ham, Chaoning Zhang, Adil Karjauv, In So Kweon · AAML, ViT · 06 Oct 2021
  30. Attributing Fair Decisions with Attention Interventions
      Ninareh Mehrabi, Umang Gupta, Fred Morstatter, Greg Ver Steeg, Aram Galstyan · 08 Sep 2021
  31. BEiT: BERT Pre-Training of Image Transformers
      Hangbo Bao, Li Dong, Songhao Piao, Furu Wei · ViT · 15 Jun 2021
  32. Knowledge Neurons in Pretrained Transformers
      Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei · KELM, MU · 18 Apr 2021
  33. On the Robustness of Vision Transformers to Adversarial Examples
      Kaleel Mahmood, Rigel Mahmood, Marten van Dijk · ViT · 31 Mar 2021
  34. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
      Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 20 Apr 2018