arXiv: 1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
23 May 2019
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" (50 / 214 papers shown)
Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments [Mamba]
Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
13 May 2025

Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation
Chiara Manna, Afra Alishahi, Frédéric Blain, Eva Vanmassenhove
13 May 2025

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free [MoE]
Zihan Qiu, Zhaoxiang Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, ..., Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin
10 May 2025

GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
Sehyeong Jo, Gangjae Jang, Haesol Park
28 Apr 2025

Hallucination Detection in LLMs via Topological Divergence on Attention Graphs [HILM]
Alexandra Bazarova, Aleksandr Yugay, Andrey Shulga, A. Ermilova, Andrei Volodichev, ..., Dmitry Simakov, M. Savchenko, Andrey Savchenko, Serguei Barannikov, Alexey Zaytsev
14 Apr 2025

RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao, Shuaishuai Zu
11 Apr 2025

Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
Pedro Sandoval-Segura, Xijun Wang, Ashwinee Panda, Micah Goldblum, Ronen Basri, Tom Goldstein, David Jacobs
04 Apr 2025

Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles [SyDa]
Chen Wei Kuo, Kevin Chu, Nouar Aldahoul, Hazem Ibrahim, Talal Rahwan, Yasir Zaki
04 Apr 2025

Language Models at the Syntax-Semantics Interface: A Case Study of the Long-Distance Binding of Chinese Reflexive ziji
Xiulin Yang
02 Apr 2025

Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov
14 Mar 2025

Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Itay Benou, Tammy Riklin-Raviv
27 Feb 2025

A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs
Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Angelica I Aviles-Rivero, Chuanlong Xie, Yao Zhu
26 Feb 2025

Selective Prompt Anchoring for Code Generation
Yuan Tian, Tianyi Zhang
24 Feb 2025

EvoP: Robust LLM Inference via Evolutionary Pruning
Shangyu Wu, Hongchao Du, Ying Xiong, Shuai Chen, Tei-Wei Kuo, Nan Guan, Chun Jason Xue
19 Feb 2025

Exploring Translation Mechanism of Large Language Models
Hongbin Zhang, Kehai Chen, Xuefeng Bai, Xiucheng Li, Yang Xiang, Min Zhang
17 Feb 2025

Learning Task Representations from In-Context Learning
Baturay Saglam, Zhuoran Yang, Dionysis Kalogerias, Amin Karbasi
08 Feb 2025

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers [LRM]
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn
04 Feb 2025

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference [VLM]
Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, S. K. Zhou
28 Jan 2025

Merging Feed-Forward Sublayers for Compressed Transformers [AI4CE]
Neha Verma, Kenton W. Murray, Kevin Duh
10 Jan 2025

CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park, Soo-Mook Moon
08 Jan 2025

JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He, Zhibo Wang, Zhixuan Chu, Huiyu Xu, Rui Zheng, Kui Ren, Chun Chen
17 Nov 2024

ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile, Valentino Maiorca, Luca Bortolussi, Emanuele Rodolà, Francesco Locatello
31 Oct 2024

MoH: Multi-Head Attention as Mixture-of-Head Attention [MoE]
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan
15 Oct 2024

Token Pruning using a Lightweight Background Aware Vision Transformer [ViT]
Sudhakar Sah, Ravish Kumar, Honnesh Rohmetra, Ehsan Saboori
12 Oct 2024

Explanation Bottleneck Models [LRM, BDL]
Shinya Yamaguchi, Kosuke Nishida
26 Sep 2024

Enhancing elusive clues in knowledge learning by contrasting attention of language models
Jian Gao, Xiao Zhang, Ji Wu, Miao Li
26 Sep 2024

Collaborative Learning for Enhanced Unsupervised Domain Adaptation
Minhee Cho, Hyesong Choi, Hayeon Jo, Dongbo Min
04 Sep 2024

Explainable Artificial Intelligence: A Survey of Needs, Techniques, Applications, and Future Direction [XAI, AI4TS]
Melkamu Mersha, Khang Lam, Joseph Wood, Ali AlShami, Jugal Kalita
30 Aug 2024

Isomorphic Pruning for Vision Models [VLM, ViT]
Gongfan Fang, Xinyin Ma, Michael Bi Mi, Xinchao Wang
05 Jul 2024

Reasoning in Large Language Models: A Geometric Perspective [LRM]
Romain Cosentino, Sarath Shekkizhar
02 Jul 2024

Inpainting the Gaps: A Novel Framework for Evaluating Explanation Methods in Vision Transformers
Lokesh Badisa, Sumohana S. Channappayya
17 Jun 2024

Investigating the translation capabilities of Large Language Models trained on parallel data only [LRM]
Javier García Gilabert, Carlos Escolano, Aleix Sant Savall, Francesca de Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero
13 Jun 2024

Attention as a Hypernetwork [GNN]
Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu
09 Jun 2024

Interpreting the Second-Order Effects of Neurons in CLIP [MILM]
Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt
06 Jun 2024

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky
24 May 2024

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green
22 May 2024

Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis
Yao Fu
14 May 2024

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [MoE, SyDa]
Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, ..., Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz
06 May 2024

Efficiently Distilling LLMs for Edge Applications
Achintya Kundu, Fabian Lim, Aaron Chew, L. Wynter, Penny Chong, Rhui Dih Lee
01 Apr 2024

The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
26 Mar 2024

The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models [MoE]
Carlo Nicolini, Jacopo Staiano, Bruno Lepri, Raffaele Marino
13 Mar 2024

Explainable Learning with Gaussian Processes
Kurt Butler, Guanchao Feng, P. Djuric
11 Mar 2024

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM [MQ]
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao
08 Mar 2024

Where does In-context Translation Happen in Large Language Models [LRM]
Suzanna Sia, David Mueller, Kevin Duh
07 Mar 2024

Evaluating Webcam-based Gaze Data as an Alternative for Human Rationale Annotations
Stephanie Brandl, Oliver Eberle, Tiago F. R. Ribeiro, Anders Søgaard, Nora Hollenstein
29 Feb 2024

NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar, Tejaswini Pedapati, Ronny Luss, Soham Dan, Aurélie C. Lozano, Payel Das, Georgios Kollias
28 Feb 2024

SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
T. Yasuda, Kyriakos Axiotis, Gang Fu, M. Bateni, Vahab Mirrokni
27 Feb 2024

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang R. Zhang
26 Jan 2024

When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges
Wang Chao, Jiaxuan Zhao, Licheng Jiao, Lingling Li, Fang Liu, Shuyuan Yang
19 Jan 2024

Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata
08 Nov 2023