ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.18799
  4. Cited By
ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models

ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models

24 May 2025
Hao Chen
Haoze Li
Zhiqing Xiao
Lirong Gao
Qi Zhang
Xiaomeng Hu
Ningtao Wang
Xing Fu
Junbo Zhao
ArXivPDFHTML

Papers citing "ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models"

39 / 39 papers shown
Title
Not All Heads Matter: A Head-Level KV Cache Compression Method with
  Integrated Retrieval and Reasoning
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
Yu Fu
Zefan Cai
Abedelkadir Asi
Wayne Xiong
Yue Dong
Wen Xiao
51
21
0
25 Oct 2024
Understanding Layer Significance in LLM Alignment
Understanding Layer Significance in LLM Alignment
Guangyuan Shi
Zexin Lu
Xiaoyu Dong
Wenlong Zhang
Xuanyu Zhang
Yujie Feng
Xiao-Ming Wu
84
3
0
23 Oct 2024
On the Role of Attention Heads in Large Language Model Safety
On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou
Haiyang Yu
Xinghua Zhang
Rongwu Xu
Fei Huang
Kun Wang
Yang Liu
Sihang Li
Yongbin Li
90
8
0
17 Oct 2024
Qwen2.5-Coder Technical Report
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
84
261
0
18 Sep 2024
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via
  Self-Improvement
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
An Yang
Beichen Zhang
Binyuan Hui
Bofei Gao
Bowen Yu
...
Mingfeng Xue
Runji Lin
Tianyu Liu
Xingzhang Ren
Zhenru Zhang
OSLM
LRM
58
251
0
18 Sep 2024
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Hanlin Tang
Yang Lin
Jing Lin
Qingsen Han
Shikuan Hong
Yiwu Yao
Gongyi Wang
MQ
60
31
0
22 Jul 2024
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts
  Language Model
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI
Aixin Liu
Bei Feng
Bin Wang
Bingxuan Wang
...
Zhuoshu Li
Zihan Wang
Zihui Gu
Zilin Li
Ziwei Xie
MoE
76
447
0
07 May 2024
Retrieval Head Mechanistically Explains Long-Context Factuality
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu
Yizhong Wang
Guangxuan Xiao
Hao-Chun Peng
Yao Fu
LRM
46
72
0
24 Apr 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open
  Language Models
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
75
953
0
05 Feb 2024
DeepSeek-Coder: When the Large Language Model Meets Programming -- The
  Rise of Code Intelligence
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Daya Guo
Qihao Zhu
Dejian Yang
Zhenda Xie
Kai Dong
...
Yu-Huan Wu
Yiming Li
Fuli Luo
Yingfei Xiong
W. Liang
ELM
82
735
0
25 Jan 2024
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
64
627
0
20 Nov 2023
MAmmoTH: Building Math Generalist Models through Hybrid Instruction
  Tuning
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Xiang Yue
Xingwei Qu
Ge Zhang
Yao Fu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
AIMat
LRM
97
391
0
11 Sep 2023
Explainability for Large Language Models: A Survey
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jundong Li
LRM
49
437
0
02 Sep 2023
Code Llama: Open Foundation Models for Code
Code Llama: Open Foundation Models for Code
Baptiste Rozière
Jonas Gehring
Fabian Gloeckle
Sten Sootla
Itai Gat
...
Hugo Touvron
Louis Martin
Nicolas Usunier
Thomas Scialom
Gabriel Synnaeve
ELM
ALM
77
1,990
0
24 Aug 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
201
11,636
0
18 Jul 2023
Enhancing Chat Language Models by Scaling High-quality Instructional
  Conversations
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Ning Ding
Yulin Chen
Bokai Xu
Yujia Qin
Zhi Zheng
Shengding Hu
Zhiyuan Liu
Maosong Sun
Bowen Zhou
ALM
87
511
0
23 May 2023
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head
  Checkpoints
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie
James Lee-Thorp
Michiel de Jong
Yury Zemlyanskiy
Federico Lebrón
Sumit Sanghai
57
626
0
22 May 2023
LIMA: Less Is More for Alignment
LIMA: Less Is More for Alignment
Chunting Zhou
Pengfei Liu
Puxin Xu
Srini Iyer
Jiao Sun
...
Susan Zhang
Gargi Ghosh
M. Lewis
Luke Zettlemoyer
Omer Levy
ALM
71
819
0
18 May 2023
Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low
  Training Data Instruction Tuning
Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning
Haowen Chen
Yiming Zhang
Qi Zhang
Hantao Yang
Xiaomeng Hu
Xuetao Ma
Yifan YangGong
Jiaqi Zhao
ALM
81
50
0
16 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
178
203
0
02 May 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of
  Large Language Models for Code Generation
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
201
859
0
02 May 2023
GPT-4 Technical Report
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
531
13,788
0
15 Mar 2023
Tracr: Compiled Transformers as a Laboratory for Interpretability
Tracr: Compiled Transformers as a Laboratory for Interpretability
David Lindner
János Kramár
Sebastian Farquhar
Matthew Rahtz
Tom McGrath
Vladimir Mikulik
61
75
0
12 Jan 2023
Large Language Models Encode Clinical Knowledge
Large Language Models Encode Clinical Knowledge
K. Singhal
Shekoofeh Azizi
T. Tu
S. S. Mahdavi
Jason W. Wei
...
A. Rajkomar
Joelle Barral
Christopher Semturs
Alan Karthikesalingam
Vivek Natarajan
LM&MA
ELM
AI4MH
101
2,283
0
26 Dec 2022
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Yizhong Wang
Yeganeh Kordi
Swaroop Mishra
Alisa Liu
Noah A. Smith
Daniel Khashabi
Hannaneh Hajishirzi
ALM
SyDa
LRM
74
2,166
0
20 Dec 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
186
4,175
0
27 Oct 2021
Program Synthesis with Large Language Models
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
88
1,893
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
150
5,328
0
07 Jul 2021
LoRA: Low-Rank Adaptation of Large Language Models
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
230
10,099
0
17 Jun 2021
Causal Abstractions of Neural Networks
Causal Abstractions of Neural Networks
Atticus Geiger
Hanson Lu
Thomas Icard
Christopher Potts
NAI
CML
50
234
0
06 Jun 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
117
2,109
0
05 Mar 2021
Measuring Massive Multitask Language Understanding
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
135
4,222
0
07 Sep 2020
What Does BERT Look At? An Analysis of BERT's Attention
What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
186
1,586
0
11 Jun 2019
Are Sixteen Heads Really Better than One?
Are Sixteen Heads Really Better than One?
Paul Michel
Omer Levy
Graham Neubig
MoE
73
1,051
0
25 May 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy
  Lifting, the Rest Can Be Pruned
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
76
1,120
0
23 May 2019
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning
  Challenge
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
69
2,474
0
14 Mar 2018
Quantization and Training of Neural Networks for Efficient
  Integer-Arithmetic-Only Inference
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
MQ
124
3,090
0
15 Dec 2017
Wasserstein GAN
Wasserstein GAN
Martín Arjovsky
Soumith Chintala
Léon Bottou
GAN
135
4,822
0
26 Jan 2017
Distilling the Knowledge in a Neural Network
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
236
19,523
0
09 Mar 2015
1