Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
arXiv 2409.17422 · 25 September 2024
Authors: Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty

Cited By (22 / 22 papers shown)
1. PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference — Weisheng Jin, Maojia Song, Tej Deep Pala, Yew Ken Chia, Amir Zadeh, Chuan Li, Soujanya Poria [VLM] — 30 Mar 2025
2. Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency — Jiangxuan Long, Zhao-quan Song, Chiwun Yang [AI4TS] — 18 Mar 2025
3. Theoretical Guarantees for High Order Trajectory Refinement in Generative Flows — Chengyue Gong, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao-quan Song, Yu Tian — 12 Mar 2025
4. Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving — Qihui Zhou, Peiqi Yin, Pengfei Zuo, James Cheng [CLL] — 01 Mar 2025
5. When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time? — Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song — 24 Feb 2025
6. Looped ReLU MLPs May Be All You Need as Practical Programmable Computers — Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao-quan Song, Yufa Zhou — 21 Feb 2025
7. Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation — Yang Cao, Zhao-quan Song, Chiwun Yang [VGen] — 01 Feb 2025
8. Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference — Weizhi Fei, Xueyan Niu, Guoqing Xie, Yingqing Liu, Bo Bai, Wei Han — 22 Jan 2025
9. Fast Gradient Computation for RoPE Attention in Almost Linear Time — Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song — 03 Jan 2025
10. Theoretical Constraints on the Expressive Power of RoPE-based Tensor Attention Transformers — Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song, Mingda Wan — 23 Dec 2024
11. Membership Inference Attack against Long-Context Large Language Models — Zixiong Wang, Gaoyang Liu, Yang Yang, Chen Wang — 18 Nov 2024
12. MrT5: Dynamic Token Merging for Efficient Byte-level Language Models — Julie Kallini, Shikhar Murty, Christopher D. Manning, Christopher Potts, Róbert Csordás — 28 Oct 2024
13. Lossless KV Cache Compression to 2% — Zhen Yang, Jizong Han, Kan Wu, Ruobing Xie, An Wang, Xingchen Sun, Zhanhui Kang [VLM, MQ] — 20 Oct 2024
14. Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study — Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song — 15 Oct 2024
15. Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent — Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song — 15 Oct 2024
16. HSR-Enhanced Sparse Attention Acceleration — Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao-quan Song — 14 Oct 2024
17. Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes — Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song, Yufa Zhou — 12 Oct 2024
18. TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention — Lijie Yang, Zhihao Zhang, Zhuofu Chen, Zikun Li, Zhihao Jia — 07 Oct 2024
19. FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" — Yifei Ming, Senthil Purushwalkam, Shrey Pandit, Zixuan Ke, Xuan-Phi Nguyen, Caiming Xiong, Shafiq R. Joty [HILM] — 30 Sep 2024
20. A Tighter Complexity Analysis of SparseGPT — Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song — 22 Aug 2024
21. Differentially Private Attention Computation — Yeqi Gao, Zhao-quan Song, Xin Yang — 08 May 2023
22. The Closeness of In-Context Learning and Weight Shifting for Softmax Regression — Shuai Li, Zhao-quan Song, Yu Xia, Tong Yu, Dinesh Manocha — 26 Apr 2023