Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks

arXiv:2407.08454 · 11 July 2024
Zheng Wang, Boxiao Jin, Zhongzhi Yu, Minjia Zhang
Tags: MoMe

Papers citing "Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks"

12 papers shown

Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
Zehao Fan, Garrett Gagnon, Zhenyu Liu, Liu Liu
09 May 2025

FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
Jushi Kai, Boyi Zeng, Yue Wang, Haoli Bai, Bo Jiang, Zhouhan Lin
01 May 2025

Cognitive Memory in Large Language Models
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
Tags: LLMAG, KELM
03 Apr 2025

X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li, Mehdi Rezagholizadeh, Mingyu Yang, Vikram Appia, Emad Barsoum
Tags: VLM
14 Mar 2025

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
Wei Yu Wu, Zhuoshi Pan, Chao Wang, L. Chen, Y. Bai, Kun Fu, Zehua Wang, Hui Xiong
Tags: LLMAG
05 Nov 2024

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Bokai Lin, Zihao Zeng, Zipeng Xiao, Siqi Kou, Tianqi Hou, Xiaofeng Gao, Hao Zhang, Zhijie Deng
16 Oct 2024

TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
Songshuo Lu, Hua Wang, Yutian Rong, Zhi Chen, Yaohua Tang
Tags: VLM
10 Oct 2024

House of Cards: Massive Weights in LLMs
Jaehoon Oh, Seungjun Shin, Dokwan Oh
02 Oct 2024

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan Celine Lin
22 Jun 2024

D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan, Xinjian Wu, Yu Zhang, Yi Xin, Chaofan Tao, ..., Xin Wang, Siqi Luo, Jing Xiong, Mi Zhang
18 Jun 2024

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
J. Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, S. Kwon, Dongsoo Lee
Tags: MQ
28 Feb 2024

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
13 Mar 2023