SirLLM: Streaming Infinite Retentive LLM
arXiv 2405.12528, 21 May 2024
Yao Yao, Z. Li, Hai Zhao
Tags: KELM, RALM
Papers citing "SirLLM: Streaming Infinite Retentive LLM" (5 of 5 papers shown)

Cognitive Memory in Large Language Models
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
Tags: LLMAG, KELM
03 Apr 2025

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, Kemal Kurniawan, Amir Gholami
Tags: MQ
05 Feb 2025

An Evolved Universal Transformer Memory
Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang
17 Oct 2024

Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang, Binhang Yuan, Xu Han, Chaojun Xiao, Zhiyuan Liu
Tags: RALM
02 Oct 2024

Optimizing Retrieval-augmented Reader Models via Token Elimination
Moshe Berchansky, Peter Izsak, Avi Caciularu, Ido Dagan, Moshe Wasserblat
Tags: RALM
20 Oct 2023