2309.00071
YaRN: Efficient Context Window Extension of Large Language Models
31 August 2023
Bowen Peng
Jeffrey Quesnelle
Honglu Fan
Enrico Shippole
OSLM
ArXiv (abs)
PDF
HTML
Github (1489★)
Papers citing
"YaRN: Efficient Context Window Extension of Large Language Models"
50 / 199 papers shown
Title
When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
Zhen Xu
Shang Zhu
Jue Wang
Junlin Wang
Ben Athiwaratkun
Chi Wang
James Zou
Ce Zhang
LLMAG
17
0
0
19 Jun 2025
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim
Kyuhong Shim
Jungwook Choi
Simyung Chang
VLM
14
0
0
18 Jun 2025
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
Xiaoran Liu
Zhigeng Liu
Zengfeng Huang
Qipeng Guo
Ziwei He
Xipeng Qiu
41
0
0
17 Jun 2025
Multipole Attention for Efficient Long Context Reasoning
Coleman Hooper
Sebastian Zhao
Luca Manolache
Sehoon Kim
Michael W. Mahoney
Y. Shao
Kurt Keutzer
Amir Gholami
OffRL
LRM
26
0
0
16 Jun 2025
Lag-Relative Sparse Attention In Long Context Training
Manlai Liang
Wanyi Huang
Mandi Liu
Huaijun Li
Jinlong Li
RALM
17
0
0
13 Jun 2025
Extrapolation by Association: Length Generalization Transfer in Transformers
Ziyang Cai
Nayoung Lee
Avi Schwarzschild
Samet Oymak
Dimitris Papailiopoulos
37
0
0
10 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
36
0
0
09 Jun 2025
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
Y. Wu
Yushi Bai
Zhiqiang Hu
Juanzi Li
Roy Ka-wei Lee
66
0
0
04 Jun 2025
Native-Resolution Image Synthesis
Zidong Wang
Lei Bai
Xiangyu Yue
Wanli Ouyang
Yiyuan Zhang
74
0
0
03 Jun 2025
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
Yunzhu Zhang
Yu Lu
T. Wang
Fengyun Rao
Yi Yang
Linchao Zhu
VLM
49
0
0
01 Jun 2025
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
Woomin Song
Sai Muralidhar Jayanthi
S. Ronanki
Kanthashree Mysore Sathyendra
Jinwoo Shin
Aram Galstyan
Shubham Katiyar
S. Bodapati
VLM
52
0
0
01 Jun 2025
ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs
Mohamed S. Elaraby
Diane Litman
LLMAG
36
0
0
29 May 2025
Curse of High Dimensionality Issue in Transformer for Long-context Modeling
Shuhai Zhang
Zeng You
Yaofo Chen
Z. Wen
Qianyue Wang
Zhijie Qiu
Yuanqing Li
Mingkui Tan
44
0
0
28 May 2025
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
Zijun Liu
Zhennan Wan
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Liu
LLMAG
89
0
0
27 May 2025
Understanding Transformer from the Perspective of Associative Memory
Shu Zhong
Mingyu Xu
Tenglong Ao
Guang Shi
47
1
0
26 May 2025
Rotary Masked Autoencoders are Versatile Learners
Uros Zivanovic
Serafina Di Gioia
Andre Scaffidi
Martín de los Rios
Gabriella Contardo
R. Trotta
35
0
0
26 May 2025
QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization
Weizhou Shen
Chenliang Li
Fanqi Wan
Shengyi Liao
Shaopeng Lai
...
Bin Yang
Ji Zhang
Fei Huang
Jingren Zhou
Ming Yan
49
1
0
23 May 2025
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Wang Yang
Zirui Liu
Hongye Jin
Qingyu Yin
Vipin Chaudhary
Xiaotian Han
ReLM
LRM
68
0
0
22 May 2025
From Evaluation to Defense: Advancing Safety in Video Large Language Models
Yiwei Sun
Peiqi Jiang
Chuanbin Liu
Luohao Lin
Zhiying Lu
Hongtao Xie
53
0
0
22 May 2025
LongMagpie: A Self-synthesis Method for Generating Large-scale Long-context Instructions
Chaochen Gao
Xing Wu
Zijia Lin
Debing Zhang
Songlin Hu
SyDa
216
0
0
22 May 2025
Scale-invariant Attention
Ben Anson
Xi Wang
Laurence Aitchison
LRM
105
0
0
20 May 2025
Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning
Adam Štorek
Mukur Gupta
Samira Hajizadeh
Prashast Srivastava
Suman Jana
LRM
69
0
0
19 May 2025
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Huashan Sun
Shengyi Liao
Yansen Han
Yu Bai
Yang Gao
...
Weizhou Shen
Fanqi Wan
Ming Yan
J.N. Zhang
Fei Huang
177
0
0
16 May 2025
Dyadic Mamba: Long-term Dyadic Human Motion Synthesis
Julian Tanke
Takashi Shibuya
Kengo Uchida
Koichi Saito
Yuki Mitsufuji
Mamba
86
0
0
14 May 2025
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
118
100
0
14 May 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu
Zhaoxiang Wang
Bo Zheng
Zeyu Huang
Kaiyue Wen
...
Fei Huang
Suozhi Huang
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
99
0
0
10 May 2025
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
165
0
0
10 May 2025
The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)
Zihao Wang
Yibo Jiang
Jiahao Yu
Heqing Huang
104
0
0
01 May 2025
Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention
Xiang Hu
Jiaqi Leng
Jun Zhao
Kewei Tu
Wei Wu
Mamba
110
0
0
23 Apr 2025
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement
Zhifan Ye
Kejing Xia
Yonggan Fu
Xin Dong
Jihoon Hong
Xiangchi Yuan
Shizhe Diao
Jan Kautz
Pavlo Molchanov
Yingyan Lin
Mamba
101
9
0
22 Apr 2025
SWAN-GPT: An Efficient and Scalable Approach for Long-Context Language Modeling
Krishna Puvvada
Faisal Ladhak
Santiago Akle Serrano
Cheng-Ping Hsieh
Shantanu Acharya
...
Fei Jia
Samuel Kriman
Simeng Sun
Dima Rekesh
Boris Ginsburg
RALM
110
0
0
11 Apr 2025
Harnessing the Unseen: The Hidden Influence of Intrinsic Knowledge in Long-Context Language Models
Yu Fu
Haz Sameen Shahgir
Hui Liu
Xianfeng Tang
Qi He
Yue Dong
KELM
155
0
0
11 Apr 2025
On Vanishing Variance in Transformer Length Generalization
Ruining Li
Gabrijel Boduljak
Jensen Zhou
102
0
0
03 Apr 2025
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
Wei Tao
Bin Zhang
Xiaoyang Qu
Jiguang Wan
Jianzong Wang
128
2
0
30 Mar 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
176
11
0
25 Mar 2025
Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings
Cong Liu
Liang Hou
Mingwu Zheng
Xin Tao
Pengfei Wan
Di Zhang
Kun Gai
63
0
0
24 Mar 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
100
1
0
24 Mar 2025
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu
Qian Liu
Haonan Wang
Shiqi Chen
Xiangming Gu
Tianyu Pang
Min-Yen Kan
102
0
0
19 Mar 2025
Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
Wenlong Meng
Fan Zhang
Wendao Yao
Zhenyuan Guo
Yongqian Li
Chengkun Wei
Wenzhi Chen
AAML
120
5
0
11 Mar 2025
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
Yongqiang Yao
Jingru Tan
Kaihuan Liang
Feizhao Zhang
Yazhe Niu
Jiahao Hu
Ruihao Gong
Dahua Lin
Ningyi Xu
98
0
0
10 Mar 2025
ReaderLM-v2: Small Language Model for HTML to Markdown and JSON
Feng Wang
Zesheng Shi
Bo Wang
Nan Wang
Han Xiao
RALM
120
3
0
03 Mar 2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai
Jianqiao Lu
Yao Luo
Yiyuan Ma
Xun Zhou
135
14
0
28 Feb 2025
TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
Tong Wu
Junzhe Shen
Zixia Jia
Yanjie Wang
Zilong Zheng
124
1
0
26 Feb 2025
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
Penghui Yang
Cunxiao Du
Fengzhuo Zhang
Haonan Wang
Tianyu Pang
Chao Du
Bo An
RALM
103
2
0
24 Feb 2025
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
83
0
0
24 Feb 2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu
Dawei Zhu
Guangxiang Zhao
Zhuocheng Yu
Junfeng Ran
Xiangyu Wong
Lin Sun
Sujian Li
110
2
0
24 Feb 2025
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale
Jiaxi Li
Xingxing Zhang
Xun Wang
Xiaolong Huang
Li Dong
Liang Wang
Si-Qing Chen
Wei Lu
Furu Wei
SyDa
476
1
0
23 Feb 2025
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
Min Zhao
Guande He
Yixiao Chen
Hongzhou Zhu
Chong Li
Jun Zhu
VGen
132
11
0
21 Feb 2025
Associative Recurrent Memory Transformer
Ivan Rodkin
Yuri Kuratov
Aydar Bulatov
Andrey Kravchenko
134
4
0
17 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
475
7
0
04 Feb 2025