2309.00071
YaRN: Efficient Context Window Extension of Large Language Models
31 August 2023
Bowen Peng
Jeffrey Quesnelle
Honglu Fan
Enrico Shippole
OSLM
Papers citing "YaRN: Efficient Context Window Extension of Large Language Models"
50 / 178 papers shown
Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning
Adam Štorek
Mukur Gupta
Samira Hajizadeh
Prashast Srivastava
Suman Jana
LRM
12
0
0
19 May 2025
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Huashan Sun
Shengyi Liao
Yansen Han
Yu Bai
Yang Gao
...
Weizhou Shen
Fanqi Wan
Ming Yan
Jiaming Zhang
Fei Huang
25
0
0
16 May 2025
Dyadic Mamba: Long-term Dyadic Human Motion Synthesis
Julian Tanke
Takashi Shibuya
Kengo Uchida
Koichi Saito
Yuki Mitsufuji
Mamba
47
0
0
14 May 2025
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
51
10
0
14 May 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu
Zhaoxiang Wang
Bo Zheng
Zeyu Huang
Kaiyue Wen
...
Fei Huang
Suozhi Huang
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
42
0
0
10 May 2025
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
59
0
0
10 May 2025
The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)
Zihao Wang
Yibo Jiang
Jiahao Yu
Heqing Huang
42
0
0
01 May 2025
Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention
Xiang Hu
Jiaqi Leng
Jun Zhao
Kewei Tu
Wei Wu
Mamba
61
0
0
23 Apr 2025
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement
Zhifan Ye
Kejing Xia
Yonggan Fu
Xin Dong
Jihoon Hong
Xiangchi Yuan
Shizhe Diao
Jan Kautz
Pavlo Molchanov
Yingyan Lin
Mamba
49
4
0
22 Apr 2025
SWAN-GPT: An Efficient and Scalable Approach for Long-Context Language Modeling
Krishna Puvvada
Faisal Ladhak
Santiago Akle Serrano
Cheng-Ping Hsieh
Shantanu Acharya
...
Fei Jia
Samuel Kriman
Simeng Sun
Dima Rekesh
Boris Ginsburg
RALM
65
0
0
11 Apr 2025
Harnessing the Unseen: The Hidden Influence of Intrinsic Knowledge in Long-Context Language Models
Yu Fu
Haz Sameen Shahgir
Hui Liu
Xianfeng Tang
Qi He
Yue Dong
KELM
62
0
0
11 Apr 2025
On Vanishing Variance in Transformer Length Generalization
Ruining Li
Gabrijel Boduljak
Jensen Zhou
41
0
0
03 Apr 2025
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
Wei Tao
Bin Zhang
Xiaoyang Qu
Jiguang Wan
Jianzong Wang
47
1
0
30 Mar 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
87
3
0
25 Mar 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
57
0
0
24 Mar 2025
Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings
Cong Liu
Liang Hou
Mingwu Zheng
Xin Tao
Pengfei Wan
Di Zhang
Kun Gai
49
0
0
24 Mar 2025
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu
Qian Liu
Haonan Wang
Shiqi Chen
Xiangming Gu
Tianyu Pang
Min-Yen Kan
46
0
0
19 Mar 2025
Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
Wenlong Meng
Fan Zhang
Wendao Yao
Zhenyuan Guo
Yong Li
Chengkun Wei
Wenzhi Chen
AAML
40
2
0
11 Mar 2025
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
Yongqiang Yao
Jingru Tan
Kaihuan Liang
Feizhao Zhang
Yazhe Niu
Jiahao Hu
Ruihao Gong
Dahua Lin
Ningyi Xu
65
0
0
10 Mar 2025
ReaderLM-v2: Small Language Model for HTML to Markdown and JSON
Feng Wang
Zesheng Shi
Bo Wang
Nan Wang
Han Xiao
RALM
81
1
0
03 Mar 2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai
Jianqiao Lu
Yao Luo
Yiyuan Ma
Xun Zhou
71
6
0
28 Feb 2025
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
Tong Wu
Junzhe Shen
Zixia Jia
Yunhong Wang
Zilong Zheng
85
0
0
26 Feb 2025
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
44
0
0
24 Feb 2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu
Dawei Zhu
Guangxiang Zhao
Zhuocheng Yu
Junfeng Ran
Xiangyu Wong
Lin Sun
Sujian Li
52
0
0
24 Feb 2025
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification
Penghui Yang
Cunxiao Du
Fengzhuo Zhang
Haonan Wang
Tianyu Pang
Chao Du
Bo An
RALM
47
0
0
24 Feb 2025
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale
Jiaxi Li
Xingxing Zhang
Xun Wang
Xiaolong Huang
Li Dong
Liang Wang
Si-Qing Chen
Wei Lu
Furu Wei
SyDa
242
0
0
23 Feb 2025
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
Min Zhao
Guande He
Yixiao Chen
Hongzhou Zhu
Chong Li
Jun Zhu
VGen
69
6
0
21 Feb 2025
Associative Recurrent Memory Transformer
Ivan Rodkin
Yuri Kuratov
Aydar Bulatov
Andrey Kravchenko
68
3
0
17 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
248
4
0
04 Feb 2025
SEAL: Scaling to Emphasize Attention for Long-Context Retrieval
Changhun Lee
Jun-gyu Jin
Younghyun Cho
Eunhyeok Park
LRM
56
0
0
28 Jan 2025
Qwen2.5-1M Technical Report
An Yang
Bowen Yu
Chong Li
Dayiheng Liu
Fei Huang
...
Xingzhang Ren
Xinlong Yang
You Li
Zhiying Xu
Zizhuo Zhang
76
12
0
28 Jan 2025
360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation
Hamed Firooz
Maziar Sanjabi
Adrian Englhardt
Aman Gupta
Ben Levine
...
Xiaoling Zhai
Ya Xu
Yu Wang
Yun Dai
ALM
53
3
0
27 Jan 2025
Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian
Wayne Xin Zhao
Zhicheng Dou
MQ
46
0
0
22 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
145
2
0
20 Jan 2025
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
Hadi Pouransari
Chun-Liang Li
Jen-Hao Rick Chang
Pavan Kumar Anasosalu Vasu
Cem Koc
Vaishaal Shankar
Oncel Tuzel
42
8
0
08 Jan 2025
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
Jiajun Zhu
Peihao Wang
Ruisi Cai
Jason D. Lee
Pan Li
Zhilin Wang
KELM
53
1
0
03 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
72
24
0
31 Dec 2024
LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning
Yansheng Mao
Jiaqi Li
Fanxu Meng
Jing Xiong
Zilong Zheng
Muhan Zhang
LLMAG
RALM
104
1
0
18 Dec 2024
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Tianwei Yin
Qiang Zhang
Richard Zhang
William T. Freeman
F. Durand
Eli Shechtman
Xun Huang
VGen
DiffM
86
5
0
10 Dec 2024
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Michael Wornow
Suhana Bedi
Miguel Angel Fuentes Hernandez
E. Steinberg
Jason Alan Fries
Christopher Ré
Sanmi Koyejo
N. Shah
100
4
0
09 Dec 2024
Hymba: A Hybrid-head Architecture for Small Language Models
Xin Dong
Y. Fu
Shizhe Diao
Wonmin Byeon
Zijia Chen
...
Min-Hung Chen
Yoshi Suhara
Y. Lin
Jan Kautz
Pavlo Molchanov
Mamba
102
21
0
20 Nov 2024
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Haonan Wang
Qian Liu
Chao Du
Tongyao Zhu
Cunxiao Du
Kenji Kawaguchi
Tianyu Pang
115
6
0
20 Nov 2024
Large Language Model in Medical Informatics: Direct Classification and Enhanced Text Representations for Automatic ICD Coding
Zeyd Boukhers
AmeerAli Khan
Qusai Ramadan
Cong Yang
36
0
0
11 Nov 2024
LongSafety: Enhance Safety for Long-Context LLMs
Mianqiu Huang
Xiaoran Liu
Shaojun Zhou
Mozhi Zhang
Chenkun Tan
...
Zhikai Lei
Linlin Li
Qiang Liu
Yaqian Zhou
Xipeng Qiu
ELM
ALM
46
2
0
11 Nov 2024
TeleOracle: Fine-Tuned Retrieval-Augmented Generation with Long-Context Support for Network
Nouf Alabbasi
Omar Erak
Omar Alhussein
Ismail Lotfi
Sami Muhaidat
Merouane Debbah
RALM
234
0
0
04 Nov 2024
RuAG: Learned-rule-augmented Generation for Large Language Models
Yudi Zhang
Pei Xiao
Lu Wang
C. Zhang
Meng Fang
...
Qingwei Lin
Mykola Pechenizkiy
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
LRM
43
0
0
04 Nov 2024
What is Wrong with Perplexity for Long-context Language Modeling?
Lizhe Fang
Yifei Wang
Zhaoyang Liu
Chenheng Zhang
Stefanie Jegelka
Jinyang Gao
Bolin Ding
Yisen Wang
69
6
0
31 Oct 2024
Two are better than one: Context window extension with multi-grained self-injection
Wei Han
Pan Zhou
Soujanya Poria
Shuicheng Yan
29
0
0
25 Oct 2024
LOGO -- Long cOntext aliGnment via efficient preference Optimization
Zecheng Tang
Zechen Sun
Juntao Li
Qiaoming Zhu
Min Zhang
37
1
0
24 Oct 2024
Mitigating Object Hallucination via Concentric Causal Attention
Yun Xing
Yiheng Li
Ivan Laptev
Shijian Lu
53
18
0
21 Oct 2024