2309.00071
YaRN: Efficient Context Window Extension of Large Language Models
31 August 2023
Bowen Peng
Jeffrey Quesnelle
Honglu Fan
Enrico Shippole
OSLM
Github (1489★)
Papers citing
"YaRN: Efficient Context Window Extension of Large Language Models"
50 / 199 papers shown
Title
Qwen2.5-1M Technical Report
An Yang
Bowen Yu
Chong Li
Dayiheng Liu
Fei Huang
...
Xingzhang Ren
Xinlong Yang
You Li
Zhiying Xu
Zizhuo Zhang
141
29
0
28 Jan 2025
360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation
Hamed Firooz
Maziar Sanjabi
Adrian Englhardt
Aman Gupta
Ben Levine
...
Xiaoling Zhai
Ya Xu
Yu Wang
Yun Dai
ALM
145
4
0
27 Jan 2025
SEAL: Scaling to Emphasize Attention for Long-Context Retrieval
Changhun Lee
Jun-gyu Jin
Younghyun Cho
Eunhyeok Park
RALM
LRM
119
0
0
25 Jan 2025
Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian
Wayne Xin Zhao
Ji-Rong Wen
MQ
73
0
0
22 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
225
3
0
20 Jan 2025
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
Hadi Pouransari
Chun-Liang Li
Jen-Hao Rick Chang
Pavan Kumar Anasosalu Vasu
Cem Koc
Vaishaal Shankar
Oncel Tuzel
95
11
0
08 Jan 2025
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
Jiajun Zhu
Peihao Wang
Ruisi Cai
Jason D. Lee
Pan Li
Ziyi Wang
KELM
112
1
0
03 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
134
26
0
31 Dec 2024
LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning
Yansheng Mao
Jiaqi Li
Fanxu Meng
Jing Xiong
Zilong Zheng
Muhan Zhang
LLMAG
RALM
172
1
0
18 Dec 2024
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Tianwei Yin
Qiang Zhang
Richard Zhang
William T. Freeman
F. Durand
Eli Shechtman
Xun Huang
VGen
DiffM
188
11
0
10 Dec 2024
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Michael Wornow
Suhana Bedi
Miguel Angel Fuentes Hernandez
E. Steinberg
Jason Alan Fries
Christopher Ré
Sanmi Koyejo
N. Shah
249
6
0
09 Dec 2024
Hymba: A Hybrid-head Architecture for Small Language Models
Xin Dong
Y. Fu
Shizhe Diao
Wonmin Byeon
Zijia Chen
...
Min-Hung Chen
Yoshi Suhara
Y. Lin
Jan Kautz
Pavlo Molchanov
Mamba
164
27
0
20 Nov 2024
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Haonan Wang
Qian Liu
Chao Du
Tongyao Zhu
Cunxiao Du
Kenji Kawaguchi
Tianyu Pang
231
8
0
20 Nov 2024
Large Language Model in Medical Informatics: Direct Classification and Enhanced Text Representations for Automatic ICD Coding
Zeyd Boukhers
AmeerAli Khan
Qusai Ramadan
Cong Yang
45
1
0
11 Nov 2024
LongSafety: Enhance Safety for Long-Context LLMs
Mianqiu Huang
Xiaoran Liu
Shaojun Zhou
Mozhi Zhang
Chenkun Tan
...
Zhikai Lei
Linlin Li
Qiang Liu
Yaqian Zhou
Xipeng Qiu
ELM
ALM
68
0
0
11 Nov 2024
TeleOracle: Fine-Tuned Retrieval-Augmented Generation with Long-Context Support for Network
Nouf Alabbasi
Omar Erak
Omar Alhussein
Ismail Lotfi
Sami Muhaidat
Merouane Debbah
RALM
455
0
0
04 Nov 2024
RuAG: Learned-rule-augmented Generation for Large Language Models
Yudi Zhang
Pei Xiao
Lu Wang
Chen Zhang
Meng Fang
...
Qingwei Lin
Mykola Pechenizkiy
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
LRM
78
1
0
04 Nov 2024
What is Wrong with Perplexity for Long-context Language Modeling?
Lizhe Fang
Yifei Wang
Zhaoyang Liu
Chenheng Zhang
Stefanie Jegelka
Jinyang Gao
Bolin Ding
Yisen Wang
157
13
0
31 Oct 2024
Two are better than one: Context window extension with multi-grained self-injection
Wei Han
Pan Zhou
Soujanya Poria
Shuicheng Yan
70
0
0
25 Oct 2024
LOGO -- Long cOntext aliGnment via efficient preference Optimization
Zecheng Tang
Zechen Sun
Juntao Li
Qiaoming Zhu
Min Zhang
79
2
0
24 Oct 2024
Mitigating Object Hallucination via Concentric Causal Attention
Yun Xing
Yiheng Li
Ivan Laptev
Shijian Lu
108
23
0
21 Oct 2024
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
Xin Ma
Yang Liu
Qingbin Liu
Xiaoxu Ma
46
1
0
21 Oct 2024
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
ZiDong Wang
Zeyu Lu
Di Huang
Cai Zhou
Wanli Ouyang
Lei Bai
126
6
0
17 Oct 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Yizhao Gao
Zhichen Zeng
Dayou Du
Shijie Cao
Hayden Kwok-Hay So
...
Junjie Lai
Mao Yang
Ting Cao
Fan Yang
M. Yang
148
28
0
17 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
Bo Chen
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
265
22
0
14 Oct 2024
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska
Mohammad Mahdi Derakhshani
Yuki M. Asano
Nanne van Noord
Marcel Worring
Cees G. M. Snoek
VLM
143
4
0
13 Oct 2024
ACER: Automatic Language Model Context Extension via Retrieval
Luyu Gao
Yunyi Zhang
Jamie Callan
RALM
56
0
0
11 Oct 2024
FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding
Jingyang Deng
Zhengyang Shen
Boyang Wang
Lixin Su
Suqi Cheng
Ying Nie
Junfeng Wang
D. Yin
Jinwen Ma
62
1
0
09 Oct 2024
Stuffed Mamba: Oversized States Lead to the Inability to Forget
Yingfa Chen
Xinrong Zhang
Shengding Hu
Xu Han
Zhiyuan Liu
Maosong Sun
Mamba
114
2
0
09 Oct 2024
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Chuanyang Zheng
Yihang Gao
Han Shi
Jing Xiong
Jiankai Sun
...
Xiaozhe Ren
Michael Ng
Xin Jiang
Zhenguo Li
Yu Li
83
3
0
07 Oct 2024
PECAN: LLM-Guided Dynamic Progress Control with Attention-Guided Hierarchical Weighted Graph for Long-Document QA
Xinyu Wang
Yanzheng Xiang
Lin Gui
Yulan He
84
2
0
07 Oct 2024
Accelerating Inference of Networks in the Frequency Domain
Chenqiu Zhao
Guanfang Dong
Anup Basu
122
20
0
06 Oct 2024
UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation
Zixuan Li
Jing Xiong
Fanghua Ye
Chuanyang Zheng
Xun Wu
...
Xiaodan Liang
Chengming Li
Zhenan Sun
Lingpeng Kong
Ngai Wong
RALM
UQLM
102
2
0
03 Oct 2024
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?
Zecheng Tang
Keyan Zhou
Juntao Li
Baibei Ji
Jianye Hou
Min Zhang
85
2
0
03 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALM
ELM
143
37
0
03 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
202
48
0
03 Oct 2024
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models
David Castillo-Bolado
Joseph Davidson
Finlay Gray
Marek Rosa
51
9
0
30 Sep 2024
Visual Context Window Extension: A New Perspective for Long Video Understanding
Hongchen Wei
Zhenzhong Chen
VLM
88
6
0
30 Sep 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Haoran Que
Feiyu Duan
Liqun He
Yutao Mou
Wangchunshu Zhou
...
Ge Zhang
Junran Peng
Zhaoxiang Zhang
Songyang Zhang
Kai Chen
LM&MA
ELM
VLM
106
16
0
24 Sep 2024
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
482
2
0
20 Sep 2024
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Yuzhang Shang
Bingxin Xu
Weitai Kang
Mu Cai
Yuheng Li
Zehao Wen
Zhen Dong
Kurt Keutzer
Yong Jae Lee
Yan Yan
110
9
0
19 Sep 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
121
337
0
18 Sep 2024
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
Zihan Liao
Jun Wang
Hang Yu
Lingxiao Wei
Jianguo Li
Jun Wang
Wei Zhang
67
3
0
10 Sep 2024
You Only Use Reactive Attention Slice For Long Context Retrieval
Yun Joon Soh
Hanxian Huang
Yuandong Tian
Jishen Zhao
RALM
70
0
0
03 Sep 2024
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
Zhi Chen
Qiguang Chen
Libo Qin
Qipeng Guo
Haijun Lv
Yicheng Zou
Wanxiang Che
Hang Yan
Kai Chen
Dahua Lin
SyDa
130
4
0
03 Sep 2024
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
Zhiyuan Hu
Yuliang Liu
Jinman Zhao
Suyuchen Wang
Yan Wang
...
Qing Gu
Anh Tuan Luu
See-Kiong Ng
Zhiwei Jiang
Bryan Hooi
150
13
0
31 Aug 2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao
Sam Ade Jacobs
Masahiro Tanaka
Olatunji Ruwase
Hari Subramoni
D. Panda
102
2
0
30 Aug 2024
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
M. Russak
Umar Jamil
Christopher Bryant
Kiran Kamble
Axel Magnuson
Mateusz Russak
Waseem Alshikh
58
3
0
27 Aug 2024
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Yushi Bai
Jiajie Zhang
Xin Lv
Linzhi Zheng
Siqi Zhu
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
VGen
LLMAG
ALM
100
55
0
13 Aug 2024
LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
Zhiwen Mo
Lei Wang
Jianyu Wei
Zhichen Zeng
Shijie Cao
...
Naifeng Jing
Ting Cao
Jilong Xue
Fan Yang
Mao Yang
120
4
0
12 Aug 2024