ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2504.14992
  4. Cited By
Efficient Pretraining Length Scaling

Efficient Pretraining Length Scaling

21 April 2025
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
ArXivPDFHTML

Papers citing "Efficient Pretraining Length Scaling"

46 / 46 papers shown
Title
LLM Pretraining with Continuous Concepts
LLM Pretraining with Continuous Concepts
Jihoon Tack
Jack Lanchantin
Jane Dwivedi-Yu
Andrew Cohen
Ilia Kulikov
Janice Lan
Shibo Hao
Yuandong Tian
Jason Weston
Xian Li
CLL
111
4
0
12 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
318
1,611
0
22 Jan 2025
MiniMax-01: Scaling Foundation Models with Lightning Attention
MiniMax-01: Scaling Foundation Models with Lightning Attention
MiniMax
Aonian Li
Bangwei Gong
Bo Yang
Bo Shen
...
Zhan Qin
Zhenhua Fan
Zhihang Yu
Z. L. Jiang
Zijia Wu
MoE
120
38
0
14 Jan 2025
FlashAttention-3: Fast and Accurate Attention with Asynchrony and
  Low-precision
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
113
144
0
11 Jul 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via
  Dynamic Sparse Attention
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
129
109
0
02 Jul 2024
Step-level Value Preference Optimization for Mathematical Reasoning
Step-level Value Preference Optimization for Mathematical Reasoning
Guoxin Chen
Minpeng Liao
Chengxi Li
Kai Fan
LRM
62
39
0
16 Jun 2024
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Jiaming Tang
Yilong Zhao
Kan Zhu
Guangxuan Xiao
Baris Kasikci
Song Han
71
93
0
16 Jun 2024
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts
  Language Model
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI
Aixin Liu
Bei Feng
Bin Wang
Bingxuan Wang
...
Zhuoshu Li
Zihan Wang
Zihui Gu
Zilin Li
Ziwei Xie
MoE
78
456
0
07 May 2024
SnapKV: LLM Knows What You are Looking for Before Generation
SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li
Yingbing Huang
Bowen Yang
Bharat Venkitesh
Acyr Locatelli
Hanchen Ye
Tianle Cai
Patrick Lewis
Deming Chen
VLM
107
191
0
22 Apr 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your
  Phone
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRM
ALM
108
1,193
0
22 Apr 2024
PSYDIAL: Personality-based Synthetic Dialogue Generation using Large
  Language Models
PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models
Ji-Eun Han
Jun-Seok Koh
Hyeon-Tae Seo
Du-Seong Chang
Kyung-Ah Sohn
70
7
0
01 Apr 2024
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic
  Prompt Compression
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Zhuoshi Pan
Qianhui Wu
Huiqiang Jiang
Menglin Xia
Xufang Luo
...
Yuqing Yang
Chin-Yew Lin
H. Vicky Zhao
Lili Qiu
Dongmei Zhang
VLM
74
110
0
19 Mar 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before
  Speaking
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
E. Zelikman
Georges Harik
Yijia Shao
Varuna Jayasiri
Nick Haber
Noah D. Goodman
LLMAG
ReLM
LRM
91
140
0
14 Mar 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open
  Language Models
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
108
1,099
0
05 Feb 2024
SparQ Attention: Bandwidth-Efficient LLM Inference
SparQ Attention: Bandwidth-Efficient LLM Inference
Luka Ribar
Ivan Chelombiev
Luke Hudlass-Galley
Charlie Blake
Carlo Luschi
Douglas Orr
82
51
0
08 Dec 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
74
662
0
20 Nov 2023
Striped Attention: Faster Ring Attention for Causal Transformers
Striped Attention: Faster Ring Attention for Causal Transformers
William Brandon
Aniruddha Nrusimha
Kevin Qian
Zack Ankner
Tian Jin
Zhiye Song
Jonathan Ragan-Kelley
36
36
0
15 Nov 2023
FlashDecoding++: Faster Large Language Model Inference on GPUs
FlashDecoding++: Faster Large Language Model Inference on GPUs
Ke Hong
Guohao Dai
Jiaming Xu
Qiuli Mao
Xiuhong Li
Jun Liu
Kangdi Chen
Yuhan Dong
Yu Wang
44
73
0
02 Nov 2023
CoTFormer: More Tokens With Attention Make Up For Less Depth
CoTFormer: More Tokens With Attention Make Up For Less Depth
Amirkeivan Mohtashami
Matteo Pagliardini
Martin Jaggi
34
2
0
16 Oct 2023
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios
  via Prompt Compression
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
Huiqiang Jiang
Qianhui Wu
Xufang Luo
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
Lili Qiu
RALM
154
211
0
10 Oct 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik Narasimhan
ELM
66
564
0
10 Oct 2023
Think before you speak: Training Language Models With Pause Tokens
Think before you speak: Training Language Models With Pause Tokens
Sachin Goyal
Ziwei Ji
A. S. Rawat
A. Menon
Sanjiv Kumar
Vaishnavh Nagarajan
LRM
94
116
0
03 Oct 2023
Ring Attention with Blockwise Transformers for Near-Infinite Context
Ring Attention with Blockwise Transformers for Near-Infinite Context
Hao Liu
Matei A. Zaharia
Pieter Abbeel
70
238
0
03 Oct 2023
Efficient Streaming Language Models with Attention Sinks
Efficient Streaming Language Models with Attention Sinks
Michel Lang
Yuandong Tian
Beidi Chen
Song Han
Mike Lewis
AI4TS
RALM
116
744
0
29 Sep 2023
Efficient Memory Management for Large Language Model Serving with
  PagedAttention
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
154
2,163
0
12 Sep 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work
  Partitioning
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
105
1,276
0
17 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
79
159
0
05 Jul 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large
  Language Models
H2_22​O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
124
289
0
24 Jun 2023
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head
  Checkpoints
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie
James Lee-Thorp
Michiel de Jong
Yury Zemlyanskiy
Federico Lebrón
Sumit Sanghai
63
657
0
22 May 2023
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
193
2,200
0
27 May 2022
Unified Scaling Laws for Routed Language Models
Unified Scaling Laws for Routed Language Models
Aidan Clark
Diego de Las Casas
Aurelia Guy
A. Mensch
Michela Paganini
...
Oriol Vinyals
Jack W. Rae
Erich Elsen
Koray Kavukcuoglu
Karen Simonyan
MoE
61
182
0
02 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
740
9,330
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
227
4,354
0
27 Oct 2021
SparseBERT: Rethinking the Importance Analysis in Self-attention
SparseBERT: Rethinking the Importance Analysis in Self-attention
Han Shi
Jiahui Gao
Xiaozhe Ren
Hang Xu
Xiaodan Liang
Zhenguo Li
James T. Kwok
63
54
0
25 Feb 2021
Scaling Laws for Transfer
Scaling Laws for Transfer
Danny Hernandez
Jared Kaplan
T. Henighan
Sam McCandlish
69
244
0
02 Feb 2021
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and
  Head Pruning
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Hanrui Wang
Zhekai Zhang
Song Han
103
388
0
17 Dec 2020
Measuring Massive Multitask Language Understanding
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
157
4,377
0
07 Sep 2020
Big Bird: Transformers for Longer Sequences
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
508
2,074
0
28 Jul 2020
Longformer: The Long-Document Transformer
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
133
4,048
0
10 Apr 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
541
4,773
0
23 Jan 2020
PIQA: Reasoning about Physical Commonsense in Natural Language
PIQA: Reasoning about Physical Commonsense in Natural Language
Yonatan Bisk
Rowan Zellers
Ronan Le Bras
Jianfeng Gao
Yejin Choi
OOD
LRM
116
1,776
0
26 Nov 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
156
2,446
0
19 May 2019
Generating Long Sequences with Sparse Transformers
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
99
1,894
0
23 Apr 2019
CommonsenseQA: A Question Answering Challenge Targeting Commonsense
  Knowledge
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Alon Talmor
Jonathan Herzig
Nicholas Lourie
Jonathan Berant
RALM
140
1,716
0
02 Nov 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning
  Challenge
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
146
2,567
0
14 Mar 2018
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
446
18,931
0
20 Jul 2017
1