ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2010.14701
  4. Cited By
Scaling Laws for Autoregressive Generative Modeling

Scaling Laws for Autoregressive Generative Modeling

28 October 2020
T. Henighan
Jared Kaplan
Mor Katz
Mark Chen
Christopher Hesse
Jacob Jackson
Heewoo Jun
Tom B. Brown
Prafulla Dhariwal
Scott Gray
Chris Hallacy
Benjamin Mann
Alec Radford
Aditya A. Ramesh
Nick Ryder
Daniel M. Ziegler
John Schulman
Dario Amodei
Sam McCandlish
ArXivPDFHTML

Papers citing "Scaling Laws for Autoregressive Generative Modeling"

50 / 310 papers shown
Title
Superposition Yields Robust Neural Scaling
Superposition Yields Robust Neural Scaling
Yizhou Liu
Ziming Liu
Jeff Gore
MILM
24
0
0
15 May 2025
Guiding Data Collection via Factored Scaling Curves
Guiding Data Collection via Factored Scaling Curves
Lihan Zha
Apurva Badithela
Michael Zhang
Justin Lidard
Jeremy Bao
Emily Zhou
David Snyder
Allen Z. Ren
Dhruv Shah
Anirudha Majumdar
OffRL
34
0
0
12 May 2025
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
Francesco Cagnetta
Alessandro Favero
Antonio Sclocchi
M. Wyart
26
0
0
11 May 2025
Quiet Feature Learning in Algorithmic Tasks
Quiet Feature Learning in Algorithmic Tasks
Prudhviraj Naidu
Zixian Wang
Leon Bergen
R. Paturi
VLM
57
0
0
06 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu
Tyler Cody
Dawei Zhou
121
0
0
01 May 2025
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
X. Li
Jiajie Jin
Guanting Dong
Hongjin Qian
Yutao Zhu
Yongkang Wu
Ji-Rong Wen
Zhicheng Dou
LLMAG
LRM
97
2
0
30 Apr 2025
Compute-Optimal LLMs Provably Generalize Better With Scale
Compute-Optimal LLMs Provably Generalize Better With Scale
Marc Finzi
Sanyam Kapoor
Diego Granziol
Anming Gu
Christopher De Sa
J. Zico Kolter
Andrew Gordon Wilson
32
0
0
21 Apr 2025
An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research
An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research
Patrik Reizinger
Randall Balestriero
David Klindt
Wieland Brendel
40
0
0
17 Apr 2025
Scaling Laws for Data-Efficient Visual Transfer Learning
Scaling Laws for Data-Efficient Visual Transfer Learning
Wenxuan Yang
Qingqu Wei
Chenxi Ma
Weimin Tan
Bo Yan
33
0
0
17 Apr 2025
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs?
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs?
Hansi Zeng
Kai Hui
Honglei Zhuang
Zhen Qin
Zhenrui Yue
Hamed Zamani
Dana Alon
35
0
0
16 Apr 2025
Hyperflows: Pruning Reveals the Importance of Weights
Hyperflows: Pruning Reveals the Importance of Weights
Eugen Barbulescu
Antonio Alexoaie
31
0
0
06 Apr 2025
Data Scaling Laws for End-to-End Autonomous Driving
Data Scaling Laws for End-to-End Autonomous Driving
Alexander Naumann
Xunjiang Gu
Tolga Dimlioglu
Mariusz Bojarski
Alperen Degirmenci
A. Popov
Devansh Bisla
Marco Pavone
Urs Muller
Boris Ivanovic
48
0
0
06 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
71
2
0
03 Apr 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
Jiadong Wang
Tao Dai
Shu-Tao Xia
Luca Benini
72
2
0
30 Mar 2025
Make Autoregressive Great Again: Diffusion-Free Graph Generation with Next-Scale Prediction
Make Autoregressive Great Again: Diffusion-Free Graph Generation with Next-Scale Prediction
Samuel Belkadi
Steve Hong
Marian Chen
DiffM
50
0
0
30 Mar 2025
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Yanjie Wang
Zhijie Lin
Yao Teng
Yuanzhi Zhu
Shuhuai Ren
Jiashi Feng
Xihui Liu
53
0
0
20 Mar 2025
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
Kairong Luo
Haodong Wen
Shengding Hu
Zhenbo Sun
Zhiyuan Liu
Maosong Sun
Kaifeng Lyu
Wenguang Chen
CLL
67
2
0
17 Mar 2025
Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers
Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers
Shiran Yuan
Hao Zhao
DiffM
54
0
0
17 Mar 2025
Autoregressive Image Generation with Randomized Parallel Decoding
Haopeng Li
Jinyue Yang
Guoqi Li
Huan Wang
55
0
0
13 Mar 2025
Large Language Model as Meta-Surrogate for Data-Driven Many-Task Optimization: A Proof-of-Principle Study
Xuzhi Zhang
Yue-jiao Gong
Jun Zhang
64
0
0
11 Mar 2025
ChromaFormer: A Scalable and Accurate Transformer Architecture for Land Cover Classification
Mingshi Li
Dusan Grujicic
Ben Somers
Stien Heremans
S. D. Saeger
Matthew B. Blaschko
ViT
57
0
0
11 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
54
2
0
08 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
109
2
0
07 Mar 2025
Non-Gaussianities in Collider Metric Binning
Andrew J. Larkoski
52
1
0
05 Mar 2025
AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons
AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons
Hongjie Fang
Chenxi Wang
Yiming Wang
J. Chen
Shangning Xia
...
Xinyu Zhan
Lixin Yang
Weiming Wang
Cewu Lu
Hao-Shu Fang
87
1
0
05 Mar 2025
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen
Xuyang Guo
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
73
3
0
03 Mar 2025
Discrete Codebook World Models for Continuous Control
Aidan Scannell
Mohammadreza Nakhaei
Kalle Kujanpää
Yi Zhao
Kevin Sebastian Luck
Dieter Büchler
Joni Pajarinen
OffRL
50
1
0
01 Mar 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam
Seok Hyeong Lee
Clementine Domine
Yea Chan Park
Charles London
Wonyl Choi
Niclas Goring
Seungjai Lee
AI4CE
38
0
0
28 Feb 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
92
0
0
27 Feb 2025
(Mis)Fitting: A Survey of Scaling Laws
(Mis)Fitting: A Survey of Scaling Laws
Margaret Li
Sneha Kudugunta
Luke Zettlemoyer
69
2
0
26 Feb 2025
Shh, don't say that! Domain Certification in LLMs
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde
Alasdair Paren
Preetham Arvind
Maxime Kayser
Tom Rainforth
Thomas Lukasiewicz
Guohao Li
Philip Torr
Adel Bibi
53
1
0
26 Feb 2025
Scaling Laws for Downstream Task Performance in Machine Translation
Scaling Laws for Downstream Task Performance in Machine Translation
Berivan Isik
Natalia Ponomareva
Hussein Hazimeh
Dimitris Paparas
Sergei Vassilvitskii
Sanmi Koyejo
113
4
0
24 Feb 2025
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Chengyin Xu
Kaiyuan Chen
Xiao Li
Ke Shen
Chenggang Li
OffRL
56
0
0
24 Feb 2025
Model-agnostic Coreset Selection via LLM-based Concept Bottlenecks
Akshay Mehra
Trisha Mittal
Subhadra Gopalakrishnan
Joshua Kimball
45
0
0
23 Feb 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
58
5
0
21 Feb 2025
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Ayan Sengupta
Yash Goel
Tanmoy Chakraborty
50
0
0
17 Feb 2025
FuXi-$\alpha$: Scaling Recommendation Model with Feature Interaction Enhanced Transformer
FuXi-α\alphaα: Scaling Recommendation Model with Feature Interaction Enhanced Transformer
Yufei Ye
Wei Guo
Jin Yao Chin
Hao Wang
Hong Zhu
...
Yuyang Ye
Y. Liu
Ruiming Tang
Defu Lian
Enhong Chen
100
2
0
05 Feb 2025
One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation
One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation
Jiajian Li
Jingyun Liang
Yong Guo
W. J. Li
Yulun Zhang
DiffM
75
0
0
04 Feb 2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar
Harshay Shah
Dan Busbridge
Alaaeldin Mohamed Elnouby Ali
J. Susskind
Vimal Thilak
MoE
LRM
39
5
0
28 Jan 2025
NExtLong: Toward Effective Long-Context Training without Long Documents
NExtLong: Toward Effective Long-Context Training without Long Documents
Chaochen Gao
Xing Wu
Zijia Lin
Debing Zhang
Songlin Hu
SyDa
68
1
0
22 Jan 2025
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
MLLM
VLM
LRM
109
6
0
21 Jan 2025
Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance
Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance
Jin Zhu
Huimin Ma
Jiansheng Chen
Jian Yuan
79
4
0
20 Jan 2025
Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins
Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins
Ilker Oguz
Louis J. E. Suter
J. Hsieh
Mustafa Yildirim
Niyazi Ulaş Dinç
Christophe Moser
D. Psaltis
63
2
0
14 Jan 2025
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Rui Xie
Yinhong Liu
Penghao Zhou
Chen Zhao
Jun Zhou
Kaicheng Zhang
Zhenru Zhang
Jian Yang
Zhengyuan Yang
Ying Tai
VGen
DiffM
41
2
0
06 Jan 2025
Foundations of GenIR
Qingyao Ai
Jingtao Zhan
Yong-Jin Liu
51
0
0
06 Jan 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin
Yaqi Zhao
Mingwu Zheng
Ke Lin
Jiarong Ou
...
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
Kun Gai
124
2
0
03 Jan 2025
When Worse is Better: Navigating the compression-generation tradeoff in
  visual tokenization
When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization
Vivek Ramanujan
Kushal Tirumala
Armen Aghajanyan
Luke Zettlemoyer
Ali Farhadi
DiffM
74
2
0
20 Dec 2024
Outcome-Refining Process Supervision for Code Generation
Outcome-Refining Process Supervision for Code Generation
Zhuohao Yu
Weizheng Gu
Yidong Wang
Zhengran Zeng
Jindong Wang
Wei Ye
Shikun Zhang
LRM
89
4
0
19 Dec 2024
Parallelized Autoregressive Visual Generation
Parallelized Autoregressive Visual Generation
Yanjie Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
90
12
0
19 Dec 2024
Next Patch Prediction for Autoregressive Visual Generation
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
123
9
0
19 Dec 2024
1234567
Next