ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2010.14701
  4. Cited By
Scaling Laws for Autoregressive Generative Modeling

Scaling Laws for Autoregressive Generative Modeling

28 October 2020
T. Henighan
Jared Kaplan
Mor Katz
Mark Chen
Christopher Hesse
Jacob Jackson
Heewoo Jun
Tom B. Brown
Prafulla Dhariwal
Scott Gray
Chris Hallacy
Benjamin Mann
Alec Radford
Aditya A. Ramesh
Nick Ryder
Daniel M. Ziegler
John Schulman
Dario Amodei
Sam McCandlish
ArXivPDFHTML

Papers citing "Scaling Laws for Autoregressive Generative Modeling"

50 / 310 papers shown
Title
AI capabilities can be significantly improved without expensive
  retraining
AI capabilities can be significantly improved without expensive retraining
Tom Davidson
Jean-Stanislas Denain
Pablo Villalobos
Guillem Bas
OffRL
VLM
26
26
0
12 Dec 2023
Scaling Laws of Synthetic Images for Model Training ... for Now
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan
Kaifeng Chen
Dilip Krishnan
Dina Katabi
Phillip Isola
Yonglong Tian
CLIP
VLM
44
62
0
07 Dec 2023
H-GAP: Humanoid Control with a Generalist Planner
H-GAP: Humanoid Control with a Generalist Planner
Zhengyao Jiang
Yingchen Xu
Nolan Wagener
Yicheng Luo
Michael Janner
Edward Grefenstette
Tim Rocktaschel
Yuandong Tian
AI4CE
27
5
0
05 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
29
22
0
01 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on
  Synthetic, Interpretable Tasks
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
39
7
0
21 Nov 2023
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
Sotiris Anagnostidis
Gregor Bachmann
Imanol Schlag
Thomas Hofmann
33
2
0
06 Nov 2023
Large Trajectory Models are Scalable Motion Predictors and Planners
Large Trajectory Models are Scalable Motion Predictors and Planners
Q. Sun
Shiduo Zhang
Danjiao Ma
Jingzhe Shi
Derun Li
Simian Luo
Yu Wang
Ningyi Xu
Guangzhi Cao
Hang Zhao
27
13
0
30 Oct 2023
Roles of Scaling and Instruction Tuning in Language Perception: Model
  vs. Human Attention
Roles of Scaling and Instruction Tuning in Language Perception: Model vs. Human Attention
Changjiang Gao
Shujian Huang
Jixing Li
Jiajun Chen
LRM
ALM
42
7
0
29 Oct 2023
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination
  for each Benchmark
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark
Oscar Sainz
Jon Ander Campos
Iker García-Ferrero
Julen Etxaniz
Oier López de Lacalle
Eneko Agirre
27
155
0
27 Oct 2023
MindLLM: Pre-training Lightweight Large Language Model from Scratch,
  Evaluations and Domain Applications
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
Yizhe Yang
Huashan Sun
Jiawei Li
Runheng Liu
Yinghao Li
Yuhang Liu
Heyan Huang
Yang Gao
ALM
LRM
16
8
0
24 Oct 2023
PrivImage: Differentially Private Synthetic Image Generation using
  Diffusion Models with Semantic-Aware Pretraining
PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
Kecen Li
Chen Gong
Zhixiang Li
Yuzhong Zhao
Xinwen Hou
Tianhao Wang
33
10
0
19 Oct 2023
BitNet: Scaling 1-bit Transformers for Large Language Models
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Huaijie Wang
Lingxiao Ma
Fan Yang
Ruiping Wang
Yi Wu
Furu Wei
MQ
34
97
0
17 Oct 2023
Visual Data-Type Understanding does not emerge from Scaling
  Vision-Language Models
Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models
Vishaal Udandarao
Max F. Burg
Samuel Albanie
Matthias Bethge
VLM
36
9
0
12 Oct 2023
Predicting Emergent Abilities with Infinite Resolution Evaluation
Predicting Emergent Abilities with Infinite Resolution Evaluation
Shengding Hu
Xin Liu
Xu Han
Xinrong Zhang
Chaoqun He
...
Ning Ding
Zebin Ou
Guoyang Zeng
Zhiyuan Liu
Maosong Sun
ELM
LRM
25
13
0
05 Oct 2023
A Neural Scaling Law from Lottery Ticket Ensembling
A Neural Scaling Law from Lottery Ticket Ensembling
Ziming Liu
Max Tegmark
26
4
0
03 Oct 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
31
15
0
28 Sep 2023
Baichuan 2: Open Large-scale Language Models
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
71
703
0
19 Sep 2023
Pretraining on the Test Set Is All You Need
Pretraining on the Test Set Is All You Need
Rylan Schaeffer
18
28
0
13 Sep 2023
FLM-101B: An Open LLM and How to Train It with $100K Budget
FLM-101B: An Open LLM and How to Train It with 100KBudget100K Budget100KBudget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
Li Du
Bowen Qin
Zheng-Wei Zhang
Aixin Sun
Yequan Wang
60
21
0
07 Sep 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and
  Luck
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
48
8
0
07 Sep 2023
International Governance of Civilian AI: A Jurisdictional Certification
  Approach
International Governance of Civilian AI: A Jurisdictional Certification Approach
Robert F. Trager
Benjamin Harack
Anka Reuel
A. Carnegie
Lennart Heim
...
R. Lall
Owen Larter
Seán Ó hÉigeartaigh
Simon Staffell
José Jaime Villalobos
26
20
0
29 Aug 2023
PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator
PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator
Chuyi Kong
Yaxin Fan
Xiang Wan
Feng Jiang
Benyou Wang
40
8
0
21 Aug 2023
Causal Intersectionality and Dual Form of Gradient Descent for
  Multimodal Analysis: a Case Study on Hateful Memes
Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes
Yosuke Miyanishi
M. Nguyen
34
2
0
19 Aug 2023
Scaling Relationship on Learning Mathematical Reasoning with Large
  Language Models
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
Zheng Yuan
Hongyi Yuan
Cheng Li
Guanting Dong
Keming Lu
Chuanqi Tan
Chang Zhou
Jingren Zhou
LRM
ALM
33
167
0
03 Aug 2023
Applicability of scaling laws to vision encoding models
Applicability of scaling laws to vision encoding models
Takuya Matsuyama
K. Sasaki
Shinji Nishimoto
MedIm
21
4
0
01 Aug 2023
Scaling Laws for Imitation Learning in Single-Agent Games
Scaling Laws for Imitation Learning in Single-Agent Games
Jens Tuyls
Dhruv Madeka
Kari Torkkola
Dean Phillips Foster
Karthik Narasimhan
Sham Kakade
32
4
0
18 Jul 2023
Mining of Single-Class by Active Learning for Semantic Segmentation
Mining of Single-Class by Active Learning for Semantic Segmentation
Hugues Lambert
E. Slade
CLL
VLM
19
0
0
18 Jul 2023
Nonlinear Processing with Linear Optics
Nonlinear Processing with Linear Optics
Mustafa Yildirim
Niyazi Ulaş Dinç
Ilker Oguz
D. Psaltis
C. Moser
40
36
0
17 Jul 2023
FedYolo: Augmenting Federated Learning with Pretrained Transformers
FedYolo: Augmenting Federated Learning with Pretrained Transformers
Xuechen Zhang
Mingchen Li
Xiangyu Chang
Jiasi Chen
A. Roy-Chowdhury
A. Suresh
Samet Oymak
FedML
31
7
0
10 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
44
118
0
06 Jul 2023
DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image
  Generation using Limited Data
DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data
Jin Zhu
Huimin Ma
Jiansheng Chen
Jian Yuan
DiffM
24
10
0
25 Jun 2023
Beyond Scale: the Diversity Coefficient as a Data Quality Metric
  Demonstrates LLMs are Pre-trained on Formally Diverse Data
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data
Alycia Lee
Brando Miranda
Sudharsan Sundar
Sanmi Koyejo
40
17
0
24 Jun 2023
Empowering Business Transformation: The Positive Impact and Ethical
  Considerations of Generative AI in Software Product Management -- A
  Systematic Literature Review
Empowering Business Transformation: The Positive Impact and Ethical Considerations of Generative AI in Software Product Management -- A Systematic Literature Review
N. A. Parikh
6
12
0
05 Jun 2023
Towards Foundation Models for Scientific Machine Learning:
  Characterizing Scaling and Transfer Behavior
Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior
Shashank Subramanian
P. Harrington
Kurt Keutzer
W. Bhimji
Dmitriy Morozov
Michael W. Mahoney
A. Gholami
AI4CE
33
72
0
01 Jun 2023
Faith and Fate: Limits of Transformers on Compositionality
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri
Ximing Lu
Melanie Sclar
Xiang Lorraine Li
Liwei Jian
...
Sean Welleck
Xiang Ren
Allyson Ettinger
Zaïd Harchaoui
Yejin Choi
ReLM
LRM
30
334
0
29 May 2023
Scaling Data-Constrained Language Models
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
38
200
0
25 May 2023
Training Data Extraction From Pre-trained Language Models: A Survey
Training Data Extraction From Pre-trained Language Models: A Survey
Shotaro Ishihara
29
46
0
25 May 2023
RWKV: Reinventing RNNs for the Transformer Era
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
90
562
0
22 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim M. Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
30
58
0
22 May 2023
Revisiting the Architectures like Pointer Networks to Efficiently
  Improve the Next Word Distribution, Summarization Factuality, and Beyond
Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond
Haw-Shiuan Chang
Zonghai Yao
Alolika Gon
Hong-ye Yu
Andrew McCallum
43
10
0
20 May 2023
Few-shot 3D Shape Generation
Few-shot 3D Shape Generation
Jin Zhu
Huimin Ma
Jiansheng Chen
Jian Yuan
DiffM
28
1
0
19 May 2023
Code Execution with Pre-trained Language Models
Code Execution with Pre-trained Language Models
Chenxiao Liu
Shuai Lu
Weizhu Chen
Daxin Jiang
Alexey Svyatkovskiy
Shengyu Fu
Neel Sundaresan
Nan Duan
ELM
22
21
0
08 May 2023
Are Emergent Abilities of Large Language Models a Mirage?
Are Emergent Abilities of Large Language Models a Mirage?
Rylan Schaeffer
Brando Miranda
Oluwasanmi Koyejo
LRM
44
396
0
28 Apr 2023
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
Xin Chen
Hengheng Zhang
Xiaotao Gu
Kaifeng Bi
Lingxi Xie
Qi Tian
MoE
22
4
0
22 Apr 2023
Emergent and Predictable Memorization in Large Language Models
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raf
29
117
0
21 Apr 2023
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss
  Prediction across Scales
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
Yiqun Yao
Siqi Fan
Xiusheng Huang
Xuezhi Fang
Xiang Li
...
Peng Han
Shuo Shang
Kang Liu
Aixin Sun
Yequan Wang
33
6
0
14 Apr 2023
Ambiguous Medical Image Segmentation using Diffusion Models
Ambiguous Medical Image Segmentation using Diffusion Models
Aimon Rahman
Jeya Maria Jose Valanarasu
I. Hacihaliloglu
V. Patel
MedIm
DiffM
43
102
0
10 Apr 2023
To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence
  Models for Improved Inference Efficiency
To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency
Daniel Fernando Campos
Chengxiang Zhai
24
2
0
05 Apr 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and
  Scaling
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
36
1,178
0
03 Apr 2023
The Quantization Model of Neural Scaling
The Quantization Model of Neural Scaling
Eric J. Michaud
Ziming Liu
Uzay Girit
Max Tegmark
MILM
27
79
0
23 Mar 2023
Previous
1234567
Next