Scaling Laws for Autoregressive Generative Modeling

28 October 2020
T. Henighan
Jared Kaplan
Mor Katz
Mark Chen
Christopher Hesse
Jacob Jackson
Heewoo Jun
Tom B. Brown
Prafulla Dhariwal
Scott Gray
Chris Hallacy
Benjamin Mann
Alec Radford
Aditya A. Ramesh
Nick Ryder
Daniel M. Ziegler
John Schulman
Dario Amodei
Sam McCandlish

Papers citing "Scaling Laws for Autoregressive Generative Modeling"

50 / 310 papers shown
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
Amro Abbas
Kushal Tirumala
Daniel Simig
Surya Ganguli
Ari S. Morcos
28
162
0
16 Mar 2023
Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models
Alberto Testolin
AIMat
35
20
0
14 Mar 2023
Architext: Language-Driven Generative Architecture Design
Theodoros Galanos
Antonios Liapis
Georgios N. Yannakakis
VLM
AI4CE
26
6
0
13 Mar 2023
Synthetic ECG Signal Generation using Probabilistic Diffusion Models
Edmond Adib
Amanda Fernandez
Fatemeh Afghah
John J. Prevost
DiffM
40
38
0
04 Mar 2023
Does Deep Learning Learn to Abstract? A Systematic Probing Framework
Shengnan An
Zeqi Lin
B. Chen
Qiang Fu
Nanning Zheng
Jian-Guang Lou
43
4
0
23 Feb 2023
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes
Behrooz Ghorbani
Xavier Garcia
Markus Freitag
Orhan Firat
38
29
0
19 Feb 2023
A Simplistic Model of Neural Scaling Laws: Multiperiodic Santa Fe Processes
L. Debowski
MILM
19
11
0
17 Feb 2023
Scaling laws for single-agent reinforcement learning
Jacob Hilton
Jie Tang
John Schulman
22
20
0
31 Jan 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning
Hyunsoo Cho
Choonghyun Park
Junyeop Kim
Hyuhng Joon Kim
Kang Min Yoo
Sang-goo Lee
OODD
35
3
0
27 Jan 2023
ClimaX: A foundation model for weather and climate
Tung Nguyen
Johannes Brandstetter
Ashish Kapoor
Jayesh K. Gupta
Aditya Grover
AI4Cl
AI4CE
11
245
0
24 Jan 2023
Interactive-Chain-Prompting: Ambiguity Resolution for Crosslingual Conditional Generation with Interaction
Jonathan Pilault
Xavier Garcia
Arthur Bravzinskas
Orhan Firat
AI4CE
LRM
19
17
0
24 Jan 2023
Scaling Laws for Generative Mixed-Modal Language Models
Armen Aghajanyan
L. Yu
Alexis Conneau
Wei-Ning Hsu
Karen Hambardzumyan
Susan Zhang
Stephen Roller
Naman Goyal
Omer Levy
Luke Zettlemoyer
MoE
VLM
19
104
0
10 Jan 2023
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
40
2,024
0
19 Dec 2022
The case for 4-bit precision: k-bit Inference Scaling Laws
Tim Dettmers
Luke Zettlemoyer
MQ
27
214
0
19 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
22
364
0
19 Dec 2022
Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti
Romain Beaumont
Ross Wightman
Mitchell Wortsman
Gabriel Ilharco
Cade Gordon
Christoph Schuhmann
Ludwig Schmidt
J. Jitsev
VLM
CLIP
59
743
0
14 Dec 2022
Logical Tasks for Measuring Extrapolation and Rule Comprehension
Ippei Fujisawa
Ryota Kanai
ELM
LRM
28
4
0
14 Nov 2022
Development of a Neural Network-Based Mathematical Operation Protocol for Embedded Hexadecimal Digits Using Neural Architecture Search (NAS)
Victor Robila
Kexin Pei
Junfeng Yang
21
0
0
12 Nov 2022
Few-shot Image Generation with Diffusion Models
Jin Zhu
Huimin Ma
Jiansheng Chen
Jian Yuan
DiffM
40
20
0
07 Nov 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji
Seungjun Nah
Xun Huang
Arash Vahdat
Jiaming Song
...
Timo Aila
S. Laine
Bryan Catanzaro
Tero Karras
Xuan Li
VLM
MoE
49
804
0
02 Nov 2022
A Solvable Model of Neural Scaling Laws
A. Maloney
Daniel A. Roberts
J. Sully
36
51
0
30 Oct 2022
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
230
103
0
27 Oct 2022
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
30
74
0
26 Oct 2022
Scaling Laws Beyond Backpropagation
Matthew J. Filipovich
Alessandro Cappelli
Daniel Hesslow
Julien Launay
19
3
0
26 Oct 2022
Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos
A. Ho
J. Sevilla
T. Besiroglu
Lennart Heim
Marius Hobbhahn
ALM
38
111
0
26 Oct 2022
Precision Machine Learning
Eric J. Michaud
Ziming Liu
Max Tegmark
24
34
0
24 Oct 2022
Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson
Christopher Rytting
David Wingate
ELM
146
186
0
22 Oct 2022
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
481
0
19 Oct 2022
Scaling Laws for a Multi-Agent Reinforcement Learning Model
Oren Neumann
C. Gros
29
26
0
29 Sep 2022
The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling
Yusong Wu
Josh Gardner
Ethan Manilow
Ian Simon
Curtis Hawthorne
Jesse Engel
40
10
0
28 Sep 2022
Local Grammar-Based Coding Revisited
L. Debowski
33
0
0
27 Sep 2022
Understanding Scaling Laws for Recommendation Models
Newsha Ardalani
Carole-Jean Wu
Zeliang Chen
Bhargav Bhushanam
Adnan Aziz
36
28
0
17 Aug 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers
M. Lewis
Younes Belkada
Luke Zettlemoyer
MQ
34
630
0
15 Aug 2022
The BUTTER Zone: An Empirical Study of Training Dynamics in Fully Connected Neural Networks
Charles Edison Tripp
J. Perr-Sauer
L. Hayne
M. Lunacek
Jamil Gafur
AI4CE
21
0
0
25 Jul 2022
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay
Mostafa Dehghani
Samira Abnar
Hyung Won Chung
W. Fedus
J. Rao
Sharan Narang
Vinh Q. Tran
Dani Yogatama
Donald Metzler
AI4CE
34
100
0
21 Jul 2022
Beyond neural scaling laws: beating power law scaling via data pruning
Ben Sorscher
Robert Geirhos
Shashank Shekhar
Surya Ganguli
Ari S. Morcos
22
418
0
29 Jun 2022
Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing
Linlu Qiu
Peter Shaw
Panupong Pasupat
Tianze Shi
Jonathan Herzig
Emily Pitler
Fei Sha
Kristina Toutanova
AI4CE
LRM
33
52
0
24 May 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
TDI
29
187
0
22 May 2022
Scaling Laws and Interpretability of Learning from Repeated Data
Danny Hernandez
Tom B. Brown
Tom Conerly
Nova Dassarma
Dawn Drain
...
Catherine Olsson
Dario Amodei
Nicholas Joseph
Jared Kaplan
Sam McCandlish
30
111
0
21 May 2022
Active Learning Helps Pretrained Models Learn the Intended Task
Alex Tamkin
Dat Nguyen
Salil Deshpande
Jesse Mu
Noah D. Goodman
23
35
0
18 Apr 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
99
802
0
14 Apr 2022
InCoder: A Generative Model for Code Infilling and Synthesis
Daniel Fried
Armen Aghajanyan
Jessy Lin
Sida I. Wang
Eric Wallace
Freda Shi
Ruiqi Zhong
Wen-tau Yih
Luke Zettlemoyer
M. Lewis
SyDa
28
627
0
12 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
72
2,341
0
12 Apr 2022
Vision Transformer Compression with Structured Pruning and Low Rank Approximation
Ankur Kumar
ViT
28
6
0
25 Mar 2022
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
175
330
0
03 Mar 2022
Deconstructing Distributions: A Pointwise Framework of Learning
Gal Kaplun
Nikhil Ghosh
Saurabh Garg
Boaz Barak
Preetum Nakkiran
OOD
33
21
0
20 Feb 2022
Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments
Maor Ivgi
Y. Carmon
Jonathan Berant
19
17
0
13 Feb 2022
Unified Scaling Laws for Routed Language Models
Aidan Clark
Diego de Las Casas
Aurelia Guy
A. Mensch
Michela Paganini
...
Oriol Vinyals
Jack W. Rae
Erich Elsen
Koray Kavukcuoglu
Karen Simonyan
MoE
27
177
0
02 Feb 2022
Nonlinear Initialization Methods for Low-Rank Neural Networks
Kiran Vodrahalli
Rakesh Shivanna
M. Sathiamoorthy
Sagar Jain
Ed H. Chi
19
4
0
02 Feb 2022