ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2001.08361
  4. Cited By
Scaling Laws for Neural Language Models

Scaling Laws for Neural Language Models

23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
ArXivPDFHTML

Papers citing "Scaling Laws for Neural Language Models"

50 / 982 papers shown
Title
Will we run out of data? Limits of LLM scaling based on human-generated
  data
Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos
A. Ho
J. Sevilla
T. Besiroglu
Lennart Heim
Marius Hobbhahn
ALM
33
111
0
26 Oct 2022
Learning Better Intent Representations for Financial Open Intent
  Classification
Learning Better Intent Representations for Financial Open Intent Classification
Xianzhi Li
Will Aitken
Xiao-Dan Zhu
Stephen W. Thomas
AIFin
11
8
0
25 Oct 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for
  Language Models
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Hong Liu
Sang Michael Xie
Zhiyuan Li
Tengyu Ma
AI4CE
40
49
0
25 Oct 2022
The Robustness Limits of SoTA Vision Models to Natural Variation
The Robustness Limits of SoTA Vision Models to Natural Variation
Mark Ibrahim
Q. Garrido
Ari S. Morcos
Diane Bouchacourt
VLM
43
16
0
24 Oct 2022
Precision Machine Learning
Precision Machine Learning
Eric J. Michaud
Ziming Liu
Max Tegmark
24
34
0
24 Oct 2022
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Maarten Sap
Ronan Le Bras
Daniel Fried
Yejin Choi
27
207
0
24 Oct 2022
Performance-Efficiency Trade-Offs in Adapting Language Models to Text
  Classification Tasks
Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks
Laura Aina
Nikos Voskarides
Roi Blanco
14
0
0
21 Oct 2022
Amos: An Adam-style Optimizer with Adaptive Weight Decay towards
  Model-Oriented Scale
Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale
Ran Tian
Ankur P. Parikh
ODL
15
6
0
21 Oct 2022
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
62
2,989
0
20 Oct 2022
Transcending Scaling Laws with 0.1% Extra Compute
Transcending Scaling Laws with 0.1% Extra Compute
Yi Tay
Jason W. Wei
Hyung Won Chung
Vinh Q. Tran
David R. So
...
Donald Metzler
Slav Petrov
N. Houlsby
Quoc V. Le
Mostafa Dehghani
LRM
42
68
0
20 Oct 2022
A baseline revisited: Pushing the limits of multi-segment models for
  context-aware translation
A baseline revisited: Pushing the limits of multi-segment models for context-aware translation
Suvodeep Majumde
Stanislas Lauly
Maria Nadejde
Marcello Federico
Georgiana Dinu
32
13
0
19 Oct 2022
Scaling Laws for Reward Model Overoptimization
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
475
0
19 Oct 2022
Optimisation & Generalisation in Networks of Neurons
Optimisation & Generalisation in Networks of Neurons
Jeremy Bernstein
AI4CE
24
2
0
18 Oct 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
86
997
0
17 Oct 2022
Can Language Representation Models Think in Bets?
Can Language Representation Models Think in Bets?
Zhi–Bin Tang
Mayank Kejriwal
15
6
0
14 Oct 2022
An Exploration of Hierarchical Attention Transformers for Efficient Long
  Document Classification
An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification
Ilias Chalkidis
Xiang Dai
Manos Fergadiotis
Prodromos Malakasiotis
Desmond Elliott
34
33
0
11 Oct 2022
Meta-Principled Family of Hyperparameter Scaling Strategies
Meta-Principled Family of Hyperparameter Scaling Strategies
Sho Yaida
52
16
0
10 Oct 2022
FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in
  Realistic Healthcare Settings
FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
Jean Ogier du Terrail
Samy Ayed
Edwige Cyffers
Felix Grimberg
Chaoyang He
...
Sai Praneeth Karimireddy
Marco Lorenzi
Giovanni Neglia
Marc Tommasi
M. Andreux
FedML
41
142
0
10 Oct 2022
Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning
  Ticket's Mask?
Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
Mansheej Paul
F. Chen
Brett W. Larsen
Jonathan Frankle
Surya Ganguli
Gintare Karolina Dziugaite
UQCV
25
38
0
06 Oct 2022
Generalization Properties of Retrieval-based Models
Generalization Properties of Retrieval-based Models
Soumya Basu
A. S. Rawat
Manzil Zaheer
29
6
0
06 Oct 2022
Learning to Reason With Relational Abstractions
Learning to Reason With Relational Abstractions
A. Nam
Mengye Ren
Chelsea Finn
James L. McClelland
ReLM
LRM
37
4
0
06 Oct 2022
Privacy-Preserving Text Classification on BERT Embeddings with
  Homomorphic Encryption
Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption
Garam Lee
Minsoo Kim
J. Park
Seung-won Hwang
Jung Hee Cheon
38
16
0
05 Oct 2022
Large Language Models are Pretty Good Zero-Shot Video Game Bug Detectors
Large Language Models are Pretty Good Zero-Shot Video Game Bug Detectors
Mohammad Reza Taesiri
Finlay Macklon
Yihe Wang
Hengshuo Shen
C. Bezemer
ELM
LLMAG
MLLM
42
13
0
05 Oct 2022
Ask Me Anything: A simple strategy for prompting language models
Ask Me Anything: A simple strategy for prompting language models
Simran Arora
A. Narayan
Mayee F. Chen
Laurel J. Orr
Neel Guha
Kush S. Bhatia
Ines Chami
Frederic Sala
Christopher Ré
ReLM
LRM
214
208
0
05 Oct 2022
Fine-Tuning with Differential Privacy Necessitates an Additional
  Hyperparameter Search
Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search
Yannis Cattan
Christopher A. Choquette-Choo
Nicolas Papernot
Abhradeep Thakurta
26
20
0
05 Oct 2022
Memory in humans and deep language models: Linking hypotheses for model
  augmentation
Memory in humans and deep language models: Linking hypotheses for model augmentation
Omri Raccah
Pheobe Chen
Ted Willke
David Poeppel
Vy A. Vo
RALM
23
1
0
04 Oct 2022
Data Budgeting for Machine Learning
Data Budgeting for Machine Learning
Xin-Bo Zhao
Weixin Liang
James Zou
18
2
0
03 Oct 2022
Complexity-Based Prompting for Multi-Step Reasoning
Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu
Hao-Chun Peng
Ashish Sabharwal
Peter Clark
Tushar Khot
ReLM
LRM
162
414
0
03 Oct 2022
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual
  Pre-training Methods
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Skanda Koppula
Yazhe Li
Evan Shelhamer
Andrew Jaegle
Nikhil Parthasarathy
Relja Arandjelović
João Carreira
Olivier J. Hénaff
33
9
0
30 Sep 2022
Scaling Laws for a Multi-Agent Reinforcement Learning Model
Scaling Laws for a Multi-Agent Reinforcement Learning Model
Oren Neumann
C. Gros
24
26
0
29 Sep 2022
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with
  Latest Weight Averaging
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
Jean Kaddour
MoMe
3DH
24
39
0
29 Sep 2022
Bidirectional Language Models Are Also Few-shot Learners
Bidirectional Language Models Are Also Few-shot Learners
Ajay Patel
Bryan Li
Mohammad Sadegh Rasooli
Noah Constant
Colin Raffel
Chris Callison-Burch
LRM
67
45
0
29 Sep 2022
Transfer Learning with Pretrained Remote Sensing Transformers
Transfer Learning with Pretrained Remote Sensing Transformers
A. Fuller
K. Millard
J.R. Green
30
11
0
28 Sep 2022
Local Grammar-Based Coding Revisited
Local Grammar-Based Coding Revisited
L. Debowski
25
0
0
27 Sep 2022
Moral Mimicry: Large Language Models Produce Moral Rationalizations
  Tailored to Political Identity
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity
Gabriel Simmons
105
57
0
24 Sep 2022
Variational Open-Domain Question Answering
Variational Open-Domain Question Answering
Valentin Liévin
Andreas Geert Motzfeldt
Ida Riis Jensen
Ole Winther
OOD
BDL
36
8
0
23 Sep 2022
Efficient Few-Shot Learning Without Prompts
Efficient Few-Shot Learning Without Prompts
Lewis Tunstall
Nils Reimers
Unso Eun Seo Jo
Luke Bates
Daniel Korat
Moshe Wasserblat
Oren Pereg
VLM
34
182
0
22 Sep 2022
Generate rather than Retrieve: Large Language Models are Strong Context
  Generators
Generate rather than Retrieve: Large Language Models are Strong Context Generators
W. Yu
Dan Iter
Shuohang Wang
Yichong Xu
Mingxuan Ju
Soumya Sanyal
Chenguang Zhu
Michael Zeng
Meng Jiang
RALM
AIMat
229
321
0
21 Sep 2022
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training
  Dynamics
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
Shoaib Ahmed Siddiqui
Nitarshan Rajkumar
Tegan Maharaj
David M. Krueger
Sara Hooker
42
27
0
20 Sep 2022
Extremely Simple Activation Shaping for Out-of-Distribution Detection
Extremely Simple Activation Shaping for Out-of-Distribution Detection
Andrija Djurisic
Nebojsa Bozanic
Arjun Ashok
Rosanne Liu
OODD
169
150
0
20 Sep 2022
Relaxed Attention for Transformer Models
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
82
31
0
14 Sep 2022
PainPoints: A Framework for Language-based Detection of Chronic Pain and
  Expert-Collaborative Text-Summarization
PainPoints: A Framework for Language-based Detection of Chronic Pain and Expert-Collaborative Text-Summarization
S. Fadnavis
Amit Dhurandhar
R. Norel
Jenna M. Reinen
C. Agurto
E. Secchettin
V. Schweiger
Giovanni Perini
Guillermo Cecchi
26
1
0
14 Sep 2022
Revisiting Neural Scaling Laws in Language and Vision
Revisiting Neural Scaling Laws in Language and Vision
Ibrahim M. Alabdulmohsin
Behnam Neyshabur
Xiaohua Zhai
159
102
0
13 Sep 2022
Impact of dataset size and long-term ECoG-based BCI usage on deep
  learning decoders performance
Impact of dataset size and long-term ECoG-based BCI usage on deep learning decoders performance
Maciej Śliwowski
Matthieu Martin
Antoine Souloumiac
P. Blanchart
T. Aksenova
24
6
0
08 Sep 2022
Enabling Connectivity for Automated Mobility: A Novel MQTT-based
  Interface Evaluated in a 5G Case Study on Edge-Cloud Lidar Object Detection
Enabling Connectivity for Automated Mobility: A Novel MQTT-based Interface Evaluated in a 5G Case Study on Edge-Cloud Lidar Object Detection
Lennart Reiher
Bastian Lampe
Timo Woopen
Raphael van Kempen
Till Beemelmanns
L. Eckstein
8
8
0
08 Sep 2022
Mimose: An Input-Aware Checkpointing Planner for Efficient Training on
  GPU
Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU
Jian-He Liao
Mingzhen Li
Qingxiao Sun
Jiwei Hao
F. Yu
...
Ye Tao
Zicheng Zhang
Hailong Yang
Zhongzhi Luan
D. Qian
23
4
0
06 Sep 2022
Masked Sinogram Model with Transformer for ill-Posed Computed Tomography
  Reconstruction: a Preliminary Study
Masked Sinogram Model with Transformer for ill-Posed Computed Tomography Reconstruction: a Preliminary Study
Zhengchun Liu
R. Kettimuthu
Ian Foster
MedIm
30
2
0
03 Sep 2022
HammingMesh: A Network Topology for Large-Scale Deep Learning
HammingMesh: A Network Topology for Large-Scale Deep Learning
Torsten Hoefler
Tommaso Bonato
Daniele De Sensi
Salvatore Di Girolamo
Shigang Li
Marco Heddes
Jon Belk
Deepak Goel
Miguel Castro
Steve Scott
3DH
GNN
AI4CE
29
20
0
03 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
28
109
0
31 Aug 2022
Previous
123...151617181920
Next