Training Compute-Optimal Large Language Models


29 March 2022
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
Eliza Rutherford
Diego de Las Casas
Lisa Anne Hendricks
Johannes Welbl
Aidan Clark
Tom Hennigan
Eric Noland
Katie Millican
George van den Driessche
Bogdan Damoc
Aurelia Guy
Simon Osindero
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
    AI4TS
arXiv: 2203.15556

Papers citing "Training Compute-Optimal Large Language Models"

50 / 417 papers shown
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
32
4
0
04 Mar 2023
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face
Christopher Akiki
Odunayo Ogundepo
Aleksandra Piktus
Xinyu Crystina Zhang
Akintunde Oladipo
Jimmy J. Lin
Martin Potthast
25
5
0
28 Feb 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
101
0
27 Feb 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
37
12,368
0
27 Feb 2023
An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)
Paulo Shakarian
Abhinav Koyyalamudi
Noel Ngu
Lakshmivihari Mareedu
31
64
0
23 Feb 2023
Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini
Matthew Jagielski
Christopher A. Choquette-Choo
Daniel Paleka
Will Pearce
Hyrum S. Anderson
Andreas Terzis
Kurt Thomas
Florian Tramèr
SILM
31
182
0
20 Feb 2023
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes
Behrooz Ghorbani
Xavier Garcia
Markus Freitag
Orhan Firat
38
29
0
19 Feb 2023
Auditing large language models: a three-layered approach
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
48
196
0
16 Feb 2023
Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning
A. Luccioni
Alex Hernandez-Garcia
34
45
0
16 Feb 2023
Over-parametrization via Lifting for Low-rank Matrix Sensing: Conversion of Spurious Solutions to Strict Saddle Points
Ziye Ma
Igor Molybog
Javad Lavaei
Somayeh Sojoudi
31
3
0
15 Feb 2023
Cliff-Learning
T. T. Wang
I. Zablotchi
Nir Shavit
Jonathan S. Rosenfeld
44
0
0
14 Feb 2023
Binarized Neural Machine Translation
Yichi Zhang
Ankush Garg
Yuan Cao
Lukasz Lew
Behrooz Ghorbani
Zhiru Zhang
Orhan Firat
MQ
34
14
0
09 Feb 2023
The unreasonable effectiveness of few-shot learning for machine translation
Xavier Garcia
Yamini Bansal
Colin Cherry
George F. Foster
M. Krikun
Fan Feng
Melvin Johnson
Orhan Firat
38
102
0
02 Feb 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
31
47
0
02 Feb 2023
Scaling laws for single-agent reinforcement learning
Jacob Hilton
Jie Tang
John Schulman
22
20
0
31 Jan 2023
Adaptive Computation with Elastic Input Sequence
Fuzhao Xue
Valerii Likhosherstov
Anurag Arnab
N. Houlsby
Mostafa Dehghani
Yang You
31
19
0
30 Jan 2023
REPLUG: Retrieval-Augmented Black-Box Language Models
Weijia Shi
Sewon Min
Michihiro Yasunaga
Minjoon Seo
Rich James
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
VLM
KELM
83
580
0
30 Jan 2023
Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus
Alex Warstadt
Leshem Choshen
Aaron Mueller
Adina Williams
Ethan Gotlieb Wilcox
Chengxu Zhuang
27
54
0
27 Jan 2023
Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning
Hyunsoo Cho
Choonghyun Park
Junyeop Kim
Hyuhng Joon Kim
Kang Min Yoo
Sang-goo Lee
OODD
35
3
0
27 Jan 2023
Projected Subnetworks Scale Adaptation
Siddhartha Datta
N. Shadbolt
VLM
CLL
28
0
0
27 Jan 2023
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning
Malte Ostendorff
Georg Rehm
CLIP
VLM
CLL
41
23
0
23 Jan 2023
Human-Timescale Adaptation in an Open-Ended Task Space
Adaptive Agent Team
Jakob Bauer
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
...
Jakub Sygnowski
K. Tuyls
Sarah York
Alexander Zacherl
Lei Zhang
LM&Ro
OffRL
AI4CE
LRM
38
109
0
18 Jan 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
Hongpeng Lin
Ludan Ruan
Wenke Xia
Peiyu Liu
Jing Wen
...
Di Hu
Ruihua Song
Wayne Xin Zhao
Qin Jin
Zhiwu Lu
VGen
33
9
0
14 Jan 2023
Data Distillation: A Survey
Noveen Sachdeva
Julian McAuley
DD
45
73
0
11 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
99
35
0
01 Jan 2023
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré
73
370
0
28 Dec 2022
Task Ambiguity in Humans and Language Models
Alex Tamkin
Kunal Handa
Ava Shrestha
Noah D. Goodman
UQLM
44
22
0
20 Dec 2022
Is GPT-3 a Good Data Annotator?
Bosheng Ding
Chengwei Qin
Linlin Liu
Yew Ken Chia
Chenyu You
Boyang Albert Li
Lidong Bing
26
233
0
20 Dec 2022
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM
SSL
32
92
0
14 Dec 2022
Gradient flow in the gaussian covariate model: exact solution of learning curves and multiple descent structures
Antoine Bodin
N. Macris
34
4
0
13 Dec 2022
General-Purpose In-Context Learning by Meta-Learning Transformers
Louis Kirsch
James Harrison
Jascha Narain Sohl-Dickstein
Luke Metz
40
72
0
08 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
72
439
0
08 Dec 2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li
Z. Yao
Xiaoxia Wu
Minjia Zhang
Connor Holmes
Cheng Li
Yuxiong He
27
24
0
07 Dec 2022
Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping
Jiyan He
Xuechen Li
Da Yu
Huishuai Zhang
Janardhan Kulkarni
Y. Lee
A. Backurs
Nenghai Yu
Jiang Bian
30
46
0
03 Dec 2022
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya
Elad Venezian
Colin Raffel
Noam Slonim
Yoav Katz
Leshem Choshen
MoMe
28
52
0
02 Dec 2022
A Pipeline for Generating, Annotating and Employing Synthetic Data for Real World Question Answering
Matthew Maufe
James Ravenscroft
Rob Procter
Maria Liakata
32
3
0
30 Nov 2022
Fine-tuning language models to find agreement among humans with diverse preferences
Michiel A. Bakker
Martin Chadwick
Hannah R. Sheahan
Michael Henry Tessler
Lucy Campbell-Gillingham
...
Nat McAleese
Amelia Glaese
John Aslanides
M. Botvinick
Christopher Summerfield
ALM
49
215
0
28 Nov 2022
Understanding BLOOM: An empirical study on diverse NLP tasks
Parag Dakle
Sai Krishna Rallabandi
Preethi Raghavan
AI4CE
39
3
0
27 Nov 2022
Retrieval-Augmented Multimodal Language Modeling
Michihiro Yasunaga
Armen Aghajanyan
Weijia Shi
Rich James
J. Leskovec
Percy Liang
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
22
95
0
22 Nov 2022
GAMMT: Generative Ambiguity Modeling Using Multiple Transformers
Xingcheng Xu
30
0
0
16 Nov 2022
Breadth-First Pipeline Parallelism
J. Lamy-Poirier
GNN
MoE
AI4CE
28
1
0
11 Nov 2022
Efficiently Scaling Transformer Inference
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
37
295
0
09 Nov 2022
Astronomia ex machina: a history, primer, and outlook on neural networks in astronomy
Michael J. Smith
James E. Geach
35
32
0
07 Nov 2022
Inverse scaling can become U-shaped
Jason W. Wei
Najoung Kim
Yi Tay
Quoc V. Le
LRM
29
60
0
03 Nov 2022
Changes from Classical Statistics to Modern Statistics and Data Science
Kai Zhang
Shan-Yu Liu
M. Xiong
34
0
0
30 Oct 2022
A Solvable Model of Neural Scaling Laws
A. Maloney
Daniel A. Roberts
J. Sully
36
51
0
30 Oct 2022
Modeling structure-building in the brain with CCG parsing and large language models
Miloš Stanojević
Jonathan Brennan
Donald Dunagan
Mark Steedman
John T. Hale
24
12
0
28 Oct 2022
Multi-lingual Evaluation of Code Generation Models
Ben Athiwaratkun
Sanjay Krishna Gouda
Zijian Wang
Xiaopeng Li
Yuchen Tian
...
Baishakhi Ray
Parminder Bhatia
Sudipta Sengupta
Dan Roth
Bing Xiang
ELM
120
161
0
26 Oct 2022
The Robustness Limits of SoTA Vision Models to Natural Variation
Mark Ibrahim
Q. Garrido
Ari S. Morcos
Diane Bouchacourt
VLM
43
16
0
24 Oct 2022
Precision Machine Learning
Eric J. Michaud
Ziming Liu
Max Tegmark
24
34
0
24 Oct 2022