Training Compute-Optimal Large Language Models

29 March 2022
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
Eliza Rutherford
Diego de Las Casas
Lisa Anne Hendricks
Johannes Welbl
Aidan Clark
Tom Hennigan
Eric Noland
Katie Millican
George van den Driessche
Bogdan Damoc
Aurelia Guy
Simon Osindero
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
    AI4TS

Papers citing "Training Compute-Optimal Large Language Models"

50 / 417 papers shown
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
39
7
0
21 Nov 2023
When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages
Tyler A. Chang
Catherine Arnett
Zhuowen Tu
Benjamin Bergen
LRM
43
7
0
15 Nov 2023
OpenForest: A data catalogue for machine learning in forest monitoring
Arthur Ouaknine
T. Kattenborn
Etienne Laliberté
David Rolnick
53
6
0
01 Nov 2023
OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models
Mingfeng Xue
Dayiheng Liu
Kexin Yang
Guanting Dong
Wenqiang Lei
Zheng Yuan
Chang Zhou
Jingren Zhou
LLMAG
22
2
0
25 Oct 2023
Enhancing Zero-Shot Crypto Sentiment with Fine-tuned Language Model and Prompt Engineering
Rahman S. M. Wahidur
Ishmam Tashdeed
Manjit Kaur
Heung-No Lee
ALM
33
17
0
20 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
30
1
0
15 Oct 2023
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Wei Ping
Ming-Yu Liu
Lawrence C. McAfee
Peng Xu
Bo Li
M. Shoeybi
Bryan Catanzaro
RALM
16
46
0
11 Oct 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
Yupei Du
Albert Gatt
Dong Nguyen
31
1
0
10 Oct 2023
Divide-and-Conquer Dynamics in AI-Driven Disempowerment
Peter S. Park
Max Tegmark
21
1
0
09 Oct 2023
Transformer Fusion with Optimal Transport
Moritz Imfeld
Jacopo Graldi
Marco Giordano
Thomas Hofmann
Sotiris Anagnostidis
Sidak Pal Singh
ViT
MoMe
32
16
0
09 Oct 2023
Recurrent Neural Language Models as Probabilistic Finite-state Automata
Anej Svete
Ryan Cotterell
36
2
0
08 Oct 2023
Reformulating Domain Adaptation of Large Language Models as Adapt-Retrieve-Revise
Zhen Wan
Yating Zhang
Yexiang Wang
Fei Cheng
Sadao Kurohashi
CLL
AILaw
34
10
0
05 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Mustafa Shukor
Alexandre Ramé
Corentin Dancette
Matthieu Cord
LRM
MLLM
46
20
0
01 Oct 2023
A Benchmark for Learning to Translate a New Language from One Grammar Book
Garrett Tanzer
Mirac Suzgun
Chenguang Xi
Dan Jurafsky
Luke Melas-Kyriazi
24
51
0
28 Sep 2023
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
40
84
0
25 Sep 2023
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
Ahmad Faiz
S. Kaneda
Ruhan Wang
Rita Osi
Parteek Sharma
Fan Chen
Lei Jiang
31
56
0
25 Sep 2023
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Guan-Bo Wang
Sijie Cheng
Xianyuan Zhan
Xiangang Li
Sen Song
Yang Liu
ALM
27
231
0
20 Sep 2023
Language Modeling Is Compression
Grégoire Delétang
Anian Ruoss
Paul-Ambroise Duquenne
Elliot Catt
Tim Genewein
...
Wenliang Kevin Li
Matthew Aitchison
Laurent Orseau
Marcus Hutter
J. Veness
AI4CE
37
131
0
19 Sep 2023
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering
Amir Rahimi
Vanessa D’Amario
Moyuru Yamada
Kentaro Takemoto
Tomotake Sasaki
Xavier Boix
36
1
0
15 Sep 2023
Balanced and Explainable Social Media Analysis for Public Health with Large Language Models
Yan Jiang
Ruihong Qiu
Yi Zhang
Peng Zhang
27
7
0
12 Sep 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
48
8
0
07 Sep 2023
GPT Can Solve Mathematical Problems Without a Calculator
Zhengyuan Yang
Ming Ding
Qingsong Lv
Zhihuan Jiang
Zehai He
Yuyi Guo
Jinfeng Bai
Jie Tang
RALM
LRM
39
53
0
06 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
64
4
0
28 Aug 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Xiaozhong Liu
78
31
0
27 Aug 2023
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L. Fleming
Alejandro Lozano
W. Haberkorn
Jenelle A. Jindal
E. Reis
...
Jonathan H. Chen
Keith Morse
Emma Brunskill
Jason Alan Fries
N. Shah
LM&MA
28
54
0
27 Aug 2023
kTrans: Knowledge-Aware Transformer for Binary Code Embedding
Wenyu Zhu
Hao Wang
Yuchen Zhou
Jiaming Wang
Zihan Sha
Zeyu Gao
Chao Zhang
32
10
0
24 Aug 2023
Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature
Walter Hernandez Cruz
K. Tylinski
Alastair Moore
Niall Roche
Nikhil Vadgama
Horst Treiblmaier
J. Shangguan
Paolo Tasca
Jiahua Xu
25
2
0
23 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
54
14
0
23 Aug 2023
SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding
Tianyu Yu
Chengyue Jiang
Chao Lou
Shen Huang
Xiaobin Wang
...
Haitao Zheng
Ningyu Zhang
Pengjun Xie
Fei Huang
Yong-jia Jiang
LRM
59
16
0
21 Aug 2023
LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters
Ching Chang
Wei-Yao Wang
Wenjie Peng
Tien-Fu Chen
AI4TS
38
46
0
16 Aug 2023
The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models
Abi Aryan
Aakash Kumar Nain
Andrew McMahon
Lucas Augusto Meyer
Harpreet Sahota
27
6
0
15 Aug 2023
Composable Function-preserving Expansions for Transformer Architectures
Andrea Gesmundo
Kaitlin Maile
AI4CE
40
8
0
11 Aug 2023
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Kshitij Gupta
Benjamin Thérien
Adam Ibrahim
Mats L. Richter
Quentin G. Anthony
Eugene Belilovsky
Irina Rish
Timothée Lesort
KELM
35
99
0
08 Aug 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
Scaling Sentence Embeddings with Large Language Models
Ting Jiang
Shaohan Huang
Zhongzhi Luan
Deqing Wang
Fuzhen Zhuang
LRM
44
40
0
31 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
38
118
0
25 Jul 2023
Opinion Mining Using Population-tuned Generative Language Models
Allmin Pradhap Singh Susaiyah
Abhinay Pandya
Aki Härmä
15
0
0
24 Jul 2023
In-Context Learning Learns Label Relationships but Is Not Conventional Learning
Jannik Kossen
Y. Gal
Tom Rainforth
40
28
0
23 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
120
11,099
0
18 Jul 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win
Zhengping Zhou
Lezhi Li
Xinxi Chen
Andy Li
SyDa
ALM
MoE
29
6
0
17 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
22
41
0
12 Jul 2023
Continual Learning as Computationally Constrained Reinforcement Learning
Saurabh Kumar
Henrik Marklund
Anand Srinivasa Rao
Yifan Zhu
Hong Jun Jeon
Yueyang Liu
Benjamin Van Roy
CLL
27
22
0
10 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Hongsheng Li
54
103
0
03 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
58
119
0
01 Jul 2023
Neural Algorithmic Reasoning Without Intermediate Supervision
Gleb Rodionov
Liudmila Prokhorenkova
OffRL
LRM
OOD
31
10
0
23 Jun 2023
AI could create a perfect storm of climate misinformation
V. Galaz
Hannah Metzler
Stefan Daume
A. Olsson
B. Lindström
A. Marklund
26
5
0
22 Jun 2023
SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling
Jesse Zhang
Karl Pertsch
Jiahui Zhang
Joseph J. Lim
LM&Ro
38
17
0
20 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
62
359
0
20 Jun 2023
Structured Thoughts Automaton: First Formalized Execution Model for Auto-Regressive Language Models
T. Vanderbruggen
C. Liao
P. Pirkelbauer
Pei-Hung Lin
LRM
ALM
24
2
0
16 Jun 2023
Large-scale Language Model Rescoring on Long-form Data
Tongzhou Chen
Cyril Allauzen
Yinghui Huang
Daniel S. Park
David Rybach
...
Rodrigo Cabrera
Kartik Audhkhasi
Bhuvana Ramabhadran
Pedro J. Moreno
Michael Riley
33
14
0
13 Jun 2023