Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.15556
Cited By
Training Compute-Optimal Large Language Models
29 March 2022
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
Eliza Rutherford
Diego de Las Casas
Lisa Anne Hendricks
Johannes Welbl
Aidan Clark
Tom Hennigan
Eric Noland
Katie Millican
George van den Driessche
Bogdan Damoc
Aurelia Guy
Simon Osindero
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Training Compute-Optimal Large Language Models"
50 / 420 papers shown
Title
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
62
359
0
20 Jun 2023
Structured Thoughts Automaton: First Formalized Execution Model for Auto-Regressive Language Models
T. Vanderbruggen
C. Liao
P. Pirkelbauer
Pei-Hung Lin
LRM
ALM
24
2
0
16 Jun 2023
Large-scale Language Model Rescoring on Long-form Data
Tongzhou Chen
Cyril Allauzen
Yinghui Huang
Daniel S. Park
David Rybach
...
Rodrigo Cabrera
Kartik Audhkhasi
Bhuvana Ramabhadran
Pedro J. Moreno
Michael Riley
33
14
0
13 Jun 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
54
190
0
12 Jun 2023
An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models
Zhongbin Xie
Thomas Lukasiewicz
28
12
0
06 Jun 2023
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
Q. V. Liao
J. Vaughan
44
159
0
02 Jun 2023
Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
Baohao Liao
Shaomu Tan
Christof Monz
KELM
23
29
0
01 Jun 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu
Barlas Oğuz
Changsheng Zhao
Ernie Chang
Pierre Stock
Yashar Mehdad
Yangyang Shi
Raghuraman Krishnamoorthi
Vikas Chandra
MQ
60
190
0
29 May 2023
Parameter-Efficient Fine-Tuning without Introducing New Latency
Baohao Liao
Yan Meng
Christof Monz
24
49
0
26 May 2023
Passive learning of active causal strategies in agents and language models
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Ishita Dasgupta
A. Nam
Jane X. Wang
29
15
0
25 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
42
53
0
25 May 2023
Large Language Models for User Interest Journeys
Konstantina Christakopoulou
Alberto Lalama
Cj Adams
Iris Qu
Yifat Amir
...
Dina Bseiso
Sarah Scodel
Lucas Dixon
Ed H. Chi
Minmin Chen
24
25
0
24 May 2023
In-Context Impersonation Reveals Large Language Models' Strengths and Biases
Leonard Salewski
Stephan Alaniz
Isabel Rio-Torto
Eric Schulz
Zeynep Akata
44
151
0
24 May 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
55
132
0
23 May 2023
Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Jeonghoon Kim
J. H. Lee
Sungdong Kim
Joonsuk Park
Kang Min Yoo
S. Kwon
Dongsoo Lee
MQ
44
99
0
23 May 2023
Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model
Leo Liu
Tim Dettmers
Xi Lin
Ves Stoyanov
Xian Li
MoE
26
9
0
23 May 2023
i-Code Studio: A Configurable and Composable Framework for Integrative AI
Yuwei Fang
Mahmoud Khademi
Chenguang Zhu
Ziyi Yang
Reid Pryzant
...
Yao Qian
Takuya Yoshioka
Lu Yuan
Michael Zeng
Xuedong Huang
38
2
0
23 May 2023
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
Zangwei Zheng
Xiaozhe Ren
Fuzhao Xue
Yang Luo
Xin Jiang
Yang You
42
55
0
22 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
90
562
0
22 May 2023
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
42
53
0
22 May 2023
Rethinking Semi-supervised Learning with Language Models
Zhengxiang Shi
Francesco Tonolini
Nikolaos Aletras
Emine Yilmaz
G. Kazai
Yunlong Jiao
32
18
0
22 May 2023
TheoremQA: A Theorem-driven Question Answering dataset
Wenhu Chen
Ming Yin
Max W.F. Ku
Pan Lu
Yixin Wan
Xueguang Ma
Jianyu Xu
Xinyi Wang
Tony Xia
AIMat
38
124
0
21 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
36
93
0
19 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
125
1,152
0
17 May 2023
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Zhimin Chen
Longlong Jing
Yingwei Li
Bing Li
32
31
0
15 May 2023
Improving Small Language Models on PubMedQA via Generative Data Augmentation
Zhen Guo
Peiqi Wang
Yanwei Wang
Shangdi Yu
LM&MA
MedIm
18
10
0
12 May 2023
On the average-case complexity of learning output distributions of quantum circuits
A. Nietner
M. Ioannou
R. Sweke
R. Kueng
Jens Eisert
M. Hinsche
J. Haferkamp
28
11
0
09 May 2023
When and What to Ask Through World States and Text Instructions: IGLU NLP Challenge Solution
Zhengxiang Shi
Jerome Ramos
To Eun Kim
Xi Wang
Hossein A. Rahmani
Aldo Lipani
27
10
0
09 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
26
40
0
09 May 2023
MoT: Memory-of-Thought Enables ChatGPT to Self-Improve
Xiaonan Li
Xipeng Qiu
ReLM
KELM
LRM
AI4MH
26
32
0
09 May 2023
The Current State of Summarization
Fabian Retkowski
23
6
0
08 May 2023
How Do In-Context Examples Affect Compositional Generalization?
Shengnan An
Zeqi Lin
Qiang Fu
B. Chen
Nanning Zheng
Jian-Guang Lou
Dongmei Zhang
34
49
0
08 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
38
85
0
02 May 2023
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raf
35
117
0
21 Apr 2023
UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining
Hyung Won Chung
Noah Constant
Xavier Garcia
Adam Roberts
Yi Tay
Sharan Narang
Orhan Firat
29
50
0
18 Apr 2023
Learning to Compress Prompts with Gist Tokens
Jesse Mu
Xiang Lisa Li
Noah D. Goodman
VLM
53
206
0
17 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
26
40
0
17 Apr 2023
STen: Productive and Efficient Sparsity in PyTorch
Andrei Ivanov
Nikoli Dryden
Tal Ben-Nun
Saleh Ashkboos
Torsten Hoefler
34
4
0
15 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
125
3,055
0
14 Apr 2023
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence
Gengchen Mai
Weiming Huang
Jin Sun
Suhang Song
Deepak Mishra
...
Yingjie Hu
Chris Cundy
Ziyuan Li
Rui Zhu
Ni Lao
AI4CE
32
123
0
13 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
18
408
0
13 Apr 2023
Human-machine cooperation for semantic feature listing
Kushin Mukherjee
Siddharth Suresh
Timothy T. Rogers
VLM
17
2
0
11 Apr 2023
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks
Blake Bordelon
Cengiz Pehlevan
MLT
38
29
0
06 Apr 2023
On the Pareto Front of Multilingual Neural Machine Translation
Liang Chen
Shuming Ma
Dongdong Zhang
Furu Wei
Baobao Chang
MoE
23
5
0
06 Apr 2023
Inductive biases in deep learning models for weather prediction
Jannik Thümmel
Matthias Karlbauer
S. Otte
C. Zarfl
Georg Martius
...
Thomas Scholten
Ulrich Friedrich
V. Wulfmeyer
B. Goswami
Martin Volker Butz
AI4CE
43
6
0
06 Apr 2023
Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling
Haotao Wang
Ziyu Jiang
Yuning You
Yan Han
Gaowen Liu
Jayanth Srinivasa
Ramana Rao Kompella
Zhangyang Wang
26
29
0
06 Apr 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
72
6,839
0
05 Apr 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Ge Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Guohao Li
SyDa
ALM
50
412
0
31 Mar 2023
On the Creativity of Large Language Models
Giorgio Franceschelli
Mirco Musolesi
72
52
0
27 Mar 2023
Guided Transfer Learning
Danilo Nikolić
Davor Andrić
V. Nikolić
16
2
0
26 Mar 2023
Previous
1
2
3
4
5
6
7
8
9
Next