ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.15556
  4. Cited By
Training Compute-Optimal Large Language Models

Training Compute-Optimal Large Language Models

29 March 2022
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
Eliza Rutherford
Diego de Las Casas
Lisa Anne Hendricks
Johannes Welbl
Aidan Clark
Tom Hennigan
Eric Noland
Katie Millican
George van den Driessche
Bogdan Damoc
Aurelia Guy
Simon Osindero
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
    AI4TS
ArXivPDFHTML

Papers citing "Training Compute-Optimal Large Language Models"

50 / 420 papers shown
Title
Evaluating Zero-Shot Long-Context LLM Compression
Evaluating Zero-Shot Long-Context LLM Compression
Chenyu Wang
Yihan Wang
Kai Li
51
0
0
10 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
39
0
06 Jun 2024
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Jingyang Ou
Shen Nie
Kaiwen Xue
Fengqi Zhu
Jiacheng Sun
Zhenguo Li
Chongxuan Li
DiffM
41
29
0
06 Jun 2024
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large
  Language Model Training
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Ao Sun
Weilin Zhao
Xu Han
Cheng Yang
Zhiyuan Liu
Chuan Shi
Maosong Sun
31
7
0
05 Jun 2024
CSS: Contrastive Semantic Similarity for Uncertainty Quantification of
  LLMs
CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs
Shuang Ao
Stefan Rueger
Advaith Siddharthan
33
1
0
05 Jun 2024
Do's and Don'ts: Learning Desirable Skills with Instruction Videos
Do's and Don'ts: Learning Desirable Skills with Instruction Videos
Hyunseung Kim
ByungKun Lee
Hojoon Lee
Dongyoon Hwang
Donghu Kim
Jaegul Choo
39
1
0
01 Jun 2024
Beyond Imitation: Learning Key Reasoning Steps from Dual
  Chain-of-Thoughts in Reasoning Distillation
Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation
Chengwei Dai
Kun Li
Wei Zhou
Song Hu
LRM
43
5
0
30 May 2024
Wavelet-Based Image Tokenizer for Vision Transformers
Wavelet-Based Image Tokenizer for Vision Transformers
Zhenhai Zhu
Radu Soricut
ViT
52
3
0
28 May 2024
BWArea Model: Learning World Model, Inverse Dynamics, and Policy for
  Controllable Language Generation
BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation
Chengxing Jia
Pengyuan Wang
Ziniu Li
Yi-Chen Li
Zhilong Zhang
Nan Tang
Yang Yu
OffRL
42
1
0
27 May 2024
Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization
Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization
Dixuan Wang
Yanda Li
Junyuan Jiang
Zepeng Ding
Ziqin Luo
Guochao Jiang
Jiaqing Liang
Deqing Yang
27
11
0
27 May 2024
Infinite Limits of Multi-head Transformer Dynamics
Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon
Hamza Tahir Chaudhry
Cengiz Pehlevan
AI4CE
47
9
0
24 May 2024
A Misleading Gallery of Fluid Motion by Generative Artificial
  Intelligence
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence
Ali Kashefi
VGen
51
5
0
24 May 2024
Scaling Law for Time Series Forecasting
Scaling Law for Time Series Forecasting
Jingzhe Shi
Qinwei Ma
Huan Ma
Lei Li
AI4TS
31
8
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
43
0
23 May 2024
A social path to human-like artificial intelligence
A social path to human-like artificial intelligence
Edgar A. Duénez-Guzmán
Suzanne Sadedin
Jane X. Wang
Kevin R. McKee
Joel Z Leibo
GNN
31
28
0
22 May 2024
A Systematic Evaluation of Large Language Models for Natural Language
  Generation Tasks
A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks
Xuanfan Ni
Piji Li
ELM
LRM
34
8
0
16 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with
  Associative Memory
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
39
6
0
14 May 2024
CANAL -- Cyber Activity News Alerting Language Model: Empirical Approach
  vs. Expensive LLM
CANAL -- Cyber Activity News Alerting Language Model: Empirical Approach vs. Expensive LLM
Urjitkumar Patel
Fang-Chun Yeh
Chinmay Gondhalekar
29
3
0
10 May 2024
Large Language Models for Cyber Security: A Systematic Literature Review
Large Language Models for Cyber Security: A Systematic Literature Review
HanXiang Xu
Shenao Wang
Ningke Li
Kaidi Wang
Yanjie Zhao
Kai Chen
Ting Yu
Yang Liu
Haoyu Wang
37
23
0
08 May 2024
Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
Hanyin Wang
Chufan Gao
Bolun Liu
Qiping Xu
Guleid Hussein
Mohamad El Labban
Kingsley Iheasirim
H. Korsapati
Chuck Outcalt
Jiashuo Sun
LM&MA
AI4MH
40
2
0
25 Apr 2024
Data Authenticity, Consent, & Provenance for AI are all broken: what
  will it take to fix them?
Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?
Shayne Longpre
Robert Mahari
Naana Obeng-Marnu
William Brannon
Tobin South
Katy Gero
Sandy Pentland
Jad Kabbara
66
5
0
19 Apr 2024
Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis
Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis
Yufan Li
Subhabrata Sen
Ben Adlam
MLT
51
1
0
18 Apr 2024
Adapting Mental Health Prediction Tasks for Cross-lingual Learning via
  Meta-Training and In-context Learning with Large Language Model
Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model
Zita Lifelo
Huansheng Ning
Sahraoui Dhelim
AI4MH
48
0
0
13 Apr 2024
Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs
  versus Traditional Relational Databases
Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases
Xiang Zhang
Khatoon Khedri
Reza Rawassizadeh
40
2
0
12 Apr 2024
RAR-b: Reasoning as Retrieval Benchmark
RAR-b: Reasoning as Retrieval Benchmark
Chenghao Xiao
G. Thomas
Al Moubayed
LRM
RALM
36
8
0
09 Apr 2024
YaART: Yet Another ART Rendering Technology
YaART: Yet Another ART Rendering Technology
Sergey Kastryulin
Artem Konev
Alexander Shishenya
Eugene Lyapustin
Artem Khurshudov
...
Dmitrii Kornilov
Mikhail Romanov
Artem Babenko
Sergei Ovcharenko
Valentin Khrulkov
EGVM
41
1
0
08 Apr 2024
GPTA: Generative Prompt Tuning Assistant for Synergistic Downstream
  Neural Network Enhancement with LLMs
GPTA: Generative Prompt Tuning Assistant for Synergistic Downstream Neural Network Enhancement with LLMs
Xiao Liu
Jiawei Zhang
41
0
0
29 Mar 2024
The Unreasonable Ineffectiveness of the Deeper Layers
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
52
81
0
26 Mar 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
57
64
0
25 Mar 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV
LRM
70
46
0
23 Mar 2024
Simple and Scalable Strategies to Continually Pre-train Large Language
  Models
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELM
CLL
44
54
0
13 Mar 2024
Cyclic Data Parallelism for Efficient Parallelism of Deep Neural
  Networks
Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks
Louis Fournier
Edouard Oyallon
52
0
0
13 Mar 2024
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu
Fangyun Wei
Yanye Lu
MLLM
VLM
52
17
0
12 Mar 2024
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and
  Image Embeddings
Synth2^22: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Sahand Sharifzadeh
Christos Kaplanis
Shreya Pathak
D. Kumaran
Anastasija Ilić
Jovana Mitrović
Charles Blundell
Andrea Banino
VLM
46
9
0
12 Mar 2024
Leveraging Continuous Time to Understand Momentum When Training Diagonal
  Linear Networks
Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
Hristo Papazov
Scott Pesme
Nicolas Flammarion
38
5
0
08 Mar 2024
Yi: Open Foundation Models by 01.AI
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
150
502
0
07 Mar 2024
Do Large Language Models Mirror Cognitive Language Processing?
Do Large Language Models Mirror Cognitive Language Processing?
Yuqi Ren
Renren Jin
Tongxuan Zhang
Deyi Xiong
50
4
0
28 Feb 2024
VL-Trojan: Multimodal Instruction Backdoor Attacks against
  Autoregressive Visual Language Models
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Jiawei Liang
Siyuan Liang
Man Luo
Aishan Liu
Dongchen Han
Ee-Chien Chang
Xiaochun Cao
42
38
0
21 Feb 2024
LoRA+: Efficient Low Rank Adaptation of Large Models
LoRA+: Efficient Low Rank Adaptation of Large Models
Soufiane Hayou
Nikhil Ghosh
Bin Yu
AI4CE
46
148
0
19 Feb 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
371
0
09 Feb 2024
Pretrained Generative Language Models as General Learning Frameworks for
  Sequence-Based Tasks
Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based Tasks
Ben Fauber
32
2
0
08 Feb 2024
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Jost Tobias Springenberg
A. Abdolmaleki
Jingwei Zhang
Oliver Groth
Michael Bloesch
...
Sarah Bechtle
Steven Kapturowski
Roland Hafner
N. Heess
Martin Riedmiller
OffRL
LRM
27
12
0
08 Feb 2024
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data
Wei-Yao Wang
Wei-Wei Du
Derek Xu
Wei Wang
Wenjie Peng
LMTD
38
7
0
02 Feb 2024
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Parth Sarthi
Salman Abdullah
Aditi Tuli
Shubh Khanna
Anna Goldie
Christopher D. Manning
RALM
19
123
0
31 Jan 2024
TeenyTinyLlama: open-source tiny language models trained in Brazilian
  Portuguese
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
N. Corrêa
Sophia Falk
Shiza Fatimah
Aniket Sen
N. D. Oliveira
30
9
0
30 Jan 2024
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large
  Models
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Yi Zhao
Yilin Zhang
Rong Xiang
Jing Li
Hillming Li
43
16
0
29 Jan 2024
Learning Universal Predictors
Learning Universal Predictors
Jordi Grau-Moya
Tim Genewein
Marcus Hutter
Laurent Orseau
Grégoire Delétang
...
Anian Ruoss
Wenliang Kevin Li
Christopher Mattern
Matthew Aitchison
J. Veness
27
11
0
26 Jan 2024
Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness
Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness
Samaneh Shafee
A. Bessani
Pedro M. Ferreira
31
19
0
26 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple
  Decoding Heads
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
De-huai Chen
Tri Dao
52
252
0
19 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
69
35
0
16 Jan 2024
Previous
123456789
Next