ResearchTrend.AI

Will we run out of data? Limits of LLM scaling based on human-generated data

26 October 2022
Pablo Villalobos
A. Ho
J. Sevilla
T. Besiroglu
Lennart Heim
Marius Hobbhahn
    ALM

Papers citing "Will we run out of data? Limits of LLM scaling based on human-generated data"

24 / 74 papers shown
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text
Yanzhu Guo
Guokan Shang
Michalis Vazirgiannis
Chloé Clavel
34
50
0
16 Nov 2023
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra
Eve Fleisig
Kyunghyun Cho
Adam Lopez
LRM
30
8
0
08 Nov 2023
Market Concentration Implications of Foundation Models
Jai Vipra
Anton Korinek
ELM
40
16
0
02 Nov 2023
Will Code Remain a Relevant User Interface for End-User Programming with Generative AI Models?
Advait Sarkar
26
16
0
01 Nov 2023
FedSplitX: Federated Split Learning for Computationally-Constrained Heterogeneous Clients
Jiyun Shin
Jinhyun Ahn
Honggu Kang
Joonhyuk Kang
FedML
65
6
0
23 Oct 2023
A State-Vector Framework for Dataset Effects
E. Sahak
Zining Zhu
Frank Rudzicz
30
1
0
17 Oct 2023
FATE-LLM: A Industrial Grade Federated Learning Framework for Large Language Models
Tao Fan
Yan Kang
Guoqiang Ma
Weijing Chen
Wenbin Wei
Lixin Fan
Qiang Yang
40
57
0
16 Oct 2023
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives
Elizabeth Seger
Noemi Dreksler
Richard Moulange
Emily Dardaman
Jonas Schuett
...
Emma Bluemke
Michael Aird
Patrick Levermore
Julian Hazell
Abhishek Gupta
20
40
0
29 Sep 2023
A Benchmark for Learning to Translate a New Language from One Grammar Book
Garrett Tanzer
Mirac Suzgun
Chenguang Xi
Dan Jurafsky
Luke Melas-Kyriazi
24
51
0
28 Sep 2023
Chain-of-Thought Reasoning is a Policy Improvement Operator
Hugh Zhang
David C. Parkes
ReLM
LM&Ro
LRM
31
12
0
15 Sep 2023
EarthPT: a time series foundation model for Earth Observation
Michael J. Smith
Luke Fleming
James E. Geach
AI4TS
22
7
0
13 Sep 2023
FDAPT: Federated Domain-adaptive Pre-training for Language Models
Lekang Jiang
F. Svoboda
Nicholas D. Lane
FedML
AI4CE
72
4
0
12 Jul 2023
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
Weiming Zhuang
Chen Chen
Lingjuan Lyu
Chong Chen
Yaochu Jin
AIFin
AI4CE
99
85
0
27 Jun 2023
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Guilherme Penedo
Quentin Malartic
Daniel Hesslow
Ruxandra-Aimée Cojocaru
Alessandro Cappelli
Hamza Alobeidli
B. Pannier
Ebtesam Almazrouei
Julien Launay
50
751
0
01 Jun 2023
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
38
200
0
25 May 2023
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
Fuzhao Xue
Yao Fu
Wangchunshu Zhou
Zangwei Zheng
Yang You
85
78
0
22 May 2023
A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers
Jordan Meadows
Marco Valentino
Damien Teney
André Freitas
35
8
0
21 May 2023
Pretraining Language Models with Human Preferences
Tomasz Korbak
Kejian Shi
Angelica Chen
Rasika Bhalerao
C. L. Buckley
Jason Phang
Sam Bowman
Ethan Perez
ALM
SyDa
36
207
0
16 Feb 2023
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
230
103
0
27 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
354
12,003
0
04 Mar 2022
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
242
593
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
1,996
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
264
4,489
0
23 Jan 2020
AI safety via debate
G. Irving
Paul Christiano
Dario Amodei
204
201
0
02 May 2018