Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.09807
Cited By
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text
16 November 2023
Yanzhu Guo
Guokan Shang
Michalis Vazirgiannis
Chloé Clavel
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text"
14 / 14 papers shown
Title
Machine-generated text detection prevents language model collapse
George Drayson
Emine Yilmaz
Vasileios Lampos
DeLMO
62
0
0
20 May 2025
Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
Grgur Kovač
Jérémy Perez
Rémy Portelas
Peter Ford Dominey
Pierre-Yves Oudeyer
35
0
0
04 Apr 2025
LLM as a Broken Telephone: Iterative Generation Distorts Information
Amr Mohamed
Mingmeng Geng
Michalis Vazirgiannis
Guokan Shang
75
1
0
27 Feb 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding
Zhiheng Xi
Wei He
Zhuoyuan Li
Yitao Zhai
Xiaowei Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
69
3
0
24 Feb 2025
Aligning Instruction Tuning with Pre-training
Yiming Liang
Tianyu Zheng
Xinrun Du
Ge Zhang
J. Liu
...
Zhaoxiang Zhang
Wenhao Huang
Jiajun Zhang
Xiang Yue
Jiajun Zhang
86
1
0
16 Jan 2025
Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World
Joshua Kazdan
Rylan Schaeffer
Apratim Dey
Matthias Gerstgrasser
Rafael Rafailov
D. Donoho
Sanmi Koyejo
50
11
0
22 Oct 2024
Expanding Chatbot Knowledge in Customer Service: Context-Aware Similar Question Generation Using Large Language Models
Mengze Hong
Yuanfeng Song
Di Jiang
Lu Wang
Zichang Guo
Yuanqin He
Zhiyang Su
Qing Li
35
2
0
16 Oct 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
54
22
0
22 Apr 2024
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Chantal Shaib
Joe Barrow
Jiuding Sun
Alexa F. Siu
Byron C. Wallace
A. Nenkova
66
33
0
01 Mar 2024
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
314
3,248
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Data Augmentation Approaches in Natural Language Processing: A Survey
Bohan Li
Yutai Hou
Wanxiang Che
124
270
0
05 Oct 2021
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi
Yuhao Zhang
Yuhui Zhang
Jason Bolton
Christopher D. Manning
AI4TS
204
1,653
0
16 Mar 2020
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
175
3,509
0
10 Jun 2015
1