A Little Human Data Goes A Long Way

17 October 2024
Dhananjay Ashok
Jonathan May
    SyDa

Papers citing "A Little Human Data Goes A Long Way"

45 / 45 papers shown
Can LLMs Replace Manual Annotation of Software Engineering Artifacts?
Toufique Ahmed
Premkumar Devanbu
Christoph Treude
Michael Pradel
125
19
0
10 Aug 2024
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Tao Ge
Xin Chan
Dian Yu
Haitao Mi
Dong Yu
SyDa
199
150
0
28 Jun 2024
Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images
Krishnakant Singh
Thanush Navaratnam
Jannik Holmer
Simone Schaub-Meyer
Stefan Roth
DiffM
72
19
0
30 May 2024
GPT is Not an Annotator: The Necessity of Human Annotation in Fairness Benchmark Construction
Virginia K. Felkner
Jennifer A. Thompson
Jonathan May
86
11
0
24 May 2024
SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation
Yuwei Wan
Yixuan Liu
Aswathy Ajith
Clara Grazian
B. Hoex
Wenjie Zhang
Chunyu Kit
Tong Xie
Ian Foster
84
10
0
16 May 2024
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
Liyan Tang
Philippe Laban
Greg Durrett
HILM, SyDa
67
101
0
16 Apr 2024
How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse
M. Seddik
Suei-Wen Chen
Soufiane Hayou
Pierre Youssef
Merouane Debbah
91
36
0
07 Apr 2024
Large Language Models for Data Annotation: A Survey
Zhen Tan
Dawei Li
Song Wang
Alimohammad Beigi
Bohan Jiang
Amrita Bhattacharjee
Mansooreh Karami
Wenlin Yao
Lu Cheng
Huan Liu
SyDa
90
77
0
21 Feb 2024
A synthetic data approach for domain generalization of NLI models
Mohammad Javad Hosseini
Andrey Petrov
Alex Fabrikant
Annie Louis
SyDa
70
10
0
19 Feb 2024
AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators
Jingwei Ni
Minjing Shi
Dominik Stammbach
Mrinmaya Sachan
Elliott Ash
Markus Leippold
HILM
70
14
0
16 Feb 2024
SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking
Atharva Kulkarni
Bo-Hsiang Tseng
Joel Ruben Antony Moniz
Dhivya Piraviperumal
Hong-ye Yu
Shruti Bhargava
SyDa
93
10
0
03 Feb 2024
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV, RALM
211
1,814
1
18 Dec 2023
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan
Kaifeng Chen
Dilip Krishnan
Dina Katabi
Phillip Isola
Yonglong Tian
CLIP, VLM
72
67
0
07 Dec 2023
Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop
Martin Briesch
Dominik Sobania
Franz Rothlauf
94
59
0
28 Nov 2023
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text
Yanzhu Guo
Guokan Shang
Michalis Vazirgiannis
Chloé Clavel
71
58
0
16 Nov 2023
Generating Benchmarks for Factuality Evaluation of Language Models
Dor Muhlgay
Ori Ram
Inbal Magar
Yoav Levine
Nir Ratner
Yonatan Belinkov
Omri Abend
Kevin Leyton-Brown
Amnon Shashua
Y. Shoham
HILM
55
98
0
13 Jul 2023
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Yue Yu
Yuchen Zhuang
Jieyu Zhang
Yu Meng
Alexander Ratner
Ranjay Krishna
Jiaming Shen
Chao Zhang
ALM
94
234
0
28 Jun 2023
The Curse of Recursion: Training on Generated Data Makes Models Forget
Ilia Shumailov
Zakhar Shumaylov
Yiren Zhao
Y. Gal
Nicolas Papernot
Ross J. Anderson
DiffM
79
298
0
27 May 2023
AlignScore: Evaluating Factual Consistency with a Unified Alignment Function
Yuheng Zha
Yichi Yang
Ruichen Li
Zhiting Hu
HILM
85
207
0
26 May 2023
SciFix: Outperforming GPT3 on Scientific Factual Error Correction
D. Ashok
Atharva Kulkarni
Hai Pham
Barnabas Poczos
58
1
0
24 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
134
605
0
22 May 2023
ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval
Yue Yu
Yuchen Zhuang
Rongzhi Zhang
Yu Meng
Jiaming Shen
Chao Zhang
VLM
68
37
0
18 May 2023
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Cheng-Yu Hsieh
Chun-Liang Li
Chih-Kuan Yeh
Hootan Nakhost
Yasuhisa Fujii
Alexander Ratner
Ranjay Krishna
Chen-Yu Lee
Tomas Pfister
ALM
321
558
0
03 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.5K
14,699
0
15 Mar 2023
Does Synthetic Data Generation of LLMs Help Clinical Text Mining?
Ruixiang Tang
Xiaotian Han
Xiaoqian Jiang
Xia Hu
LM&MA, AI4MH, SyDa
70
185
0
08 Mar 2023
Is synthetic data from generative models ready for image recognition?
Ruifei He
Shuyang Sun
Xin Yu
Chuhui Xue
Wenqing Zhang
Philip Torr
Song Bai
Xiaojuan Qi
98
302
0
14 Oct 2022
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension
Ying Xu
Dakuo Wang
Mo Yu
Daniel E. Ritchie
Bingsheng Yao
...
Xiaojuan Ma
Diyi Yang
Nanyun Peng
Zhou Yu
M. Warschauer
AI4Ed
68
105
0
26 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
883
13,176
0
04 Mar 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
101
220
0
16 Feb 2022
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
Yu Meng
Jiaxin Huang
Yu Zhang
Jiawei Han
SyDa
62
235
0
09 Feb 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
159
221
0
16 Jan 2022
Want To Reduce Labeling Cost? GPT-3 Can Help
Shuohang Wang
Yang Liu
Yichong Xu
Chenguang Zhu
Michael Zeng
69
257
0
30 Aug 2021
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
865
42,379
0
28 May 2020
Fact or Fiction: Verifying Scientific Claims
David Wadden
Shanchuan Lin
Kyle Lo
Lucy Lu Wang
Madeleine van Zuylen
Arman Cohan
Hannaneh Hajishirzi
HAI
144
459
0
30 Apr 2020
Reasoning Over Paragraph Effects in Situations
Kevin Lin
Oyvind Tafjord
Peter Clark
Matt Gardner
85
115
0
16 Aug 2019
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
352
5,860
0
21 Apr 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
1.8K
95,175
0
11 Oct 2018
SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation
Xinyi Wang
Hieu H. Pham
Zihang Dai
Graham Neubig
67
197
0
22 Aug 2018
CoQA: A Conversational Question Answering Challenge
Siva Reddy
Danqi Chen
Christopher D. Manning
RALM, HAI
114
1,209
0
21 Aug 2018
Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations
Sosuke Kobayashi
84
615
0
16 May 2018
FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne
Andreas Vlachos
Christos Christodoulopoulos
Arpit Mittal
HILM
159
1,666
0
14 Mar 2018
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
524
4,494
0
18 Apr 2017
Data Noising as Smoothing in Neural Network Language Models
Ziang Xie
Sida I. Wang
Jiwei Li
Daniel Levy
Allen Nie
Dan Jurafsky
A. Ng
54
238
0
07 Mar 2017
Improving Neural Machine Translation Models with Monolingual Data
Rico Sennrich
Barry Haddow
Alexandra Birch
257
2,723
0
20 Nov 2015
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Jason Weston
Antoine Bordes
S. Chopra
Alexander M. Rush
Bart van Merriënboer
Armand Joulin
Tomas Mikolov
LRM, ELM
150
1,181
0
19 Feb 2015