2406.15126
Cited By
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
14 June 2024
Lin Long
Rui Wang
Ruixuan Xiao
Junbo Zhao
Xiao Ding
Gang Chen
Haobo Wang
SyDa
Papers citing
"On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey"
32 / 32 papers shown
Title
Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models
Abdelkarim El-Hajjami
Camille Salinesi
SyDa
34
0
0
06 May 2025
A Typology of Synthetic Datasets for Dialogue Processing in Clinical Contexts
Steven Bedrick
A. Seza Doğruöz
Sergiu Nisioi
131
0
0
05 May 2025
AKD : Adversarial Knowledge Distillation For Large Language Models Alignment on Coding tasks
Ilyas Oulkadda
Julien Perez
ALM
42
0
0
05 May 2025
An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding
Siyang Jiang
Bufang Yang
Lilin Xu
Mu Yuan
Yeerzhati Abudunuer
...
Liekang Zeng
Hongkai Chen
Zhenyu Yan
Xiaofan Jiang
Guoliang Xing
VLM
86
0
0
03 May 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
57
0
0
29 Apr 2025
LLM-based Semantic Augmentation for Harmful Content Detection
Elyas Meguellati
Assaad Zeghina
S. Sadiq
Gianluca Demartini
34
0
0
22 Apr 2025
Leveraging LLMs for User Stories in AI Systems: UStAI Dataset
Asma Z. Yamani
Malak Baslyman
Moataz Ahmed
28
0
0
01 Apr 2025
Synthetic News Generation for Fake News Classification
Abdul Sittar
Luka Golob
Mateja Smiljanic
35
0
0
31 Mar 2025
Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?
So Young Lee
Russell Scheinberg
Amber Shore
Ameeta Agrawal
46
1
0
13 Mar 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
80
1
0
25 Feb 2025
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
Yi-Ling Chung
Aurora Cobo
Pablo Serna
SyDa
HILM
58
0
0
24 Feb 2025
Man Made Language Models? Evaluating LLMs' Perpetuation of Masculine Generics Bias
Enzo Doyen
Amalia Todirascu
40
0
0
14 Feb 2025
Measuring Diversity in Synthetic Datasets
Yuchang Zhu
Huizhe Zhang
Bingzhe Wu
Jintang Li
Zibin Zheng
Peilin Zhao
Liang Chen
Yatao Bian
100
0
0
12 Feb 2025
Few-shot LLM Synthetic Data with Distribution Matching
Jiyuan Ren
Zhaocheng Du
Zhihao Wen
Qinglin Jia
Sunhao Dai
Chuhan Wu
Zhenhua Dong
SyDa
77
0
0
09 Feb 2025
MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification
Saptarshi Sengupta
Kristal Curtis
Akshay Mallipeddi
Abhinav Mathur
Joseph Ross
Liang Gou
LLMAG
SyDa
125
1
0
28 Nov 2024
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
Gabriel Chua
Shing Yee Chan
Shaun Khoo
75
1
0
20 Nov 2024
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen
Philip Arthur
Qianyu Feng
Cong Duy Vu Hoang
Yu-Heng Hong
...
Mark Johnson
K. K.
Don Dharmasiri
Long Duong
Yuan-Fang Li
SyDa
58
1
0
16 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
139
0
0
07 Oct 2024
Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment
Chengfeng Dou
Y. Zhang
Zhi Jin
Wenpin Jiao
Haiyan Zhao
Yongqiang Zhao
Zhengwei Tao
30
0
0
05 Oct 2024
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
47
2
0
02 Oct 2024
Efficacy of Synthetic Data as a Benchmark
Gaurav Maheshwari
Dmitry Ivanov
Kevin El Haddad
SyDa
18
6
0
18 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
63
23
0
10 Sep 2024
RAGent: Retrieval-based Access Control Policy Generation
Sakuna Jayasundara
N. Arachchilage
Giovanni Russello
51
1
0
08 Sep 2024
Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction
Martin Josifoski
Marija Sakota
Maxime Peyrard
Robert West
SyDa
56
78
0
07 Mar 2023
Mixture of Soft Prompts for Controllable Data Generation
Derek Chen
Celine Lee
Yunan Lu
Domenic Rosati
Zhou Yu
109
22
0
02 Mar 2023
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
Jiacheng Ye
Jiahui Gao
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
VLM
73
70
0
22 Oct 2022
Creating Training Sets via Weak Indirect Supervision
Jieyu Zhang
Bohan Wang
Xiangchen Song
Yujing Wang
Yaming Yang
Jing Bai
Alexander Ratner
OffRL
51
17
0
07 Oct 2021
What Makes Good In-Context Examples for GPT-3?
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
AAML
RALM
275
1,312
0
17 Jan 2021
Efficient Intent Detection with Dual Sentence Encoders
I. Casanueva
Tadas Temčinas
D. Gerz
Matthew Henderson
Ivan Vulić
VLM
180
451
0
10 Mar 2020
A Survey on Knowledge Graphs: Representation, Acquisition and Applications
Shaoxiong Ji
Shirui Pan
Erik Cambria
Pekka Marttinen
Philip S. Yu
181
1,940
0
02 Feb 2020
FewRel 2.0: Towards More Challenging Few-Shot Relation Classification
Tianyu Gao
Xu Han
Hao Zhu
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
205
244
0
16 Oct 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,956
0
20 Apr 2018