DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

16 February 2024
Ajay Patel
Colin Raffel
Chris Callison-Burch
    SyDa
    AI4CE

Papers citing "DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows"

22 / 22 papers shown
An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding
Siyang Jiang
Bufang Yang
Lilin Xu
Mu Yuan
Yeerzhati Abudunuer
...
Liekang Zeng
Hongkai Chen
Zhenyu Yan
Xiaofan Jiang
Guoliang Xing
VLM
92
0
0
03 May 2025
High-Fidelity And Complex Test Data Generation For Real-World SQL Code Generation Services
Shivasankari Kannan
Yeounoh Chung
Amita Gondi
Tristan Swadell
Fatma Ozcan
27
0
0
24 Apr 2025
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
Mihai Nadas
Laura Diosan
Andreea Tomescu
SyDa
72
0
0
18 Mar 2025
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
Yi-Ling Chung
Aurora Cobo
Pablo Serna
SyDa
HILM
63
0
0
24 Feb 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Yue Yang
Ajay Patel
Matt Deitke
Tanmay Gupta
Luca Weihs
...
Mark Yatskar
Chris Callison-Burch
Ranjay Krishna
Aniruddha Kembhavi
Christopher Clark
SyDa
78
2
0
21 Feb 2025
StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
Ajay Patel
Jiacheng Zhu
Justin Qiu
Zachary Horvitz
Marianna Apidianaki
Kathleen McKeown
Chris Callison-Burch
63
3
0
16 Oct 2024
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Syeda Nahida Akter
Shrimai Prabhumoye
John Kamalu
S. Satheesh
Eric Nyberg
M. Patwary
M. Shoeybi
Bryan Catanzaro
LRM
SyDa
ReLM
98
1
0
15 Oct 2024
Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection
Ksheeraja Raghavan
Samiran Gode
Ankit Parag Shah
Surabhi Raghavan
Wolfram Burgard
Bhiksha Raj
Rita Singh
25
0
0
04 Oct 2024
Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective
Zeyu Gan
Yong Liu
SyDa
43
1
0
02 Oct 2024
Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
Catherine Yeh
Donghao Ren
Yannick Assogba
Dominik Moritz
Fred Hohman
36
0
0
01 Oct 2024
AI-Assisted Generation of Difficult Math Questions
Vedant Shah
Dingli Yu
Kaifeng Lyu
Simon Park
Nan Rosemary Ke
...
Yoshua Bengio
Sanjeev Arora
Anirudh Goyal
47
15
0
30 Jul 2024
On Pre-training of Multimodal Language Models Customized for Chart Understanding
Wan-Cyuan Fan
Yen-Chun Chen
Mengchen Liu
Lu Yuan
Leonid Sigal
45
5
0
19 Jul 2024
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning
Chenyang Zhao
Xueying Jia
Vijay Viswanathan
Tongshuang Wu
Graham Neubig
SyDa
ALM
45
25
0
16 Jul 2024
Training Task Experts through Retrieval Based Distillation
Jiaxin Ge
Xueying Jia
Vijay Viswanathan
Hongyin Luo
Graham Neubig
38
3
0
07 Jul 2024
SS-Bench: A Benchmark for Social Story Generation and Evaluation
Yi Feng
Mingyang Song
Jiaqi Wang
Mao Zheng
Liping Jing
Jian-hong Yu
27
0
0
22 Jun 2024
Is Programming by Example solved by LLMs?
Wen-Ding Li
Kevin Ellis
37
10
0
12 Jun 2024
Improving Text Generation on Images with Synthetic Captions
Jun Young Koh
Sang Hyun Park
Joy Song
DiffM
51
2
0
01 Jun 2024
Large Language Models Can Self-Improve At Web Agent Tasks
Ajay Patel
M. Hofmarcher
Claudiu Leoveanu-Condrei
Marius-Constantin Dinu
Chris Callison-Burch
Sepp Hochreiter
LLMAG
21
23
0
30 May 2024
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Cheng-Yu Hsieh
Chun-Liang Li
Chih-Kuan Yeh
Hootan Nakhost
Yasuhisa Fujii
Alexander Ratner
Ranjay Krishna
Chen-Yu Lee
Tomas Pfister
ALM
220
499
0
03 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
224
572
0
03 May 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
213
1,657
0
15 Oct 2021