ResearchTrend.AI
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias

28 June 2023
Yue Yu
Yuchen Zhuang
Jieyu Zhang
Yu Meng
Alexander Ratner
Ranjay Krishna
Jiaming Shen
Chao Zhang
    ALM

Papers citing "Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias"

43 / 43 papers shown
CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation
Viacheslav Vasilev
V. Arkhipkin
Julia Agafonova
Tatiana Nikulina
Evelina Mironova
Alisa Shichanina
Nikolai Gerasimenko
Mikhail Shoytov
Denis Dimitrov
46
0
0
07 May 2025
Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models
Abdelkarim El-Hajjami
Camille Salinesi
SyDa
39
0
0
06 May 2025
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue
Weijian Qi
Tianneng Shi
Chan Hee Song
Boyu Gou
D. Song
Huan Sun
Yu Su
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 21 May 2025
108
4
1
02 Apr 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding
Zhiheng Xi
Wei He
Zhuoyuan Li
Yitao Zhai
Xiaowei Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
77
3
0
24 Feb 2025
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data
Shenglai Zeng
Jiankun Zhang
Pengfei He
J. Ren
Tianqi Zheng
Hanqing Lu
Han Xu
Hui Liu
Yue Xing
Jiliang Tang
146
9
0
21 Feb 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Yuqing Yang
Ajay Patel
Matt Deitke
Tanmay Gupta
Luca Weihs
...
Mark Yatskar
Chris Callison-Burch
Ranjay Krishna
Aniruddha Kembhavi
Christopher Clark
SyDa
78
2
0
20 Feb 2025
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
Dylan Zhang
Justin Wang
Tianran Sun
56
1
0
17 Feb 2025
Measuring Diversity in Synthetic Datasets
Yuchang Zhu
Huizhe Zhang
Bingzhe Wu
Jintang Li
Zibin Zheng
Peilin Zhao
Liang Chen
Yatao Bian
100
0
0
12 Feb 2025
Few-shot LLM Synthetic Data with Distribution Matching
Jiyuan Ren
Zhaocheng Du
Zhihao Wen
Qinglin Jia
Sunhao Dai
Chuhan Wu
Zhenhua Dong
SyDa
87
0
0
09 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
130
9
0
05 Feb 2025
Assessing Data Augmentation-Induced Bias in Training and Testing of Machine Learning Models
Riddhi More
Jeremy S. Bradbury
59
0
0
03 Feb 2025
Diverse Preference Optimization
Jack Lanchantin
Angelica Chen
S. Dhuliawala
Ping Yu
Jason Weston
Sainbayar Sukhbaatar
Ilia Kulikov
97
4
0
30 Jan 2025
Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop
Ekaterina Artemova
Akim Tsvigun
Dominik Schlechtweg
Natalia Fedorova
Konstantin Chernyshev
Sergei Tilga
Boris Obmoroshev
SyDa
VLM
163
0
0
28 Jan 2025
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Ran Xu
Hejie Cui
Yue Yu
Xuan Kan
Wenqi Shi
Yuchen Zhuang
Wei Jin
Joyce C. Ho
Carl Yang
69
14
0
28 Jan 2025
Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals
Qingyang Wu
Ying Xu
Tingsong Xiao
Yunze Xiao
Yitong Li
...
Yichi Zhang
Shanghai Zhong
Yuwei Zhang
Wei Lu
Yifan Yang
78
2
0
17 Jan 2025
WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
Huawen Feng
Pu Zhao
Qingfeng Sun
Can Xu
Fangkai Yang
...
Qianli Ma
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
AAML
ALM
62
0
0
23 Dec 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
37
0
0
13 Nov 2024
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Xiaochuan Li
Zichun Yu
Chenyan Xiong
SyDa
33
1
0
18 Oct 2024
StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
Ajay Patel
Jiacheng Zhu
Justin Qiu
Zachary Horvitz
Marianna Apidianaki
Kathleen McKeown
Chris Callison-Burch
63
3
0
16 Oct 2024
Mitigating Propensity Bias of Large Language Models for Recommender Systems
Guixian Zhang
Guan Yuan
Debo Cheng
Lin Liu
Jiuyong Li
Shichao Zhang
44
2
0
30 Sep 2024
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information
Zheng Hui
Zhaoxiao Guo
Hang Zhao
Juanyong Duan
Congrui Huang
44
6
0
23 Sep 2024
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists
Raoyuan Zhao
Abdullatif Köksal
Yihong Liu
Leonie Weissweiler
Anna Korhonen
Hinrich Schütze
SyDa
41
1
0
30 Aug 2024
Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation
Jiaming Shen
Ran Xu
Yennie Jun
Zhen Qin
Tianqi Liu
Carl Yang
Yi Liang
Simon Baumgartner
Michael Bendersky
SyDa
67
4
0
22 Jul 2024
Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions
Shumaila Javaid
R. A. Khalil
Nasir Saeed
Bin He
Mohamed-Slim Alouini
39
9
0
05 Jul 2024
When Search Engine Services meet Large Language Models: Visions and Challenges
Haoyi Xiong
Jiang Bian
Yuchen Li
Xuhong Li
Jundong Li
Shuaiqiang Wang
Dawei Yin
Sumi Helal
53
29
0
28 Jun 2024
1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?
Yue Huang
Chenrui Fan
Yuan Li
Siyuan Wu
Tianyi Zhou
Xiangliang Zhang
Lichao Sun
53
3
0
20 Jun 2024
Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning
Jiaqi Li
Yixuan Tang
Yi Yang
46
5
0
14 Jun 2024
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
38
35
0
31 May 2024
Privacy Preserving Prompt Engineering: A Survey
Kennedy Edemacu
Xintao Wu
53
18
0
09 Apr 2024
Measuring Political Bias in Large Language Models: What Is Said and How It Is Said
Yejin Bang
Delong Chen
Nayeon Lee
Pascale Fung
37
26
0
27 Mar 2024
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li
Bhavan A. Jasani
Peng Tang
Shabnam Ghadar
LRM
39
8
0
25 Mar 2024
TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
Yunyi Zhang
Ruozhen Yang
Xueqiang Xu
Rui Li
Jinfeng Xiao
Jiaming Shen
Jiawei Han
45
10
0
29 Feb 2024
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
Wenlong Deng
Blair Chen
Beidi Zhao
Chiyu Zhang
Xiaoxiao Li
Christos Thrampoulidis
35
0
0
22 Feb 2024
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu
Aurick Qiao
W. Neiswanger
Hongyi Wang
Bowen Tan
...
Zhiting Hu
Mark Schulze
Preslav Nakov
Timothy Baldwin
Eric Xing
49
70
0
11 Dec 2023
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks
Hao Peng
Xiaozhi Wang
Jianhui Chen
Weikai Li
Y. Qi
...
Zhili Wu
Kaisheng Zeng
Bin Xu
Lei Hou
Juanzi Li
34
28
0
15 Nov 2023
PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature
Jerry Junyang Cheung
Yuchen Zhuang
Yinghao Li
Pranav Shetty
Wantian Zhao
Sanjeev Grampurohit
R. Ramprasad
Chao Zhang
AI4CE
14
11
0
13 Nov 2023
Bias Testing and Mitigation in LLM-based Code Generation
Dong Huang
Qingwen Bu
Jie M. Zhang
Xiaofei Xie
Junjie Chen
Heming Cui
48
20
0
03 Sep 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
165
579
0
06 Apr 2023
Mixture of Soft Prompts for Controllable Data Generation
Derek Chen
Celine Lee
Yunan Lu
Domenic Rosati
Zhou Yu
117
22
0
02 Mar 2023
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
Jiacheng Ye
Jiahui Gao
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
VLM
78
72
0
22 Oct 2022
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning
Jiahui Gao
Renjie Pi
Yong Lin
Hang Xu
Jiacheng Ye
Zhiyong Wu
Weizhong Zhang
Xiaodan Liang
Zhenguo Li
Lingpeng Kong
SyDa
VLM
75
45
0
25 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
366
12,003
0
04 Mar 2022
RAFT: A Real-World Few-Shot Text Classification Benchmark
Neel Alex
Eli Lifland
Lewis Tunstall
A. Thakur
Pegah Maham
...
Carolyn Ashurst
Paul Sedille
A. Carlier
M. Noetel
Andreas Stuhlmuller
RALM
184
56
0
28 Sep 2021