SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets

11 November 2021
Ann Yuan
Daphne Ippolito
Vitaly Nikolaev
Chris Callison-Burch
Andy Coenen
Sebastian Gehrmann
    SyDa

Papers citing "SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets"

18 / 18 papers shown
The Evolution of LLM Adoption in Industry Data Curation Practices
Crystal Qian
Michael Xieyang Liu
Emily Reif
Grady Simon
Nada Hussein
Nathan Clement
James Wexler
Carrie J. Cai
Michael Terry
Minsuk Kahng
AILaw
ELM
75
4
0
20 Dec 2024
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic
Emad A. Alghamdi
Reem I. Masoud
Deema Alnuhait
Afnan Y. Alomairi
Ahmed Ashraf
Mohamed Zaytoon
42
4
0
14 Mar 2024
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Shivalika Singh
Freddie Vargus
Daniel D'souza
Börje F. Karlsson
Abinaya Mahendiran
...
Max Bartolo
Julia Kreutzer
A. Ustun
Marzieh Fadaee
Sara Hooker
119
117
0
09 Feb 2024
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati
Minhwa Lee
Dheeraj Rajagopal
Dongyeop Kang
40
10
0
16 Nov 2023
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation
Minzhi Li
Taiwei Shi
Caleb Ziems
Min-Yen Kan
Nancy F. Chen
Zhengyuan Liu
Diyi Yang
29
68
0
24 Oct 2023
Choice-75: A Dataset on Decision Branching in Script Learning
Zhaoyi Hou
Li Zhang
Chris Callison-Burch
32
4
0
21 Sep 2023
Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models
Emily Reif
Minsuk Kahng
S. Petridis
25
6
0
19 May 2023
Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales
Brihi Joshi
Ziyi Liu
Sahana Ramnath
Aaron Chan
Zhewei Tong
Shaoliang Nie
Qifan Wang
Yejin Choi
Xiang Ren
HAI
LRM
26
29
0
11 May 2023
Contrastive Error Attribution for Finetuned Language Models
Faisal Ladhak
Esin Durmus
Tatsunori Hashimoto
HILM
25
9
0
21 Dec 2022
TaTa: A Multilingual Table-to-Text Dataset for African Languages
Sebastian Gehrmann
Sebastian Ruder
Vitaly Nikolaev
Jan A. Botha
Michael Chavinda
Ankur P. Parikh
Clara E. Rivera
LMTD
19
10
0
31 Oct 2022
Unsupervised Text Deidentification
John X. Morris
Justin T. Chiu
Ramin Zabih
Alexander M. Rush
16
6
0
20 Oct 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
62
2,989
0
20 Oct 2022
How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models
Hai Dang
Lukas Mecke
Florian Lehmann
Sven Goller
Daniel Buschek
20
97
0
03 Sep 2022
Data Representativity for Machine Learning and AI Systems
Line H. Clemmensen
R. Kjærsgaard
28
19
0
09 Mar 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
50
211
0
16 Jan 2022
Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization
Bharath Chintagunta
Namit Katariya
X. Amatriain
Anitha Kannan
LM&MA
MedIm
128
149
0
09 Sep 2021
Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Kenny Peng
Arunesh Mathur
Arvind Narayanan
99
93
0
06 Aug 2021
The Impact of Multiple Parallel Phrase Suggestions on Email Input and Composition Behaviour of Native and Non-Native English Writers
Daniel Buschek
Martin Zurn
Malin Eiband
118
99
0
22 Jan 2021