2505.22830
What Has Been Lost with Synthetic Evaluation?
28 May 2025
Alexander Gill
Abhilasha Ravichander
Ana Marasović
ELM
Papers citing
"What Has Been Lost with Synthetic Evaluation?"
43 / 43 papers shown
Title
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Kareem Amin
Sara Babakniya
Alex Bie
Weiwei Kong
Umar Syed
Sergei Vassilvitskii
90
3
0
13 Feb 2025
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
Chengwen Qi
Ren Ma
Bowen Li
He Du
Binyuan Hui
Jinwang Wu
Yuanjun Laili
Conghui He
ReLM
LRM
112
5
0
10 Feb 2025
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology
Junior Cedric Tonga
Benjamin Clément
Pierre-Yves Oudeyer
LRM
47
3
0
05 Nov 2024
From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items
Melissa Roemmele
Andrew S. Gordon
47
2
0
18 Oct 2024
Efficacy of Synthetic Data as a Benchmark
Gaurav Maheshwari
Dmitry Ivanov
Kevin El Haddad
SyDa
52
8
0
18 Sep 2024
A Survey on Natural Language Counterfactual Generation
Yongjie Wang
Xiaoqi Qiu
Yu Yue
Xu Guo
Zhiwei Zeng
Yuhong Feng
Zhiqi Shen
52
8
0
04 Jul 2024
Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology
Federico Ruggeri
Eleonora Misino
Arianna Muti
Katerina Korre
Paolo Torroni
Alberto Barrón-Cedeño
82
1
0
20 Jun 2024
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
Lin Long
Rui Wang
Ruixuan Xiao
Junbo Zhao
Xiao Ding
Gang Chen
Haobo Wang
SyDa
79
107
0
14 Jun 2024
CEval: A Benchmark for Evaluating Counterfactual Text Generation
Van Bach Nguyen
Jorg Schlotterer
Christin Seifert
66
7
0
26 Apr 2024
A synthetic data approach for domain generalization of NLI models
Mohammad Javad Hosseini
Andrey Petrov
Alex Fabrikant
Annie Louis
SyDa
54
10
0
19 Feb 2024
Under the Surface: Tracking the Artifactuality of LLM-Generated Data
Debarati Das
Karin de Langis
Anna Martin
Jaehyung Kim
Minhwa Lee
...
Aahan Tyagi
Libby Ferland
Sanjali Roy
Vincent Liu
Dongyeop Kang
18
18
0
26 Jan 2024
Genie: Achieving Human Parity in Content-Grounded Datasets Generation
Asaf Yehudai
Boaz Carmeli
Y. Mass
Ofir Arviv
Nathaniel Mills
Assaf Toledo
Eyal Shnarch
Leshem Choshen
52
25
0
25 Jan 2024
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Peter West
Ximing Lu
Nouha Dziri
Faeze Brahman
Linjie Li
...
Khyathi Chandu
Benjamin Newman
Pang Wei Koh
Allyson Ettinger
Yejin Choi
AIMat
69
76
0
31 Oct 2023
Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges
Vinay Samuel
Houda Aynaou
Arijit Ghosh Chowdhury
Karthik Venkat Ramanan
Aman Chadha
SyDa
71
8
0
21 Sep 2023
ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness
Ján Cegin
Jakub Simko
Peter Brusilovsky
60
44
0
22 May 2023
The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks
Anders Giovanni Møller
Jacob Aarup Dalsgaard
Arianna Pera
L. Aiello
86
37
0
26 Apr 2023
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
Fabrizio Gilardi
Meysam Alizadeh
M. Kubli
AI4MH
95
892
0
27 Mar 2023
Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense
Kalpesh Krishna
Yixiao Song
Marzena Karpinska
John Wieting
Mohit Iyyer
DeLMO
33
312
0
23 Mar 2023
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Yizhong Wang
Yeganeh Kordi
Swaroop Mishra
Alisa Liu
Noah A. Smith
Daniel Khashabi
Hannaneh Hajishirzi
ALM
SyDa
LRM
79
2,166
0
20 Dec 2022
Reasoning Circuits: Few-shot Multihop Question Generation with Structured Rationales
Saurabh Kulshreshtha
Anna Rumshisky
ReLM
LRM
46
3
0
15 Nov 2022
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
Abhilasha Ravichander
Matt Gardner
Ana Marasović
61
35
0
01 Nov 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
161
25
0
10 Oct 2022
Generative Language Models for Paragraph-Level Question Generation
Asahi Ushio
Fernando Alva-Manchego
Jose Camacho-Collados
ELM
34
46
0
08 Oct 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
71
215
0
16 Feb 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
120
219
0
16 Jan 2022
Tailor: Generating and Perturbing Text with Semantic Controls
Alexis Ross
Tongshuang Wu
Hao Peng
Matthew E. Peters
Matt Gardner
148
78
0
15 Jul 2021
What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?
Nikita Nangia
Saku Sugawara
H. Trivedi
Alex Warstadt
Clara Vania
Sam Bowman
101
36
0
01 Jun 2021
Factorising Meaning and Form for Intent-Preserving Paraphrasing
Tom Hosking
Mirella Lapata
OOD
39
41
0
31 May 2021
What Will it Take to Fix Benchmarking in Natural Language Understanding?
Samuel R. Bowman
George E. Dahl
ELM
ALM
47
159
0
05 Apr 2021
Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing
Kun Wu
Lijie Wang
Zhenghua Li
Ao Zhang
Xinyan Xiao
Hua Wu
Min Zhang
Haifeng Wang
18
34
0
03 Mar 2021
Explaining NLP Models via Minimal Contrastive Editing (MiCE)
Alexis Ross
Ana Marasović
Matthew E. Peters
62
122
0
27 Dec 2020
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
129
1,089
0
08 May 2020
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Weihao Yu
Zihang Jiang
Yanfei Dong
Jiashi Feng
LRM
99
247
0
11 Feb 2020
Learning the Difference that Makes a Difference with Counterfactually-Augmented Data
Divyansh Kaushik
Eduard H. Hovy
Zachary Chase Lipton
CML
59
567
0
26 Sep 2019
Recent Advances in Neural Question Generation
Liangming Pan
Wenqiang Lei
Tat-Seng Chua
Min-Yen Kan
47
117
0
22 May 2019
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
Dheeru Dua
Yizhong Wang
Pradeep Dasigi
Gabriel Stanovsky
Sameer Singh
Matt Gardner
AIMat
73
933
0
01 Mar 2019
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
118
2,577
0
25 Sep 2018
Stress Test Evaluation for Natural Language Inference
Aakanksha Naik
Abhilasha Ravichander
Norman M. Sadeh
Carolyn Rose
Graham Neubig
ELM
59
375
0
02 Jun 2018
Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith
106
1,167
0
06 Mar 2018
The NarrativeQA Reading Comprehension Challenge
Tomás Kociský
Jonathan Richard Schwarz
Phil Blunsom
Chris Dyer
Karl Moritz Hermann
Gábor Melis
Edward Grefenstette
98
759
0
19 Dec 2017
Making Neural QA as Simple as Possible but not Simpler
Dirk Weissenborn
Georg Wiese
Laura Seiffe
50
210
0
14 Mar 2017
Bidirectional Attention Flow for Machine Comprehension
Minjoon Seo
Aniruddha Kembhavi
Ali Farhadi
Hannaneh Hajishirzi
105
2,088
0
05 Nov 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
153
8,067
0
16 Jun 2016