2104.07540
Generating Datasets with Pretrained Language Models
15 April 2021
Timo Schick
Hinrich Schütze
Papers citing
"Generating Datasets with Pretrained Language Models"
50 / 61 papers shown
Title
Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model
Mingruo Yuan
Ben Kao
Tien-Hsuan Wu
Michael M. K. Cheung
Henry W. H. Chan
Anne S. Y. Cheung
Felix W. H. Chan
Yongxi Chen
AILaw
ELM
166
3
0
07 May 2025
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
37
0
0
13 Nov 2024
Self-calibration for Language Model Quantization and Pruning
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
180
0
0
22 Oct 2024
Do Audio-Language Models Understand Linguistic Variations?
Ramaneswaran Selvakumar
Sonal Kumar
Hemant Kumar Giri
Nishit Anand
Ashish Seth
Sreyan Ghosh
Dinesh Manocha
AuLLM
VLM
55
1
0
21 Oct 2024
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design
Artem Snegirev
Maria Tikhonova
Anna Maksimova
Alena Fenogenova
Alexander Abramov
34
4
0
22 Aug 2024
GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory
Wei Fan
Haoran Li
Zheye Deng
Weiqi Wang
Yangqiu Song
AILaw
35
9
0
17 Jun 2024
SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems
Patrick Emami
Zhaonan Li
Saumya Sinha
Truc Nguyen
56
1
0
30 May 2024
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li
Bhavan A. Jasani
Peng Tang
Shabnam Ghadar
LRM
39
8
0
25 Mar 2024
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Boshi Wang
Hao Fang
Jason Eisner
Benjamin Van Durme
Yu-Chuan Su
CLL
29
7
0
07 Mar 2024
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
Wenlong Deng
Blair Chen
Beidi Zhao
Chiyu Zhang
Xiaoxiao Li
Christos Thrampoulidis
35
0
0
22 Feb 2024
GIRT-Model: Automated Generation of Issue Report Templates
Nafiseh Nikeghbal
Amir Hossein Kargaran
Abbas Heydarnoori
25
2
0
04 Feb 2024
Faithful Persona-based Conversational Dataset Generation with Large Language Models
Pegah Jandaghi
XiangHai Sheng
Xinyi Bai
Jay Pujara
Hakim Sidahmed
37
21
0
15 Dec 2023
A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
Giovanni Monea
Maxime Peyrard
Martin Josifoski
Vishrav Chaudhary
Jason Eisner
Emre Kiciman
Hamid Palangi
Barun Patra
Robert West
KELM
51
12
0
04 Dec 2023
The Role of Federated Learning in a Wireless World with Foundation Models
Zihan Chen
Howard H. Yang
Y. C. Tay
Kai Fong Ernest Chong
Tony Q.S. Quek
AI4CE
29
6
0
06 Oct 2023
Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges
Vinay Samuel
Houda Aynaou
Arijit Ghosh Chowdhury
Karthik Venkat Ramanan
Aman Chadha
SyDa
33
7
0
21 Sep 2023
Collective Human Opinions in Semantic Textual Similarity
Yuxia Wang
Shimin Tao
Ning Xie
Hao Yang
Timothy Baldwin
Karin Verspoor
29
4
0
08 Aug 2023
I-WAS: a Data Augmentation Method with GPT-2 for Simile Detection
Yongzhu Chang
Rongsheng Zhang
Jiashu Pu
38
1
0
08 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
46
3
0
08 Aug 2023
PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records
Viktor Schlegel
Hao Li
Yuping Wu
Anand Subramanian
Thanh-Tung Nguyen
...
Daniel Beck
Xiaojun Zeng
R. Batista-Navarro
Stefan Winkler
Goran Nenadic
LM&MA
MedIm
29
9
0
05 Jul 2023
Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models
Qiang Zhang
Jason Naradowsky
Yusuke Miyao
ELM
26
32
0
29 May 2023
A Comprehensive Survey of Sentence Representations: From the BERT Epoch to the ChatGPT Era and Beyond
Abhinav Ramesh Kashyap
Thanh-Tung Nguyen
Viktor Schlegel
Stefan Winkler
See-Kiong Ng
Soujanya Poria
AI4TS
3DV
SSL
37
6
0
22 May 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Ge Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Guohao Li
SyDa
ALM
50
412
0
31 Mar 2023
Language Model Crossover: Variation through Few-Shot Prompting
Elliot Meyerson
M. Nelson
Herbie Bradley
Adam Gaier
Arash Moradi
Amy K. Hoover
Joel Lehman
VLM
40
79
0
23 Feb 2023
What happens before and after: Multi-Event Commonsense in Event Coreference Resolution
Sahithya Ravi
Christy Tanner
R. Ng
Vered Shwartz
45
16
0
20 Feb 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa
RALM
43
1,604
0
09 Feb 2023
Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness
Shuaichen Chang
Jun Wang
Mingwen Dong
Lin Pan
Henghui Zhu
...
William Yang Wang
Zhiguo Wang
Vittorio Castelli
Patrick K. L. Ng
Bing Xiang
OOD
44
34
0
21 Jan 2023
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
Leonid Boytsov
Preksha Patel
Vivek Sourabh
Riddhi Nisar
Sayan Kundu
R. Ramanathan
Eric Nyberg
29
19
0
08 Jan 2023
Geographic and Geopolitical Biases of Language Models
Fahim Faisal
Antonios Anastasopoulos
22
20
0
20 Dec 2022
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Or Honovich
Thomas Scialom
Omer Levy
Timo Schick
ALM
48
362
0
19 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
22
367
0
19 Dec 2022
SumREN: Summarizing Reported Speech about Events in News
R. Reddy
Heba Elfardy
Hou Pong Chan
Kevin Small
Chenhui Xu
28
5
0
02 Dec 2022
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
Yu Meng
Martin Michalski
Jiaxin Huang
Yu Zhang
Tarek F. Abdelzaher
Jiawei Han
VLM
56
47
0
06 Nov 2022
Pneg: Prompt-based Negative Response Generation for Dialogue Response Selection Task
Nyoungwoo Lee
ChaeHun Park
Ho-Jin Choi
Jaegul Choo
30
6
0
31 Oct 2022
GPS: Genetic Prompt Search for Efficient Few-shot Learning
Hanwei Xu
Yujun Chen
Yulun Du
Nan Shao
Yanggang Wang
Haiyu Li
Zhilin Yang
VLM
14
28
0
31 Oct 2022
Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues
Jiao Ou
Jinchao Zhang
Yang Feng
Jie Zhou
36
13
0
30 Oct 2022
Contrastive Search Is What You Need For Neural Text Generation
Yixuan Su
Nigel Collier
25
50
0
25 Oct 2022
Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation
Melanie Sclar
Peter West
Sachin Kumar
Yulia Tsvetkov
Yejin Choi
22
19
0
25 Oct 2022
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
Jiacheng Ye
Jiahui Gao
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
VLM
78
72
0
22 Oct 2022
Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks
Laura Aina
Nikos Voskarides
Roi Blanco
22
0
0
21 Oct 2022
TestAug: A Framework for Augmenting Capability-based NLP Tests
Guanqun Yang
Mirazul Haque
Qiaochu Song
Wei Yang
Xueqing Liu
ELM
34
0
0
14 Oct 2022
Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP
Johann Frei
Frank Kramer
29
1
0
30 Aug 2022
ShortcutLens: A Visual Analytics Approach for Exploring Shortcuts in Natural Language Understanding Dataset
Zhihua Jin
Xingbo Wang
Furui Cheng
Chunhui Sun
Qun Liu
Huamin Qu
32
9
0
17 Aug 2022
Plot Writing From Pre-Trained Language Models
Yiping Jin
Vishakha Kadam
Dittaya Wanvarie
ReLM
27
2
0
07 Jun 2022
Leveraging QA Datasets to Improve Generative Data Augmentation
Dheeraj Mekala
Tu Vu
Timo Schick
Jingbo Shang
27
18
0
25 May 2022
Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation
Kevin Yang
Olivia Deng
Charles C. Chen
Richard Shin
Subhro Roy
Benjamin Van Durme
51
10
0
18 May 2022
Few-shot Mining of Naturally Occurring Inputs and Outputs
Mandar Joshi
Terra Blevins
M. Lewis
Daniel S. Weld
Luke Zettlemoyer
33
1
0
09 May 2022
Language Models in the Loop: Incorporating Prompting into Weak Supervision
Ryan Smith
Jason Alan Fries
Braden Hancock
Stephen H. Bach
53
53
0
04 May 2022
Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
Yuxiang Wu
Matt Gardner
Pontus Stenetorp
Pradeep Dasigi
37
67
0
24 Mar 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
45
212
0
16 Feb 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
82
212
0
16 Jan 2022