ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2411.08553
  4. Cited By
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs

CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs

13 November 2024
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
    SyDa
ArXiv (abs)PDFHTML

Papers citing "CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs"

35 / 35 papers shown
Title
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation
Abhishek Divekar
Greg Durrett
128
8
0
16 May 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your
  Phone
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRMALM
169
1,265
0
22 Apr 2024
Genie: Achieving Human Parity in Content-Grounded Datasets Generation
Genie: Achieving Human Parity in Content-Grounded Datasets Generation
Asaf Yehudai
Boaz Carmeli
Y. Mass
Ofir Arviv
Nathaniel Mills
Assaf Toledo
Eyal Shnarch
Leshem Choshen
62
25
0
25 Jan 2024
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large
  Language Models by Extrapolating Errors from Small Models
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Ruida Wang
Wangchunshu Zhou
Mrinmaya Sachan
85
32
0
20 Oct 2023
A Survey on Evaluation of Large Language Models
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELMLM&MAALM
169
1,723
0
06 Jul 2023
Stay on topic with Classifier-Free Guidance
Stay on topic with Classifier-Free Guidance
Guillaume Sanchez
Honglu Fan
Alexander Spangher
Elad Levi
Pawan Sasanka Ammanamanchi
Stella Biderman
3DV
105
55
0
30 Jun 2023
Large Language Model as Attributed Training Data Generator: A Tale of
  Diversity and Bias
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Yue Yu
Yuchen Zhuang
Jieyu Zhang
Yu Meng
Alexander Ratner
Ranjay Krishna
Jiaming Shen
Chao Zhang
ALM
102
234
0
28 Jun 2023
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
Weijia Shi
Xiaochuang Han
M. Lewis
Yulia Tsvetkov
Luke Zettlemoyer
Scott Yih
HILM
75
213
0
24 May 2023
ReGen: Zero-Shot Text Classification via Training Data Generation with
  Progressive Dense Retrieval
ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval
Yue Yu
Yuchen Zhuang
Rongzhi Zhang
Yu Meng
Jiaming Shen
Chao Zhang
VLM
73
37
0
18 May 2023
The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers
The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers
Ariel Gera
Roni Friedman
Ofir Arviv
Chulaka Gunasekara
Benjamin Sznajder
Noam Slonim
Eyal Shnarch
91
22
0
02 May 2023
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Yizhong Wang
Yeganeh Kordi
Swaroop Mishra
Alisa Liu
Noah A. Smith
Daniel Khashabi
Hannaneh Hajishirzi
ALMSyDaLRM
164
2,256
0
20 Dec 2022
Unnatural Instructions: Tuning Language Models with (Almost) No Human
  Labor
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Or Honovich
Thomas Scialom
Omer Levy
Timo Schick
ALM
135
374
0
19 Dec 2022
Contrastive Decoding: Open-ended Text Generation as Optimization
Contrastive Decoding: Open-ended Text Generation as Optimization
Xiang Lisa Li
Ari Holtzman
Daniel Fried
Percy Liang
Jason Eisner
Tatsunori Hashimoto
Luke Zettlemoyer
M. Lewis
130
374
0
27 Oct 2022
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
Jiacheng Ye
Jiahui Gao
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDaVLM
161
78
0
22 Oct 2022
Classifier-Free Diffusion Guidance
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
196
3,971
0
26 Jul 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning
  from Human Feedback
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
256
2,627
0
12 Apr 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
125
220
0
16 Feb 2022
Generating Training Data with Language Models: Towards Zero-Shot
  Language Understanding
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
Yu Meng
Jiaxin Huang
Yu Zhang
Jiawei Han
SyDa
75
235
0
09 Feb 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
367
4,598
0
27 Oct 2021
Symbolic Knowledge Distillation: from General Language Models to
  Commonsense Models
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Peter West
Chandrasekhar Bhagavatula
Jack Hessel
Jena D. Hwang
Liwei Jiang
Ronan Le Bras
Ximing Lu
Sean Welleck
Yejin Choi
SyDa
114
332
0
14 Oct 2021
Towards Zero-Label Language Learning
Towards Zero-Label Language Learning
Zirui Wang
Adams Wei Yu
Orhan Firat
Yuan Cao
SyDa
244
105
0
19 Sep 2021
Divergence Frontiers for Generative Models: Sample Complexity,
  Quantization Effects, and Frontier Integrals
Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals
Lang Liu
Krishna Pillutla
Sean Welleck
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
MQ
84
14
0
15 Jun 2021
Generating Datasets with Pretrained Language Models
Generating Datasets with Pretrained Language Models
Timo Schick
Hinrich Schütze
161
235
0
15 Apr 2021
Prompt Programming for Large Language Models: Beyond the Few-Shot
  Paradigm
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
Laria Reynolds
Kyle McDonell
114
920
0
15 Feb 2021
Neural Data Augmentation via Example Extrapolation
Neural Data Augmentation via Example Extrapolation
Kenton Lee
Kelvin Guu
Luheng He
Timothy Dozat
Hyung Won Chung
76
72
0
02 Feb 2021
Dataset Cartography: Mapping and Diagnosing Datasets with Training
  Dynamics
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
Swabha Swayamdipta
Roy Schwartz
Nicholas Lourie
Yizhong Wang
Hannaneh Hajishirzi
Noah A. Smith
Yejin Choi
140
452
0
22 Sep 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
905
42,520
0
28 May 2020
Training Question Answering Models From Synthetic Data
Training Question Answering Models From Synthetic Data
Raul Puri
Ryan Spring
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ELM
81
160
0
22 Feb 2020
How Can We Know What Language Models Know?
How Can We Know What Language Models Know?
Zhengbao Jiang
Frank F. Xu
Jun Araki
Graham Neubig
KELM
149
1,412
0
28 Nov 2019
Not Enough Data? Deep Learning to the Rescue!
Not Enough Data? Deep Learning to the Rescue!
Ateret Anaby-Tavor
Boaz Carmeli
Esther Goldbraich
Amir Kantor
George Kour
Segev Shlomov
N. Tepper
Naama Zwerdling
91
371
0
08 Nov 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and
  lighter
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
269
7,554
0
02 Oct 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLMSSLSSeg
1.8K
95,324
0
11 Oct 2018
UMAP: Uniform Manifold Approximation and Projection for Dimension
  Reduction
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Leland McInnes
John Healy
James Melville
205
9,492
0
09 Feb 2018
Texygen: A Benchmarking Platform for Text Generation Models
Texygen: A Benchmarking Platform for Text Generation Models
Yaoming Zhu
Sidi Lu
Lei Zheng
Jiaxian Guo
Weinan Zhang
Jun Wang
Yong Yu
115
693
0
06 Feb 2018
Character-level Convolutional Networks for Text Classification
Character-level Convolutional Networks for Text Classification
Xiang Zhang
Jiaqi Zhao
Yann LeCun
270
6,135
0
04 Sep 2015
1