Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.02318
Cited By
Diversity Measurement and Subset Selection for Instruction Tuning Datasets
4 February 2024
Peiqi Wang
Songlin Yang
Zhen Guo
Matt Stallone
Yoon Kim
Polina Golland
Yikang Shen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Diversity Measurement and Subset Selection for Instruction Tuning Datasets"
11 / 11 papers shown
Title
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
Yicheng Chen
Yining Li
Kai Hu
Zerun Ma
Haochen Ye
Kai Chen
34
0
0
18 Apr 2025
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom
Yisen Li
Lingfeng Yang
Wenxuan Shen
Pan Zhou
Yao Wan
Weiwei Lin
Danny Chen
75
0
0
03 Mar 2025
COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation
Arnav M. Das
Gantavya Bhatt
Lilly Kumari
Sahil Verma
J. Bilmes
31
0
0
23 Dec 2024
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement
Simon Yu
Liangyu Chen
Sara Ahmadian
Marzieh Fadaee
37
7
0
17 Sep 2024
WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao
Xiang Ren
Jack Hessel
Claire Cardie
Yejin Choi
Yuntian Deng
44
180
0
02 May 2024
Data Diversity Matters for Robust Instruction Tuning
Alexander Bukharin
Tuo Zhao
81
36
0
21 Nov 2023
The Vendi Score: A Diversity Evaluation Metric for Machine Learning
Dan Friedman
Adji Bousso Dieng
EGVM
94
111
0
05 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
213
1,661
0
15 Oct 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
242
593
0
14 Jul 2021
Determinantal point processes for machine learning
Alex Kulesza
B. Taskar
165
1,123
0
25 Jul 2012
1