2311.14736
Cited By
Data Diversity Matters for Robust Instruction Tuning
21 November 2023
Alexander Bukharin
Tuo Zhao
Papers citing "Data Diversity Matters for Robust Instruction Tuning"
12 / 12 papers shown
Title
ArgInstruct: Specialized Instruction Fine-Tuning for Computational Argumentation
Maja Stahl
Timon Ziegenbein
Joonsuk Park
Henning Wachsmuth
ALM
LRM
36
0
0
28 May 2025
Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
37
0
0
28 May 2025
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
98
0
0
22 May 2025
Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
Shaobo Wang
Xiangqi Jin
Ziming Wang
Jinqiao Wang
Jingyun Zhang
...
Zichen Wen
Zhong Li
Zeang Sheng
Xuming Hu
Linfeng Zhang
SyDa
114
3
0
18 May 2025
NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark
Vladislav Mikhailov
Tita Ranveig Enstad
David Samuel
Hans Christian Farsethås
Andrey Kutuzov
Erik Velldal
Lilja Øvrelid
ELM
113
1
0
10 Apr 2025
TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning
Sheng Wang
Pengan Chen
Jingqi Zhou
Qintong Li
Jingwei Dong
Jiahui Gao
Boyang Xue
Jiyue Jiang
Dianbo Sui
Chuan Wu
SyDa
114
0
0
21 Mar 2025
Enhancing LLM Knowledge Learning through Generalization
Mingkang Zhu
Xi Chen
Ziyi Wang
Bei Yu
Hengshuang Zhao
Jiaya Jia
109
0
0
05 Mar 2025
How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
Seonghyeon Lee
Heejae Chon
Joonwon Jang
Dongha Lee
Hwanjo Yu
ALM
109
0
0
02 Mar 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
207
13
0
31 Dec 2024
DELIFT: Data Efficient Language model Instruction Fine Tuning
Ishika Agarwal
Krishnateja Killamsetty
Lucian Popa
Marina Danilevsky
ALM
VLM
129
4
0
07 Nov 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min Lin
MoE
172
54
1
01 Jul 2024
Diversity Measurement and Subset Selection for Instruction Tuning Datasets
Peiqi Wang
Songlin Yang
Zhen Guo
Matt Stallone
Yoon Kim
Polina Golland
Yikang Shen
77
12
0
04 Feb 2024