Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.10430
Cited By
Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models
16 February 2024
Dheeraj Mekala
Alex Nguyen
Jingbo Shang
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models"
14 / 14 papers shown
Title
ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models using Pareto High-quality Data
Haoran Gu
Handing Wang
Yi Mei
Mengjie Zhang
Yaochu Jin
27
1
0
23 Apr 2025
Advancing MAPF towards the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)
Jingtian Yan
Zhifei Li
William Kang
Yulun Zhang
Stephen Smith
Jiaoyang Li
48
0
0
03 Mar 2025
MergeIT: From Selection to Merging for Efficient Instruction Tuning
Hongyi Cai
Yuqian Fu
Hongming Fu
Bo Zhao
MoMe
53
0
0
25 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
117
3
0
06 Feb 2025
Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
Qirun Dai
Dylan Zhang
Jiaqi W. Ma
Hao Peng
TDI
55
1
0
21 Jan 2025
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
Yang Wu
Huayi Zhang
Yizheng Jiao
Lin Ma
Xiaozhong Liu
Jinhong Yu
Dongyu Zhang
Dezhi Yu
Wei Xu
85
1
0
01 Dec 2024
Learning from "Silly" Questions Improves Large Language Models, But Only Slightly
Tingyuan Zhu
Shudong Liu
Yidong Wang
Derek F. Wong
Han Yu
T. Shinozaki
Jindong Wang
ALM
LRM
79
0
0
21 Nov 2024
DFlow: Diverse Dialogue Flow Simulation with Large Language Models
Wanyu Du
Song Feng
James Gung
Lijia Sun
Yi Zhang
Saab Mansour
Yanjun Qi
52
0
0
18 Oct 2024
IterSelectTune: An Iterative Training Framework for Efficient Instruction-Tuning Data Selection
Jielin Song
Siyu Liu
Bin Zhu
Yanghui Rao
30
2
0
17 Oct 2024
Language Model-Driven Data Pruning Enables Efficient Active Learning
Abdul Hameed Azeemi
I. Qazi
Agha Ali Raza
VLM
33
1
0
05 Oct 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min-Bin Lin
MoE
74
40
1
01 Jul 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
43
8
0
16 Jun 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
43
24
0
30 May 2024
I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses
Xuan Ren
Biao Wu
Lingqiao Liu
33
5
0
17 Feb 2024
1