ResearchTrend.AI
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

25 December 2023
Wei Liu
Weihao Zeng
Keqing He
Yong Jiang
Junxian He
    ALM
arXiv: 2312.15685

Papers citing "What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning"

50 / 172 papers shown
Title
Learning from "Silly" Questions Improves Large Language Models, But Only Slightly
Tingyuan Zhu
Shudong Liu
Yidong Wang
Derek F. Wong
Han Yu
T. Shinozaki
Jindong Wang
ALM
LRM
82
0
0
21 Nov 2024
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning
Hang Zhou
Yehui Tang
Haochen Qin
Yujie Yang
Renren Jin
Deyi Xiong
Kai Han
Yunhe Wang
59
2
0
21 Nov 2024
AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations
Gaurav Verma
Rachneet Kaur
Nishan Srishankar
Zhen Zeng
T. Balch
Manuela Veloso
LLMAG
72
5
0
20 Nov 2024
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan
Yanjiang Liu
Xinyu Lu
Boxi Cao
Xianpei Han
...
Le Sun
Jie Lou
Bowen Yu
Yunfan LU
Hongyu Lin
ALM
86
2
0
18 Nov 2024
Compound-QA: A Benchmark for Evaluating LLMs on Compound Questions
Yutao Hou
Yajing Luo
Zhiwen Ruan
H. Wang
Weifeng Ge
Yuxiao Chen
Guanhua Chen
ELM
47
0
0
15 Nov 2024
Efficient Alignment of Large Language Models via Data Sampling
Amrit Khera
Rajat Ghosh
Debojyoti Dutta
36
1
0
15 Nov 2024
Stronger Models are NOT Stronger Teachers for Instruction Tuning
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Bill Yuchen Lin
Radha Poovendran
ALM
56
5
0
11 Nov 2024
Active Preference-based Learning for Multi-dimensional Personalization
Minhyeon Oh
Seungjoon Lee
Jungseul Ok
31
1
0
01 Nov 2024
What is Wrong with Perplexity for Long-context Language Modeling?
Lizhe Fang
Yifei Wang
Zhaoyang Liu
Chenheng Zhang
Stefanie Jegelka
Jinyang Gao
Bolin Ding
Yisen Wang
69
6
0
31 Oct 2024
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
72
4
0
30 Oct 2024
A Lightweight Multi Aspect Controlled Text Generation Solution For Large Language Models
Chenyang Zhang
Jiayi Lin
Haibo Tong
Bingxuan Hou
Dongyu Zhang
Jialin Li
Junli Wang
31
0
0
18 Oct 2024
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen
Philip Arthur
Qianyu Feng
Cong Duy Vu Hoang
Yu-Heng Hong
...
Mark Johnson
Kemal Kurniawan
Don Dharmasiri
Long Duong
Yuan-Fang Li
SyDa
60
1
0
16 Oct 2024
TSDS: Data Selection for Task-Specific Model Finetuning
Zifan Liu
Amin Karbasi
Theodoros Rekatsinas
34
4
0
15 Oct 2024
Data Quality Control in Federated Instruction-tuning of Large Language Models
Yaxin Du
Guangyi Liu
Fengting Yuchi
W. Zhao
Jingjing Qu
Yanjie Wang
Siheng Chen
ALM
FedML
56
0
0
15 Oct 2024
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search
Chenglin Li
Qianglong Chen
Zhi Li
Feng Tao
Yicheng Li
Hao Chen
Fei Yu
Yin Zhang
SyDa
33
0
0
14 Oct 2024
3DS: Decomposed Difficulty Data Selection's Case Study on LLM Medical Domain Adaptation
Hongxin Ding
Yue Fang
Runchuan Zhu
Xinke Jiang
Jinyang Zhang
Yongxin Xu
Xu Chu
Junfeng Zhao
Yasha Wang
30
0
0
13 Oct 2024
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Guanting Dong
Xiaoshuai Song
Yichen Zhu
Runqi Qiao
Zhicheng Dou
3DV
92
4
0
12 Oct 2024
Rethinking Data Selection at Scale: Random Selection is Almost All You Need
Tingyu Xia
Bowen Yu
K. Dang
An Yang
Yuan Wu
Yuan Tian
Yi-Ju Chang
Junyang Lin
ALM
54
5
0
12 Oct 2024
Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation
Sweta Agrawal
José G. C. de Souza
Ricardo Rei
António Farinhas
Gonçalo Faria
Patrick Fernandes
Nuno M. Guerreiro
Andre Martins
32
5
0
10 Oct 2024
Evolutionary Contrastive Distillation for Language Model Alignment
Julian Katz-Samuels
Zheng Li
Hyokun Yun
Priyanka Nigam
Yi Xu
Vaclav Petricek
Bing Yin
Trishul Chilimbi
ALM
SyDa
31
0
0
10 Oct 2024
DecorateLM: Data Engineering through Corpus Rating, Tagging, and Editing with Language Models
Ranchi Zhao
Zhen Leng Thai
Yifan Zhang
Shengding Hu
Yunqi Ba
Jie Zhou
Jie Cai
Zhiyuan Liu
Maosong Sun
41
1
0
08 Oct 2024
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models
Fei Wang
Ninareh Mehrabi
Palash Goyal
Rahul Gupta
Kai-Wei Chang
Aram Galstyan
ALM
45
1
0
07 Oct 2024
HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation
Xinyu Zhou
Simin Fan
Martin Jaggi
TDI
31
0
0
07 Oct 2024
CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text
Jun Hirako
Ryohei Sasano
Koichi Takeda
37
2
0
06 Oct 2024
Intelligence at the Edge of Chaos
Sifan Wang
Aakash Patel
S. Rizvi
Nianchen Liu
Shiyang Zhang
Amin Karbasi
E. Zappala
David van Dijk
28
3
0
03 Oct 2024
FactAlign: Long-form Factuality Alignment of Large Language Models
Chao-Wei Huang
Yun-Nung Chen
HILM
30
2
0
02 Oct 2024
Speculative Coreset Selection for Task-Specific Fine-tuning
Xiaoyu Zhang
Juan Zhai
Shiqing Ma
Chao Shen
Tianlin Li
Weipeng Jiang
Yang Liu
30
2
0
02 Oct 2024
Instruction Embedding: Latent Representations of Instructions Towards Task Identification
Yiwei Li
Jiayi Shi
Shaoxiong Feng
Peiwen Yuan
Xinglin Wang
Boyuan Pan
Heda Wang
Yao Hu
Kan Li
33
2
0
29 Sep 2024
GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks
Rongchang Li
Minjie Chen
Chang Hu
Han Chen
Wenpeng Xing
Meng Han
SILM
ELM
39
1
0
29 Sep 2024
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
Jiaming Li
Lei Zhang
Yunshui Li
Ziqiang Liu
Yuelin Bai
Run Luo
Longze Chen
Min Yang
ALM
30
0
0
27 Sep 2024
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement
Simon Yu
Liangyu Chen
Sara Ahmadian
Marzieh Fadaee
34
7
0
17 Sep 2024
Eir: Thai Medical Large Language Models
Yutthakorn Thiprak
Rungtam Ngodngamthaweesuk
Songtam Ngodngamtaweesuk
LM&MA
ELM
43
0
0
13 Sep 2024
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu
Shitong Shao
Bao Li
Lichen Bai
Zhiqiang Xu
Haoyi Xiong
James Kwok
Sumi Helal
Zeke Xie
45
12
0
11 Sep 2024
Beyond IID: Optimizing Instruction Learning from the Perspective of Instruction Interaction and Dependency
Hanyu Zhao
Li Du
Yiming Ju
Chengwei Wu
Tengfei Pan
27
5
0
11 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
63
23
0
10 Sep 2024
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
Yejie Wang
Keqing He
Dayuan Fu
Zhuoma Gongque
Heyang Xu
...
Muxi Diao
Jingang Wang
Hao Fei
Xunliang Cai
Weiran Xu
ALM
SyDa
43
3
0
05 Sep 2024
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models
Yuncheng Yang
Yulei Qin
Tong Wu
Zihan Xu
Gang Li
...
Yuchen Shi
Ke Li
Xing Sun
Jie Yang
Yun Gu
ALM
OffRL
MoE
54
0
0
28 Aug 2024
REInstruct: Building Instruction Data from Unlabeled Corpus
Shu Chen
Xinyan Guan
Yunfan LU
Hongyu Lin
Xianpei Han
Le Sun
ALM
SyDa
22
2
0
20 Aug 2024
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs
Jiancheng Dong
Lei Jiang
Wei Jin
Lu Cheng
44
1
0
18 Aug 2024
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm
Yiming Liang
Ge Zhang
Xingwei Qu
Tianyu Zheng
Jiawei Guo
...
Jiaheng Liu
Chenghua Lin
Lei Ma
Wenhao Huang
Jiajun Zhang
ALM
51
5
0
15 Aug 2024
Reciprocal Learning
Julian Rodemann
Christoph Jansen
G. Schollmeyer
FedML
40
0
0
12 Aug 2024
Better Alignment with Instruction Back-and-Forth Translation
Thao Nguyen
Jeffrey Li
Sewoong Oh
Ludwig Schmidt
Jason Weston
Luke Zettlemoyer
Xian Li
SyDa
38
6
0
08 Aug 2024
CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs
Weijie Lv
Xuan Xia
Sheng-Jun Huang
ALM
36
3
0
05 Aug 2024
FANNO: Augmenting High-Quality Instruction Data with Open-Sourced LLMs Only
He Zhu
Junyou Su
Tianle Lun
Yicheng Tao
Wenjia Zhang
Zipei Fan
Guanhua Chen
ALM
37
2
0
02 Aug 2024
Entropy Law: The Story Behind Data Compression and LLM Performance
Mingjia Yin
Chuhan Wu
Yufei Wang
Hao Wang
Wei Guo
Yasheng Wang
Yong Liu
Ruiming Tang
Defu Lian
Enhong Chen
42
19
0
09 Jul 2024
LIONs: An Empirically Optimized Approach to Align Language Models
Xiao Yu
Qingyang Wu
Yu Li
Zhou Yu
ALM
40
3
0
09 Jul 2024
Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning
Yun-Da Tsai
Mingjie Liu
Haoxing Ren
SyDa
31
9
0
06 Jul 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min-Bin Lin
MoE
74
41
1
01 Jul 2024
$\text{Memory}^3$: Language Modeling with Explicit Memory
Hongkang Yang
Zehao Lin
Wenjin Wang
Hao Wu
Zhiyu Li
...
Yu Yu
Kai Chen
Zhiyu Li
Linpeng Tang
Weinan E
50
12
0
01 Jul 2024
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
Guanting Dong
Yutao Zhu
Chenghao Zhang
Zechen Wang
Zhicheng Dou
Ji-Rong Wen
RALM
44
10
0
26 Jun 2024