Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives
arXiv:2505.21598, 27 May 2025
Yajiao Liu, Congliang Chen, Junchi Yang, Ruoyu Sun
Community: MoMe
Papers citing "Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives" (4 papers)
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min Lin
Community: MoE
01 Jul 2024
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Boyao Wang, Dylan Zhang, Hanning Zhang, Xingyuan Pan, Minrui Xu, Jipeng Zhang, Renjie Pi, Xiaoyu Wang, Tong Zhang
28 Jun 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye, Peiju Liu, Tianxiang Sun, Yunhua Zhou, Jun Zhan, Xipeng Qiu
25 Mar 2024
A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems
Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhi-Quan Luo
29 Oct 2020