QuRating: Selecting High-Quality Data for Training Language Models


15 February 2024
Alexander Wettig
Aatmik Gupta
Saumya Malik
Danqi Chen

Papers citing "QuRating: Selecting High-Quality Data for Training Language Models"

12 / 12 papers shown
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models
Xinlin Zhuang
Jiahui Peng
Ren Ma
Yucheng Wang
Tianyi Bai
Xingjian Wei
Jiantao Qiu
Chi Zhang
Ying Qian
Conghui He
92
0
0
19 Apr 2025
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Kashun Shum
Yuanmin Huang
Hongjian Zou
Qi Ding
Yixuan Liao
Xiao Chen
Qian Liu
Junxian He
115
4
0
02 Mar 2025
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
Xin Xu
Yan Xu
Tianhao Chen
Yuchen Yan
Chengwu Liu
...
Yansen Wang
Yichun Yin
Yijiao Wang
Lifeng Shang
Qiang Liu
LRM
98
3
0
17 Feb 2025
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou
Zengzhi Wang
Qian Liu
Junlong Li
Pengfei Liu
ALM
155
14
0
17 Feb 2025
Weak-to-Strong Generalization Through the Data-Centric Lens
Changho Shin
John Cooper
Frederic Sala
139
8
0
05 Dec 2024
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
Elyas Obbad
Iddah Mlauzi
Brando Miranda
Rylan Schaeffer
Kamal Obbad
Suhana Bedi
Sanmi Koyejo
CVBM
89
0
0
23 Oct 2024
Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World
Joshua Kazdan
Rylan Schaeffer
Apratim Dey
Matthias Gerstgrasser
Rafael Rafailov
D. Donoho
Sanmi Koyejo
84
17
0
22 Oct 2024
Compute-Constrained Data Selection
Junjie Oscar Yin
Alexander M. Rush
96
1
0
21 Oct 2024
Reverse Modeling in Large Language Models
S. Yu
Yuanchen Xu
Cunxiao Du
Yanying Zhou
Minghui Qiu
Q. Sun
Hao Zhang
Jiawei Wu
116
2
0
13 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
139
1
0
09 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
129
45
0
03 Oct 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
161
29
0
10 Sep 2024