Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.00913
Cited By
Self-Influence Guided Data Reweighting for Language Model Pre-training
2 November 2023
Megh Thakkar
Tolga Bolukbasi
Sriram Ganapathy
Shikhar Vashishth
Sarath Chandar
Partha P. Talukdar
MILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Self-Influence Guided Data Reweighting for Language Model Pre-training"
8 / 8 papers shown
Title
ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
Xiaoxuan Zhu
Zhouhong Gu
Baiqian Wu
Suhang Zheng
Tao Wang
Tianyu Li
Hongwei Feng
Yanghua Xiao
42
0
0
01 Apr 2025
RELexED: Retrieval-Enhanced Legal Summarization with Exemplar Diversity
T. Y. S. S. Santosh
Chen Jia
Patrick Goroncy
Matthias Grabmair
AILaw
44
1
0
23 Jan 2025
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Jialiang Cheng
Ning Gao
Yun Yue
Zhiling Ye
Jiadi Jiang
Jian Sha
OffRL
77
0
0
10 Dec 2024
Data Selection via Optimal Control for Language Models
Yuxian Gu
Li Dong
Hongning Wang
Y. Hao
Qingxiu Dong
Furu Wei
Minlie Huang
AI4CE
52
4
0
09 Oct 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min-Bin Lin
MoE
71
40
1
01 Jul 2024
Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs
Kelvin Guu
Albert Webson
Ellie Pavlick
Lucas Dixon
Ian Tenney
Tolga Bolukbasi
TDI
68
33
0
14 Mar 2023
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
227
103
0
27 Oct 2022
Mitigating Dataset Bias by Using Per-sample Gradient
Sumyeong Ahn
Seongyoon Kim
Se-Young Yun
45
20
0
31 May 2022
1