ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation
arXiv 2503.01052 (v1; v2 latest)
2 March 2025
Yanzhou Pan
Huawei Lin
Yide Ran
Jiamin Chen
Xiaodong Yu
Weijie Zhao
Denghui Zhang
Zhaozhuo Xu
Papers citing "ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation"
31 / 31 papers shown
VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation
Huawei Lin
Tong Geng
Zhaozhuo Xu
Weijie Zhao
VLM
162
1
0
19 May 2025
Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration
Tianyi Bai
Ling Yang
Zhen Hao Wong
Fupeng Sun
Jiahui Peng
...
Lijun Wu
Jiantao Qiu
Wentao Zhang
Binhang Yuan
Conghui He
LLMAG
71
6
0
10 Oct 2024
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM
MoE
OSLM
135
908
0
31 Jul 2024
Entropy Law: The Story Behind Data Compression and LLM Performance
Mingjia Yin
Chuhan Wu
Yufei Wang
Hao Wang
Wei Guo
Yasheng Wang
Yong Liu
Ruiming Tang
Defu Lian
Enhong Chen
96
27
0
09 Jul 2024
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing
Jiangshu Du
Yibo Wang
Wenting Zhao
Zhongfen Deng
Shuaiqi Liu
...
Eduardo Blanco
Yixin Cao
Rui Zhang
Philip S. Yu
Wenpeng Yin
71
34
0
24 Jun 2024
Can LLMs Reason in the Wild with Programs?
Yuan Yang
Siheng Xiong
Ali Payani
Ehsan Shareghi
Faramarz Fekri
LRM
83
16
0
19 Jun 2024
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
Zichun Yu
Spandan Das
Chenyan Xiong
107
36
0
10 Jun 2024
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
Sang Keun Choe
Hwijeen Ahn
Juhan Bae
Kewen Zhao
Minsoo Kang
...
Teruko Mitamura
Jeff Schneider
Eduard Hovy
Roger C. Grosse
Eric Xing
TDI
87
44
0
22 May 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Ming Li
Lichang Chen
Jiuhai Chen
Shwai He
Jiuxiang Gu
Dinesh Manocha
127
58
0
15 Feb 2024
How to Train Data-Efficient LLMs
Noveen Sachdeva
Benjamin Coleman
Wang-Cheng Kang
Jianmo Ni
Lichan Hong
Ed H. Chi
James Caverlee
Julian McAuley
D. Cheng
85
64
0
15 Feb 2024
LESS: Selecting Influential Data for Targeted Instruction Tuning
Mengzhou Xia
Sadhika Malladi
Suchin Gururangan
Sanjeev Arora
Danqi Chen
148
242
0
06 Feb 2024
A Survey on Data Selection for LLM Instruction Tuning
Jiahao Wang
Bolin Zhang
Qianlong Du
Jiajun Zhang
Dianhui Chu
74
48
0
04 Feb 2024
A Comparative Study on Annotation Quality of Crowdsourcing and LLM via Label Aggregation
Jiyi Li
92
16
0
18 Jan 2024
Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data
Yiwei Li
Peiwen Yuan
Shaoxiong Feng
Boyuan Pan
Bin Sun
Xinglin Wang
Heda Wang
Kan Li
LRM
66
21
0
20 Dec 2023
AutoDroid: LLM-powered Task Automation in Android
Hao Wen
Yuanchun Li
Guohong Liu
Shanhui Zhao
Tao Yu
Toby Jia-Jun Li
Shiqi Jiang
Yunhao Liu
Yaqin Zhang
Yunxin Liu
101
98
0
29 Aug 2023
D4: Improving LLM Pretraining via Document De-Duplication and Diversification
Kushal Tirumala
Daniel Simig
Armen Aghajanyan
Ari S. Morcos
SyDa
52
113
0
23 Aug 2023
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
Ming Li
Yong Zhang
Zhitao Li
Jiuhai Chen
Lichang Chen
Ning Cheng
Jianzong Wang
Dinesh Manocha
Jing Xiao
110
211
0
23 Aug 2023
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Guilherme Penedo
Quentin Malartic
Daniel Hesslow
Ruxandra-Aimée Cojocaru
Alessandro Cappelli
Hamza Alobeidli
B. Pannier
Ebtesam Almazrouei
Julien Launay
123
775
0
01 Jun 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
107
1,303
0
03 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.5K
13,437
0
27 Feb 2023
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
230
3,782
0
03 Sep 2021
Data Pricing in Machine Learning Pipelines
Zicun Cong
Xuan Luo
J. Pei
Feida Zhu
Yong Zhang
54
48
0
18 Aug 2021
Input Similarity from the Neural Network Perspective
Guillaume Charpiat
N. Girard
Loris Felardos
Y. Tarabalka
90
74
0
10 Feb 2021
FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging
Han Guo
Nazneen Rajani
Peter Hase
Joey Tianyi Zhou
Caiming Xiong
TDI
118
116
0
31 Dec 2020
A Survey on Data Pricing: from Economics to Data Science
J. Pei
112
119
0
09 Sep 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
877
42,379
0
28 May 2020
The Early Phase of Neural Network Training
Jonathan Frankle
D. Schwab
Ari S. Morcos
87
174
0
24 Feb 2020
Estimating Training Data Influence by Tracing Gradient Descent
G. Pruthi
Frederick Liu
Mukund Sundararajan
Satyen Kale
TDI
99
417
0
19 Feb 2020
Data Shapley: Equitable Valuation of Data for Machine Learning
Amirata Ghorbani
James Zou
TDI
FedML
78
789
0
05 Apr 2019
Understanding Black-box Predictions via Influence Functions
Pang Wei Koh
Percy Liang
TDI
216
2,905
0
14 Mar 2017
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
338
2,898
0
26 Sep 2016