2008.03703
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
9 August 2020
Vitaly Feldman
Chiyuan Zhang
TDI
Papers citing
"What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation"
50 / 193 papers shown
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Shivin Dass
Alaa Khaddaj
Logan Engstrom
Aleksander Madry
Andrew Ilyas
Roberto Martín-Martín
26
0
0
14 May 2025
When Dynamic Data Selection Meets Data Augmentation
Steve Yang
Peng Ye
Furao Shen
Dongzhan Zhou
42
0
0
02 May 2025
Enhancing Interpretability in Generative AI Through Search-Based Data Influence Analysis
Theodoros Aivalis
Iraklis A. Klampanos
Antonis Troumpoukis
Joemon M. Jose
23
0
0
02 Apr 2025
Impact of Data Duplication on Deep Neural Network-Based Image Classifiers: Robust vs. Standard Models
Alireza Aghabagherloo
Aydin Abadi
Sumanta Sarkar
Vishnu Asutosh Dasu
Bart Preneel
AAML
57
0
0
01 Apr 2025
Geometric Median Matching for Robust k-Subset Selection from Noisy Data
Anish Acharya
Sujay Sanghavi
Alexandros G. Dimakis
Inderjit S Dhillon
AAML
62
0
0
01 Apr 2025
Severing Spurious Correlations with Data Pruning
Varun Mulchandani
Jung-Eun Kim
192
0
0
24 Mar 2025
BLIA: Detect model memorization in binary classification model through passive Label Inference attack
Mohammad Wahiduzzaman Khan
Sheng Chen
Ilya Mironov
Leizhen Zhang
Rabib Noor
52
0
0
17 Mar 2025
Finding the Muses: Identifying Coresets through Loss Trajectories
M. Nagaraj
Deepak Ravikumar
Efstathia Soufleri
Kaushik Roy
41
0
0
12 Mar 2025
Trustworthy Machine Learning via Memorization and the Granular Long-Tail: A Survey on Interactions, Tradeoffs, and Beyond
Qiongxiu Li
Xiaoyu Luo
Yiyi Chen
Johannes Bjerva
48
0
0
10 Mar 2025
Hebbian learning the local structure of language
P. Myles Eugenio
63
0
0
03 Mar 2025
Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data
Henrik Nolte
Michèle Finck
Kristof Meding
AILaw
PILM
84
0
0
03 Mar 2025
The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
Matthieu Meeus
Lukas Wutschitz
Santiago Zanella Béguelin
Shruti Tople
Reza Shokri
80
0
0
24 Feb 2025
On Memorization in Diffusion Models
Xiangming Gu
Chao Du
Tianyu Pang
Chongxuan Li
Min-Bin Lin
Ye Wang
DiffM
TDI
166
43
0
21 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
128
2
0
21 Feb 2025
Does Training with Synthetic Data Truly Protect Privacy?
Yunpeng Zhao
Jie Zhang
82
0
0
18 Feb 2025
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Wenhao Wang
Adam Dziedzic
Grace C. Kim
Michael Backes
Franziska Boenisch
93
0
0
11 Feb 2025
Early Stopping Against Label Noise Without Validation Data
Suqin Yuan
Lei Feng
Tongliang Liu
NoLa
104
16
0
11 Feb 2025
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Kaixuan Huang
Jiacheng Guo
Zihao Li
X. Ji
Jiawei Ge
...
Yangsibo Huang
Chi Jin
Xinyun Chen
Chiyuan Zhang
Mengdi Wang
AAML
LRM
105
9
0
10 Feb 2025
Privacy-Preserving Dataset Combination
Keren Fuentes
Mimee Xu
Irene Chen
43
0
0
09 Feb 2025
Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation
Verna Dankers
Vikas Raunak
VLM
65
0
0
03 Feb 2025
The Silent Majority: Demystifying Memorization Effect in the Presence of Spurious Correlations
Chenyu You
Haocheng Dai
Yifei Min
Jasjeet Sekhon
S. Joshi
James S. Duncan
68
2
0
01 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
96
12
0
31 Dec 2024
Where Did Your Model Learn That? Label-free Influence for Self-supervised Learning
Nidhin Harilal
Amit Rege
Reza Akbarian Bafghi
M. Raissi
C. Monteleoni
TDI
45
0
0
22 Dec 2024
Adaptive Dataset Quantization
Muquan Li
Dongyang Zhang
Qiang Dong
Xiurui Xie
Ke Qin
DD
MQ
88
0
0
22 Dec 2024
Downscaling Precipitation with Bias-informed Conditional Diffusion Model
Ran Lyu
Linhan Wang
Yanshen Sun
Hedanqiu Bai
Chang-Tien Lu
67
0
0
19 Dec 2024
Data Pruning Can Do More: A Comprehensive Data Pruning Approach for Object Re-identification
Zi Yang
Haojin Yang
Soumajit Majumder
Jorge M. Cardoso
Guillermo Gallego
MoMe
VLM
106
1
0
13 Dec 2024
The Pitfalls of Memorization: When Memorization Hurts Generalization
Reza Bayat
Mohammad Pezeshki
Elvis Dohmatob
David Lopez-Paz
Pascal Vincent
OOD
105
3
0
10 Dec 2024
LossVal: Efficient Data Valuation for Neural Networks
Tim Wibiral
Mohamed Karim Belaid
Maximilian Rabus
Ansgar Scherp
TDI
92
0
0
05 Dec 2024
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
Yang Wu
Huayi Zhang
Yizheng Jiao
Lin Ma
Xiaozhong Liu
Jinhong Yu
Dongyu Zhang
Dezhi Yu
Wei Xu
85
1
0
01 Dec 2024
Robust Testing for Deep Learning using Human Label Noise
Gordon Lim
Stefan Larson
Kevin Leach
NoLa
65
0
0
29 Nov 2024
Delta-Influence: Unlearning Poisons via Influence Functions
Wenjie Li
Jiawei Li
Christian Schroeder de Witt
Ameya Prabhu
Amartya Sanyal
TDI
MU
97
0
0
20 Nov 2024
Generalizability of Memorization Neural Networks
Lijia Yu
Xiao-Shan Gao
Lijun Zhang
Yibo Miao
36
1
0
01 Nov 2024
A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective
Yeonsung Jung
Jaeyun Song
J. Yang
Jin-Hwa Kim
Sung-Yub Kim
Eunho Yang
47
0
0
01 Nov 2024
Where Do Large Learning Rates Lead Us?
Ildus Sadrtdinov
M. Kodryan
Eduard Pokonechny
E. Lobacheva
Dmitry Vetrov
AI4CE
34
0
0
29 Oct 2024
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models
Jinxu Lin
Linwei Tao
Minjing Dong
Chang Xu
TDI
44
2
0
24 Oct 2024
Publishing Neural Networks in Drug Discovery Might Compromise Training Data Privacy
Fabian P. Krüger
Johan Östman
Lewis H. Mervin
Igor V. Tetko
O. Engkvist
27
0
0
22 Oct 2024
Scalability of memorization-based machine unlearning
Kairan Zhao
Peter Triantafillou
MU
44
2
0
21 Oct 2024
Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark
Thomas George
Pierre Nodet
A. Bondu
Vincent Lemaire
VLM
35
0
0
21 Oct 2024
A CLIP-Powered Framework for Robust and Generalizable Data Selection
Steve Yang
Peng Ye
Wanli Ouyang
Dongzhan Zhou
Furao Shen
29
1
0
15 Oct 2024
Fragile Giants: Understanding the Susceptibility of Models to Subpopulation Attacks
Isha Gupta
Hidde Lycklama
Emanuel Opel
Evan Rose
Anwar Hithnawi
AAML
37
0
0
11 Oct 2024
Machine Unlearning in Forgettability Sequence
Junjie Chen
Qian Chen
Jian Lou
Xiaoyu Zhang
Kai Wu
Zilong Wang
MU
23
0
0
09 Oct 2024
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
Tianjian Li
Haoran Xu
Weiting Tan
Kenton Murray
Daniel Khashabi
35
1
0
06 Oct 2024
Permissive Information-Flow Analysis for Large Language Models
Shoaib Ahmed Siddiqui
Radhika Gaonkar
Boris Köpf
David M. Krueger
Andrew J. Paverd
Ahmed Salem
Shruti Tople
Lukas Wutschitz
Menglin Xia
Santiago Zanella Béguelin
36
1
0
04 Oct 2024
Deep Unlearn: Benchmarking Machine Unlearning
Xavier F. Cadet
Anastasia Borovykh
Mohammad Malekzadeh
S. Ahmadi-Abhari
Hamed Haddadi
BDL
MU
37
1
0
02 Oct 2024
Localizing Memorization in SSL Vision Encoders
Wenhao Wang
Adam Dziedzic
Michael Backes
Franziska Boenisch
34
2
0
27 Sep 2024
Predicting and analyzing memorization within fine-tuned Large Language Models
Jérémie Dentan
Davide Buscaldi
A. Shabou
Sonia Vanier
40
0
0
27 Sep 2024
Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies
Ritwik Gupta
Leah Walker
Rodolfo Corona
Stephanie Fu
Suzanne Petryk
Janet Napolitano
Trevor Darrell
Andrew W. Reddie
ELM
43
3
0
25 Sep 2024
Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI
Elisa Nguyen
Johannes Bertram
Evgenii Kortukov
Jean Y. Song
Seong Joon Oh
TDI
383
2
0
25 Sep 2024
Data-centric NLP Backdoor Defense from the Lens of Memorization
Zhenting Wang
Zhizhi Wang
Mingyu Jin
Mengnan Du
Juan Zhai
Shiqing Ma
33
3
0
21 Sep 2024
SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation
Maying Shen
Nadine Chang
Sifei Liu
Jose M. Alvarez
36
0
0
20 Sep 2024