arXiv:2505.22829
Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies
28 May 2025
Chenruo Liu
Kenan Tang
Yao Qin
Qi Lei
Papers citing
"Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies"
38 / 38 papers shown
Title
Elastic Representation: Mitigating Spurious Correlations for Group Robustness
Tao Wen
Zihan Wang
Quan Zhang
Qi Lei
43
1
0
17 Feb 2025
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
54
134
0
22 Apr 2024
Detecting Out-of-Distribution Through the Lens of Neural Collapse
Litian Liu
Yao Qin
OODD
60
7
0
02 Nov 2023
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Rui Zheng
Wei Shen
Yuan Hua
Wenbin Lai
Shihan Dou
...
Xiao Wang
Haoran Huang
Tao Gui
Qi Zhang
Xuanjing Huang
61
15
0
18 Oct 2023
Large-scale Dataset Pruning with Dynamic Uncertainty
Muyang He
Shuo Yang
Tiejun Huang
Bo Zhao
49
28
0
08 Jun 2023
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Jindong Wang
Xixu Hu
Wenxin Hou
Hao Chen
Runkai Zheng
...
Weirong Ye
Xiubo Geng
Binxing Jiao
Yue Zhang
Xingxu Xie
AI4MH
67
227
0
22 Feb 2023
Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers
Wanqian Yang
Polina Kirichenko
Micah Goldblum
A. Wilson
DRL
35
11
0
28 Nov 2022
On Feature Learning in the Presence of Spurious Correlations
Pavel Izmailov
Polina Kirichenko
Nate Gruver
A. Wilson
47
126
0
20 Oct 2022
When Does Group Invariant Learning Survive Spurious Correlations?
Yimeng Chen
Ruibin Xiong
Zhiming Ma
Yanyan Lan
OOD
CML
52
22
0
29 Jun 2022
Towards Domain Generalization in Object Detection
Xingxuan Zhang
Zekai Xu
Renzhe Xu
Jiashuo Liu
Peng Cui
Weitao Wan
Chong Sun
Chen Li
ObjD
OOD
41
22
0
27 Mar 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
58
744
0
01 Dec 2021
Towards Principled Disentanglement for Domain Generalization
Hanlin Zhang
Yi-Fan Zhang
Weiyang Liu
Adrian Weller
Bernhard Schölkopf
Eric Xing
OOD
53
114
0
27 Nov 2021
Anti-Backdoor Learning: Training Clean Models on Poisoned Data
Yige Li
X. Lyu
Nodens Koren
Lingjuan Lyu
Yue Liu
Xingjun Ma
OnRL
46
327
0
22 Oct 2021
Just Train Twice: Improving Group Robustness without Training Group Information
Evan Zheran Liu
Behzad Haghgoo
Annie S. Chen
Aditi Raghunathan
Pang Wei Koh
Shiori Sagawa
Percy Liang
Chelsea Finn
OOD
51
549
0
19 Jul 2021
Out-of-distribution Generalization in the Presence of Nuisance-Induced Spurious Correlations
A. Puli
Lily H. Zhang
Eric K. Oermann
Rajesh Ranganath
OOD
OODD
43
49
0
29 Jun 2021
Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization
Kartik Ahuja
Ethan Caballero
Dinghuai Zhang
Jean-Christophe Gagnon-Audet
Yoshua Bengio
Ioannis Mitliagkas
Irina Rish
OOD
25
258
0
11 Jun 2021
Causally motivated Shortcut Removal Using Auxiliary Labels
Maggie Makar
Ben Packer
D. Moldovan
Davis W. Blalock
Yoni Halpern
Alexander D'Amour
OOD
CML
49
72
0
13 May 2021
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization
Damien Teney
Ehsan Abbasnejad
Simon Lucey
Anton Van Den Hengel
60
88
0
12 May 2021
Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond
Xuhong Li
Haoyi Xiong
Xingjian Li
Xuanyu Wu
Xiao Zhang
Ji Liu
Jiang Bian
Dejing Dou
AAML
FaML
XAI
HAI
43
324
0
19 Mar 2021
Distribution-Free, Risk-Controlling Prediction Sets
Stephen Bates
Anastasios Nikolas Angelopoulos
Lihua Lei
Jitendra Malik
Michael I. Jordan
OOD
203
190
0
07 Jan 2021
Learning Disentangled Semantic Representation for Domain Adaptation
Ruichu Cai
Zijian Li
Pengfei Wei
Jie Qiao
Kun Zhang
Zhifeng Hao
OOD
DRL
24
128
0
22 Dec 2020
On the Transfer of Disentangled Representations in Realistic Settings
Andrea Dittadi
Frederik Träuble
Francesco Locatello
M. Wüthrich
Vaibhav Agrawal
Ole Winther
Stefan Bauer
Bernhard Schölkopf
OOD
68
82
0
27 Oct 2020
Environment Inference for Invariant Learning
Elliot Creager
J. Jacobsen
R. Zemel
OOD
29
376
0
14 Oct 2020
Large-Scale Methods for Distributionally Robust Optimization
Daniel Levy
Y. Carmon
John C. Duchi
Aaron Sidford
46
212
0
12 Oct 2020
Contrastive Representation Learning: A Framework and Review
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL
AI4TS
202
697
0
10 Oct 2020
BREEDS: Benchmarks for Subpopulation Shift
Shibani Santurkar
Dimitris Tsipras
Aleksander Madry
OOD
31
170
0
11 Aug 2020
Domain Generalization using Causal Matching
Divyat Mahajan
Shruti Tople
Amit Sharma
OOD
45
328
0
12 Jun 2020
Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers
Loc Truong
Chace Jones
Brian Hutchinson
Andrew August
Brenda Praggastis
Robert J. Jasper
Nicole Nichols
Aaron Tuor
AAML
13
50
0
24 Apr 2020
Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift
Rémi Tachet des Combes
Han Zhao
Yu Wang
Geoffrey J. Gordon
OOD
AAML
VLM
38
186
0
10 Mar 2020
Performative Prediction
Juan C. Perdomo
Tijana Zrnic
Celestine Mendler-Dünner
Moritz Hardt
94
313
0
16 Feb 2020
Adversarial Domain Adaptation with Domain Mixup
Minghao Xu
Jian Zhang
Bingbing Ni
Teng Li
Chengjie Wang
Qi Tian
Wenjun Zhang
22
443
0
04 Dec 2019
Stable Learning via Sample Reweighting
Zheyan Shen
Peng Cui
Tong Zhang
Kun Kuang
OOD
26
127
0
28 Nov 2019
Risks from Learned Optimization in Advanced Machine Learning Systems
Evan Hubinger
Chris van Merwijk
Vladimir Mikulik
Joar Skalse
Scott Garrabrant
50
147
0
05 Jun 2019
Detecting and Correcting for Label Shift with Black Box Predictors
Zachary Chase Lipton
Yu Wang
Alex Smola
OOD
23
548
0
12 Feb 2018
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
Xinyun Chen
Chang-rui Liu
Yue Liu
Kimberly Lu
D. Song
AAML
SILM
62
1,822
0
15 Dec 2017
Domain-Adversarial Training of Neural Networks
Yaroslav Ganin
E. Ustinova
Hana Ajakan
Pascal Germain
Hugo Larochelle
François Laviolette
M. Marchand
Victor Lempitsky
GAN
OOD
296
9,418
0
28 May 2015
Learning Transferable Features with Deep Adaptation Networks
Mingsheng Long
Yue Cao
Jianmin Wang
Michael I. Jordan
OOD
155
5,163
0
10 Feb 2015
Causal inference using invariant prediction: identification and confidence intervals
J. Peters
Peter Bühlmann
N. Meinshausen
OOD
60
961
0
06 Jan 2015