Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
arXiv 2505.14042 · 20 May 2025
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki
Author contacts: kumano@cvm.t.u-tokyo.ac.jp, kera@chiba-u.jp, yamasaki@cvm.t.u-tokyo.ac.jp
Tags: AAML
Papers citing "Adversarially Pretrained Transformers may be Universally Robust In-Context Learners" (50 of 86 papers shown):
Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models
Xiaojun Jia, Sensen Gao, Simeng Qin, Ke Ma, Xianrui Li, Yihao Huang, Wei Dong, Yang Liu, Xiaochun Cao · AAML, VLM · 17 Mar 2025

On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou · 24 Feb 2025
Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
Shaopeng Fu, Liang Ding, Di Wang · 06 Feb 2025
Adversarial Prompt Distillation for Vision-Language Models
Lin Luo, Xin Wang, Bojia Zi, Shihao Zhao, Xingjun Ma, Yu-Gang Jiang · AAML, VLM · 22 Nov 2024

Adversarial Robustness of In-Context Learning in Transformers for Linear Regression
Usman Anwar, Johannes Von Oswald, Louis Kirsch, David M. Krueger, Spencer Frei · SILM, AAML · 07 Nov 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar · 10 Oct 2024

Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
Spencer Frei, Gal Vardi · MLT · 02 Oct 2024

AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning
Xin Wang, Kai-xiang Chen, Xingjun Ma, Zhineng Chen, Jingjing Chen, Yu-Gang Jiang · AAML · 04 Aug 2024

Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang · 30 May 2024

Revisiting the Robust Generalization of Adversarial Prompt Tuning
Fan Yang, Mingxuan Xia, Sangzhou Xia, Chicheng Ma, Hui Hui · VPVLM, VLM · 18 May 2024

Few-Shot Adversarial Prompt Learning on Vision-Language Models
Yiwei Zhou, Xiaobo Xia, Zhiwei Lin, Bo Han, Tongliang Liu · VLM · 21 Mar 2024

How Well Can Transformers Emulate In-context Newton's Method?
Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee · 05 Mar 2024

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
Lin Li, Haoyan Guan, Jianing Qiu, Michael W. Spratling · AAML, VLM, VPVLM · 04 Mar 2024

Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang, Jie Zhang, Zheng Yuan, Shiguang Shan · VLM · 09 Jan 2024

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng, Yuxin Chen, S. Sra · 11 Dec 2023

Scalable Extraction of Training Data from (Production) Language Models
Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee · SILM · 28 Nov 2023

Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang · AAML, VPVLM, VLM · 19 Nov 2023

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
Licong Lin, Yu Bai, Song Mei · OffRL · 12 Oct 2023

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao · SILM · 03 Oct 2023

Uncovering mesa-optimization algorithms in Transformers
J. Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, ..., Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento · 11 Sep 2023

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff
Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura · AAML · 31 Aug 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson · 27 Jul 2023

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Arvind V. Mahankali, Tatsunori B. Hashimoto, Tengyu Ma · MLT · 07 Jul 2023

Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei, Nika Haghtalab, Jacob Steinhardt · 05 Jul 2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning
Jonathan Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill · OffRL · 26 Jun 2023

Trained Transformers Learn Linear Models In-Context
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett · 16 Jun 2023

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Yu Bai, Fan Chen, Haiquan Wang, Caiming Xiong, Song Mei · 07 Jun 2023

Transformers learn to implement preconditioned gradient descent for in-context learning
Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra · ODL · 01 Jun 2023

What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization
Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, Zhaoran Wang · BDL · 30 May 2023

Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness
Suraj Srinivas, Sebastian Bordt, Hima Lakkaraju · AAML · 30 May 2023

The Learnability of In-Context Learning
Noam Wies, Yoav Levine, Amnon Shashua · 14 Mar 2023

Larger language models do in-context learning differently
Jerry W. Wei, Jason W. Wei, Yi Tay, Dustin Tran, Albert Webson, ..., Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma · ReLM, LRM · 07 Mar 2023

A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking
Chang-Shu Liu, Yinpeng Dong, Wenzhao Xiang, Xiaohu Yang, Hang Su, Junyi Zhu, YueFeng Chen, Yuan He, H. Xue, Shibao Zheng · OOD, VLM, AAML · 28 Feb 2023

Large Language Models Can Be Easily Distracted by Irrelevant Context
Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H. Chi, Nathanael Schärli, Denny Zhou · ReLM, RALM, LRM · 31 Jan 2023

Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
Damai Dai, Yutao Sun, Li Dong, Y. Hao, Shuming Ma, Zhifang Sui, Furu Wei · LRM · 20 Dec 2022

Transformers learn in-context by gradient descent
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov · MLT · 15 Dec 2022

Understanding Zero-Shot Adversarial Robustness for Large-Scale Models
Chengzhi Mao, Scott Geng, Junfeng Yang, Xin Eric Wang, Carl Vondrick · VLM · 14 Dec 2022

What learning algorithm is in-context learning? Investigations with linear models
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou · 28 Nov 2022

Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez, Ian Ribeiro · SILM · 17 Nov 2022

A Light Recipe to Train Robust Vision Transformers
Edoardo Debenedetti, Vikash Sehwag, Prateek Mittal · ViT · 15 Sep 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant · 01 Aug 2022

Towards Efficient Adversarial Training on Vision Transformers
Boxi Wu, Jindong Gu, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu · ViT, AAML · 21 Jul 2022

Quantifying Memorization Across Neural Language Models
Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramèr, Chiyuan Zhang · PILM · 15 Feb 2022

Are Transformers More Robust Than CNNs?
Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie · ViT, AAML · 10 Nov 2021

Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Philipp Benz, Soomin Ham, Chaoning Zhang, Adil Karjauv, In So Kweon · AAML, ViT · 06 Oct 2021

RobustART: Benchmarking Robustness on Architecture Design and Training Techniques
Shiyu Tang, Ruihao Gong, Yan Wang, Aishan Liu, Jiakai Wang, ..., Xianglong Liu, Basel Alomair, Alan Yuille, Philip Torr, Dacheng Tao · VLM, AAML · 11 Sep 2021

Long-term Cross Adversarial Training: A Robust Meta-learning Method for Few-shot Classification Tasks
Fan Liu, Shuyu Zhao, Xuelong Dai, Bin Xiao · VLM · 22 Jun 2021

Reveal of Vision Transformers Robustness against Adversarial Attacks
Ahmed Aldahdooh, W. Hamidouche, Olivier Déforges · ViT · 07 Jun 2021

Intriguing Properties of Vision Transformers
Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang · ViT · 21 May 2021

Vision Transformers are Robust Learners
Sayak Paul, Pin-Yu Chen · ViT · 17 May 2021