Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.07847
Cited By
Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models
15 August 2023
Yugeng Liu
Tianshuo Cong
Zhengyu Zhao
Michael Backes
Yun Shen
Yang Zhang
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models"
13 / 13 papers shown
Title
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
38
2
0
14 Mar 2025
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
Shuyang Hao
Bryan Hooi
J. Liu
Kai-Wei Chang
Zi Huang
Yujun Cai
AAML
92
0
0
27 Nov 2024
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
129
118
0
09 Nov 2023
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
107
136
0
03 Nov 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
117
301
0
19 Sep 2023
On the Risk of Misinformation Pollution with Large Language Models
Yikang Pan
Liangming Pan
Wenhu Chen
Preslav Nakov
Min-Yen Kan
W. Wang
DeLMO
192
110
0
23 May 2023
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
218
299
0
26 Apr 2023
MGTBench: Benchmarking Machine-Generated Text Detection
Xinlei He
Xinyue Shen
Z. Chen
Michael Backes
Yang Zhang
DeLMO
58
100
0
26 Mar 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark J. F. Gales
HILM
LRM
152
391
0
15 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
245
914
0
21 Apr 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,956
0
20 Apr 2018
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
AAML
GAN
196
711
0
17 Apr 2018
1