2302.12095
Cited By
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
22 February 2023
Jindong Wang
Xixu Hu
Wenxin Hou
Hao Chen
Runkai Zheng
Yidong Wang
Linyi Yang
Haojun Huang
Weirong Ye
Xiubo Geng
Binxing Jiao
Yue Zhang
Xing Xie
AI4MH
Papers citing "On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective"
50 / 168 papers shown
A Survey on Privacy Risks and Protection in Large Language Models
Kang Chen
Xiuze Zhou
Yuanguo Lin
Shibo Feng
Li Shen
Pengcheng Wu
AILaw
PILM
141
0
0
04 May 2025
AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How
Omid Veisi
Sasan Bahrami
Roman Englert
Claudia Müller
117
0
0
25 Apr 2025
ConceptFormer: Towards Efficient Use of Knowledge-Graph Embeddings in Large Language Models
Joel Barmettler
Abraham Bernstein
Luca Rossetto
KELM
3DV
44
0
0
10 Apr 2025
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini
A. Orsino
Massimo Ruggiero
Domenico Talia
AAML
ELM
40
0
0
10 Apr 2025
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González
Andrés Herrera-Poyatos
Cristina Zuheros
David Herrera-Poyatos
Virilo Tejedor
F. Herrera
AAML
21
0
0
07 Apr 2025
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study
Aryan Agrawal
Lisa Alazraki
Shahin Honarvar
Marek Rei
49
0
0
03 Apr 2025
Pay More Attention to the Robustness of Prompt for Instruction Data Mining
Qiang Wang
Dawei Feng
Xu Zhang
Ao Shen
Yang Xu
Bo Ding
H. Wang
AAML
46
0
0
31 Mar 2025
Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
Kohei Tsuji
Tatsuya Hiraoka
Yuchang Cheng
Eiji Aramaki
Tomoya Iwakura
74
0
0
27 Feb 2025
Data-Efficient Multi-Agent Spatial Planning with LLMs
Huangyuan Su
Aaron Walsman
Daniel Garces
Sham Kakade
Stephanie Gil
LLMAG
Presented at ResearchTrend Connect | LLMAG on 28 Mar 2025
138
0
0
26 Feb 2025
On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li
Chenyang Zhang
Xingwu Chen
Yuan Cao
Difan Zou
69
0
0
24 Feb 2025
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
Yue Zhou
Yi-Ju Chang
Yuan Wu
MoMe
63
2
0
24 Feb 2025
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
Eva Sánchez Salido
Julio Gonzalo
Guillermo Marco
ELM
60
2
0
18 Feb 2025
Learning from Mistakes: Self-correct Adversarial Training for Chinese Unnatural Text Correction
Xuan Feng
T. Gu
Xiaoli Liu
L. Chang
34
1
0
23 Dec 2024
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Nilanjana Das
Edward Raff
Manas Gaur
AAML
106
1
0
20 Dec 2024
Pay Attention to the Robustness of Chinese Minority Language Models! Syllable-level Textual Adversarial Attack on Tibetan Script
Xi Cao
Dolma Dawa
Nuo Qun
Trashi Nyima
AAML
89
3
0
03 Dec 2024
Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach
T. T. Wang
John Hughes
Henry Sleight
Rylan Schaeffer
Rajashree Agrawal
Fazl Barez
Mrinank Sharma
Jesse Mu
Nir Shavit
Ethan Perez
AAML
87
4
0
03 Dec 2024
SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
AAML
72
2
0
01 Dec 2024
Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbation
Saiful Islam Salim
Rubin Yuchan Yang
Alexander Cooper
Suryashree Ray
Saumya Debray
Sazzadur Rahaman
AAML
44
0
0
12 Oct 2024
A Survey on the Honesty of Large Language Models
Siheng Li
Cheng Yang
Taiqiang Wu
Chufan Shi
Yuji Zhang
...
Jie Zhou
Yujiu Yang
Ngai Wong
Xixin Wu
Wai Lam
HILM
32
4
0
27 Sep 2024
Towards Building a Robust Knowledge Intensive Question Answering Model with Large Language Models
Xingyun Hong
Yan Shao
Zhilin Wang
Manni Duan
Jin Xiongnan
42
0
0
09 Sep 2024
DIAGen: Diverse Image Augmentation with Generative Models
Tobias Lingenberg
Markus Reuter
Gopika Sudhakaran
Dominik Gojny
Stefan Roth
Simone Schaub-Meyer
DiffM
28
3
0
26 Aug 2024
Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning
Simret Araya Gebreegziabher
Kuangshi Ai
Zheng Zhang
Elena L. Glassman
T. Li
32
4
0
07 Aug 2024
Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference
Ke Shen
M. Kejriwal
32
0
0
04 Aug 2024
Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context
Nilanjana Das
Edward Raff
Manas Gaur
AAML
35
2
0
19 Jul 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
50
5
0
11 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
27
27
0
04 Jul 2024
Systematic Task Exploration with LLMs: A Study in Citation Text Generation
Furkan Şahinuç
Ilia Kuznetsov
Yufang Hou
Iryna Gurevych
27
3
0
04 Jul 2024
What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks
Chengrui Huang
Zhengliang Shi
Yuntao Wen
Xiuying Chen
Peng Han
Shen Gao
Shuo Shang
34
1
0
03 Jul 2024
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
Chuanpeng Yang
Wang Lu
Yao Zhu
Yidong Wang
Qian Chen
Chenlong Gao
Bingjie Yan
Yiqiang Chen
ALM
KELM
44
22
0
02 Jul 2024
NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations
Junkai Chen
Zhenhao Li
Xing Hu
Xin Xia
AAML
44
7
0
28 Jun 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang
Goutham Rajendran
Pradeep Ravikumar
Bryon Aragam
CLL
KELM
34
6
0
26 Jun 2024
Evidence of a log scaling law for political persuasion with large language models
Kobi Hackenburg
Ben M. Tappin
Paul Röttger
Scott Hale
Jonathan Bright
Helen Z. Margetts
34
7
0
20 Jun 2024
Evaluating Large Language Models along Dimensions of Language Variation: A Systematik Invesdigatiom uv Cross-lingual Generalization
Niyati Bafna
Kenton Murray
David Yarowsky
60
2
0
19 Jun 2024
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models
Yuqing Wang
Yun Zhao
LRM
AAML
ELM
27
1
0
16 Jun 2024
E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models
Zhenyu Zhang
Bingguang Hao
Jinpeng Li
Zekai Zhang
Dongyan Zhao
31
0
0
16 Jun 2024
KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
Lina Wang
42
1
0
16 Jun 2024
On the Worst Prompt Performance of Large Language Models
Bowen Cao
Deng Cai
Zhisong Zhang
Yuexian Zou
Wai Lam
ALM
LRM
30
5
0
08 Jun 2024
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Chengyuan Deng
Yiqun Duan
Xin Jin
Heng Chang
Yijun Tian
...
Kuofeng Gao
Sihong He
Jun Zhuang
Lu Cheng
Haohan Wang
AILaw
40
16
0
08 Jun 2024
Are LLMs classical or nonmonotonic reasoners? Lessons from generics
Alina Leidinger
R. Rooij
Ekaterina Shutova
LRM
26
3
0
05 Jun 2024
TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability
Fengji Ma
Li Liu
Hei Victor Cheng
VLM
33
0
0
27 May 2024
Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval
Mengjia Niu
Hao Li
Jie Shi
Hamed Haddadi
Fan Mo
HILM
45
10
0
10 May 2024
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu
Hayoung Jung
Anjali Singh
Monojit Choudhury
Tanushree Mitra
32
8
0
08 May 2024
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang
Haoyu Bu
Hui Wen
Yu Chen
Lun Li
Hongsong Zhu
37
36
0
06 May 2024
Assessing and Verifying Task Utility in LLM-Powered Applications
Negar Arabzadeh
Siqing Huo
Nikhil Mehta
Qingyun Wu
Chi Wang
Ahmed Hassan Awadallah
Charles L. A. Clarke
Julia Kiseleva
35
10
0
03 May 2024
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
59
14
0
25 Apr 2024
Out-of-Distribution Data: An Acquaintance of Adversarial Examples -- A Survey
Naveen Karunanayake
Ravin Gunawardena
Suranga Seneviratne
Sanjay Chawla
OOD
43
5
0
08 Apr 2024
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
Simone Tedeschi
Felix Friedrich
P. Schramowski
Kristian Kersting
Roberto Navigli
Huu Nguyen
Bo Li
ELM
38
45
0
06 Apr 2024
SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks
Brian Formento
Wenjie Feng
Chuan-Sheng Foo
Anh Tuan Luu
See-Kiong Ng
AAML
32
6
0
27 Mar 2024
ChatGPT Role-play Dataset: Analysis of User Motives and Model Naturalness
Sabrina Bodmer
Ameeta Agrawal
Judit Dombi
Tetyana Sydorenko
Jung In Lee
21
4
0
26 Mar 2024
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
William James Bolton
Rafael Poyiadzi
Edward R. Morrell
Gabriela van Bergen Gonzalez Bueno
Lea Goetz
37
2
0
21 Mar 2024