Adversarial Training for Large Neural Language Models

20 April 2020

Xiaodong Liu

Papers citing "Adversarial Training for Large Neural Language Models"

50 / 124 papers shown

Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness

Sajad U P

AAML

364

15 Nov 2025

Generative AI for Biosciences: Emerging Threats and Roadmap to Biosecurity

...

471

13 Oct 2025

SAGE: A Realistic Benchmark for Semantic Understanding

210

25 Sep 2025

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

Qiming Guo

Jinwen Tang

Xingran Huang

186

25 Aug 2025

CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection

Jiaming Hu

Haoyu Wang

Debarghya Mukherjee

Ioannis Ch. Paschalidis

AAML

154

19 Aug 2025

PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training

Pengfei Du

AAML

175

14 Jul 2025

Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives

...

514

11 Jun 2025

LLMs are Frequency Pattern Learners in Natural Language Inference

Liang Cheng

Zhaowei Wang

Mark Steedman

244

27 May 2025

Retrieval-Augmented Purifier for Robust LLM-Empowered Recommendation

336

03 Apr 2025

Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation

International Conference on Multimedia Retrieval (ICMR), 2024

428

21 Feb 2025

Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

623

03 Jan 2025

Achieving Domain-Independent Certified Robustness via Knowledge Continuity

Neural Information Processing Systems (NeurIPS), 2024

314

03 Nov 2024

Adversarial Training: A Survey

Lihe Zhang

Baocai Yin

333

19 Oct 2024

Estimating the Probabilities of Rare Outputs in Language Models

International Conference on Learning Representations (ICLR), 2024

Gabriel Wu

Jacob Hilton

AAML UQCV

418

17 Oct 2024

Recent Advances in Attack and Defense Approaches of Large Language Models

387

05 Sep 2024

MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

AAAI Conference on Artificial Intelligence (AAAI), 2024

Kuluhan Binici

Abhinav Ramesh Kashyap

Viktor Schlegel

Andy T. Liu

Vijay Prakash Dwivedi

Thanh-Tung Nguyen

Xiaoxue Gao

Nancy F. Chen

Stefan Winkler

264

26 Aug 2024

Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability

International Joint Conference on Artificial Intelligence (IJCAI), 2024

Jorge García-Carrasco

A. Maté

Juan Trujillo

AAML

228

29 Jul 2024

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference

Eric Wong

733

21 Jun 2024

Efficient Adversarial Training in LLMs with Continuous Attacks

Stephan Günnemann

429

105

24 May 2024

PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning

International Conference on Machine Learning (ICML), 2024

Hyeong Kyu Choi

Yixuan Li

342

03 May 2024

Adversarial Attacks and Defense for Conversation Entailment Task

238

01 May 2024

Defending Against Unforeseen Failure Modes with Latent Adversarial Training

Stephen Casper

Lennart Schulze

Oam Patel

Dylan Hadfield-Menell

AAML

763

08 Mar 2024

On the Challenges and Opportunities in Generative AI

...

915

28 Feb 2024

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

Stephan Günnemann

505

14 Feb 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

...

428

876

06 Feb 2024

IllusionX: An LLM-powered mixed reality personal companion

213

04 Feb 2024

Building Guardrails for Large Language Models

543

02 Feb 2024

Black-Box Access is Insufficient for Rigorous AI Audits

Conference on Fairness, Accountability and Transparency (FAccT), 2024

...

Dylan Hadfield-Menell

AAML

660

148

25 Jan 2024

Fast Adversarial Training against Textual Adversarial Attacks

231

23 Jan 2024

METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities

Sangwon Hyun

Mingyu Guo

Muhammad Ali Babar

270

11 Dec 2023

Prompt Optimization via Adversarial In-Context Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

512

05 Dec 2023

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

High-Confidence Computing (HC), 2023

647

1,051

04 Dec 2023

Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention

Network and Distributed System Security Symposium (NDSS), 2023

Yuwen Pu

Xuhong Zhang

216

29 Nov 2023

Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

255

20 Nov 2023

Hijacking Large Language Models via Adversarial In-Context Learning

614

16 Nov 2023

Robust Text Classification: Analyzing Prototype-Based Networks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

322

11 Nov 2023

BERT Lost Patience Won't Be Robust to Adversarial Slowdown

Neural Information Processing Systems (NeurIPS), 2023

372

29 Oct 2023

Data Optimization in Deep Learning: A Survey

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023

Ou Wu

Rujing Yao

355

25 Oct 2023

VIBE: Topic-Driven Temporal Adaptation for Twitter Classification

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

480

16 Oct 2023

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

George J. Pappas

614

423

05 Oct 2023

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

1.1K

550

19 Sep 2023

SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its Departure from Current Machine Learning

Kiana Kheiri

Hamid Karimi

259

110

16 Jul 2023

A Comprehensive Overview of Large Language Models

ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023

Saeed Anwar

Muhammad Usman

1.1K

1,395

12 Jul 2023

MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuning

International Joint Conference on Artificial Intelligence (IJCAI), 2023

Zhehua Zhong

Tianyi Chen

Zhen Wang

AAML

148

27 Jun 2023

Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension

International Conference on Computational Linguistics (COLING), 2023

232

21 Jun 2023

Prompt Injection attack against LLM-integrated Applications

Yi Liu

Kailong Wang

...

Haoyu Wang

Leo Yu Zhang

Yang Liu

SILM

591

641

08 Jun 2023

Toward Adversarial Training on Contextualized Language Representation

International Conference on Learning Representations (ICLR), 2023

171

08 May 2023

USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER

International Workshop on Semantic Evaluation (SemEval), 2023

228

04 May 2023

A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions

Mohammad Fraiwan

Natheer Khasawneh

332

29 Apr 2023

CRL+: A Novel Semi-Supervised Deep Active Contrastive Representation Learning-Based Text Classification Model for Insurance Data

Journal of Advances in Information Technology (JAIT), 2023

173

08 Feb 2023