Cited By
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
arXiv:2308.12833 · 24 August 2023
Maximilian Mozes, Xuanli He, Bennett Kleinberg, Lewis D. Griffin
Papers citing "Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities"
37 / 37 papers shown
One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models
Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin
12 May 2025

JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
Julien Piet, Xiao Huang, Dennis Jacob, Annabella Chow, Maha Alrashed, Geng Zhao, Zhanhao Hu, Chawin Sitawarin, Basel Alomair, David A. Wagner
28 Apr 2025 · AAML

FDLLM: A Text Fingerprint Detection Method for LLMs in Multi-Language, Multi-Domain Black-Box Environments
Zhiyuan Fu, Junfan Chen, Hongyu Sun, Ting Yang, Ruidong Li, Yuqing Zhang
28 Jan 2025

Detecting Training Data of Large Language Models via Expectation Maximization
Gyuwan Kim, Yang Li, Evangelia Spiliopoulou, Jie Ma, Miguel Ballesteros, William Yang Wang
10 Oct 2024 · MIALM

Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong, Yuelin Zou, Lijie Hu, Ziqian Zeng, Di Wang, Haiqin Yang
09 Oct 2024 · AAML, MU

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
09 Oct 2024

Fine-tuning can Help Detect Pretraining Data from Large Language Models
H. Zhang, Songxin Zhang, Bingyi Jing, Hongxin Wei
09 Oct 2024

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method
Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
23 Sep 2024

MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
Tianle Gu, Kexin Huang, Ruilin Luo, Yuanqi Yao, Yujiu Yang, Yan Teng, Yingchun Wang
18 Sep 2024 · MU

Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches
Jamal N. Al-Karaki, Muhammad Al-Zafar Khan, Marwan Omar
11 Sep 2024

Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
Cheng Wang, Yiwei Wang, Bryan Hooi, Yujun Cai, Nanyun Peng, Kai-Wei Chang
05 Sep 2024

AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations
Adam Dahlgren Lindstrom, Leila Methnani, Lea Krause, Petter Ericson, Ínigo Martínez de Rituerto de Troya, Dimitri Coelho Mollo, Roel Dobbe
26 Jun 2024 · ALM

Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study
Jerson Francia, Derek Hansen, Ben Schooley, Matthew Taylor, Shydra Murray, Greg Snow
18 Jun 2024

"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, Hong Shen
28 Mar 2024

Generative Models are Self-Watermarked: Declaring Model Authentication through Re-Generation
Aditya Desu, Xuanli He, Qiongkai Xu, Wei Lu
23 Feb 2024 · WIGM

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran
19 Feb 2024

An Early Categorization of Prompt Injection Attacks on Large Language Models
Sippo Rossi, Alisia Marianne Michel, R. Mukkamala, J. Thatcher
31 Jan 2024 · SILM, AAML

PROMISE: A Framework for Developing Complex Conversational Interactions (Technical Report)
Wenyuan Wu, Jasmin Heierli, Max Meisterhans, Adrian Moser, Andri Farber, Mateusz Dolata, Elena Gavagnin, Alexandre de Spindler, Gerhard Schwabe
06 Dec 2023

UOR: Universal Backdoor Attacks on Pre-trained Language Models
Wei Du, Peixuan Li, Bo-wen Li, Haodong Zhao, Gongshen Liu
16 May 2023 · AAML

Defending against Insertion-based Textual Backdoor Attacks via Attribution
Jiazhao Li, Zhuofeng Wu, Wei Ping, Chaowei Xiao, V. Vydiswaran
03 May 2023

Mitigating Approximate Memorization in Language Models via Dissimilarity Learned Policy
Aly M. Kassem
02 May 2023

Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein
01 May 2023 · SILM

Emergent autonomous scientific research capabilities of large language models
Daniil A. Boiko, R. MacKnight, Gabe Gomes
11 Apr 2023 · ELM, LM&Ro, AI4CE, LLMAG

Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
22 Mar 2023 · ELM, AI4MH, AI4CE, ALM

Susceptibility to Influence of Large Language Models
Lewis D. Griffin, Bennett Kleinberg, Maximilian Mozes, Kimberly T. Mai, Maria Vau, M. Caldwell, Augustine N. Mavor-Parker
10 Mar 2023

CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, R. Jia
19 Sep 2022 · WaLM

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022 · OSLM, ALM

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
28 Jan 2022 · LM&Ro, LRM, AI4CE, ReLM

BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Sam Bowman
15 Oct 2021

Differentially Private Fine-tuning of Language Models
Da Yu, Saurabh Naik, A. Backurs, Sivakanth Gopi, Huseyin A. Inan, ..., Y. Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang
13 Oct 2021

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
18 Apr 2021 · AILaw, LRM

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant
18 Apr 2021 · VPVLM

Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
14 Dec 2020 · MLAU, SILM

Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification
Chuanshuai Chen, Jiazhu Dai
11 Jul 2020 · SILM

The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng
03 Sep 2019

Generating Natural Language Adversarial Examples
M. Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, Kai-Wei Chang
21 Apr 2018 · AAML