2009.06807
The Radicalization Risks of GPT-3 and Advanced Neural Language Models
15 September 2020
Kris McGuffie
Alex Newhouse
Papers citing
"The Radicalization Risks of GPT-3 and Advanced Neural Language Models"
50 / 74 papers shown
Title
Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models
S. Tong
Eliott Zemour
Rawisara Lohanimit
Lalana Kagal
65
0
0
02 Dec 2024
Boardwalk Empire: How Generative AI is Revolutionizing Economic Paradigms
Subramanyam Sahoo
Kamlesh Dutta
33
1
0
19 Oct 2024
GenAI Advertising: Risks of Personalizing Ads with LLMs
Brian Tang
Kaiwen Sun
Noah T. Curran
F. Schaub
Kang G. Shin
SILM
29
2
0
23 Sep 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa
Bhrugu Bharathi
Long Phan
Andy Zhou
Alice Gatti
...
Andy Zou
Dawn Song
Bo Li
Dan Hendrycks
Mantas Mazeika
AAML
MU
53
42
0
01 Aug 2024
Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs
Feiyang Kang
H. Just
Yifan Sun
Himanshu Jahagirdar
Yuanzhi Zhang
Rongxing Du
Anit Kumar Sahu
Ruoxi Jia
56
18
0
05 May 2024
Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale
Freddy Heppell
M. Bakir
Kalina Bontcheva
DeLMO
33
1
0
13 Feb 2024
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou
Bo Li
Haohan Wang
AAML
49
74
0
30 Jan 2024
Killer Apps: Low-Speed, Large-Scale AI Weapons
Philip G. Feldman
Aaron Dant
James R. Foulds
26
2
0
14 Jan 2024
A Group Fairness Lens for Large Language Models
Guanqun Bi
Lei Shen
Yuqiang Xie
Yanan Cao
Tiangang Zhu
Xiao-feng He
ALM
34
4
0
24 Dec 2023
PathFinder: Guided Search over Multi-Step Reasoning Paths
O. Yu. Golovneva
Sean O'Brien
Ramakanth Pasunuru
Tianlu Wang
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
LRM
27
7
0
08 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
39
7
0
21 Nov 2023
Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
Zhaowei Zhu
Jialu Wang
Hao Cheng
Yang Liu
29
15
0
19 Nov 2023
Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Fengqing Jiang
Zhangchen Xu
Luyao Niu
Wei Ping
Jinyuan Jia
Bo Li
Radha Poovendran
AAML
21
19
0
07 Nov 2023
AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection
Sahibzada Adil Shahzad
Ammarah Hashmi
Yan-Tsung Peng
Yu Tsao
Hsin-Min Wang
32
5
0
05 Nov 2023
From Text to Source: Results in Detecting Large Language Model-Generated Content
Wissam Antoun
Benoît Sagot
Djamé Seddah
DeLMO
33
11
0
23 Sep 2023
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
Jonathan Pei
Kevin Kaichuang Yang
Dan Klein
40
21
0
06 Jul 2023
AI could create a perfect storm of climate misinformation
V. Galaz
Hannah Metzler
Stefan Daume
A. Olsson
B. Lindström
A. Marklund
23
5
0
22 Jun 2023
Deceptive AI Ecosystems: The Case of ChatGPT
Xiao Zhan
Yifan Xu
Stefan Sarkadi
SILM
34
21
0
18 Jun 2023
Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?
Wissam Antoun
Virginie Mouilleron
Benoît Sagot
Djamé Seddah
DeLMO
22
33
0
09 Jun 2023
Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning
Chujie Zheng
Pei Ke
Zheng Zhang
Minlie Huang
BDL
23
31
0
06 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan
Rishabh Garg
Kahini Wadhawan
S. Mehta
29
5
0
01 Jun 2023
Data Redaction from Conditional Generative Models
Zhifeng Kong
Kamalika Chaudhuri
KELM
16
7
0
18 May 2023
Prompt Engineering for Healthcare: Methodologies and Applications
Jiaqi Wang
Enze Shi
Sigang Yu
Zihao Wu
Chong Ma
...
Dajiang Zhu
Yixuan Yuan
Dinggang Shen
Tianming Liu
Shu Zhang
LM&MA
44
111
0
28 Apr 2023
An Evaluation on Large Language Model Outputs: Discourse and Memorization
Adrian de Wynter
Xun Wang
Alex Sokolov
Qilong Gu
Si-Qing Chen
ELM
87
32
0
17 Apr 2023
Fundamentals of Generative Large Language Models and Perspectives in Cyber-Defense
Andrei Kucharavy
Z. Schillaci
Loic Maréchal
Maxime Wursch
Ljiljana Dolamic
Remi Sabonnadiere
Dimitri Percia David
Alain Mermoud
Vincent Lenders
ELM
AI4CE
35
31
0
21 Mar 2023
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models
Shrimai Prabhumoye
M. Patwary
M. Shoeybi
Bryan Catanzaro
LM&MA
30
19
0
14 Feb 2023
Academic Writing with GPT-3.5: Reflections on Practices, Efficacy and Transparency
Oğuz 'Oz' Buruk
AI4CE
11
14
0
12 Feb 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
30
237
0
11 Feb 2023
The Gradient of Generative AI Release: Methods and Considerations
Irene Solaiman
33
98
0
05 Feb 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre
Le Hou
Tu Vu
Albert Webson
Hyung Won Chung
...
Denny Zhou
Quoc V. Le
Barret Zoph
Jason W. Wei
Adam Roberts
ALM
41
633
0
31 Jan 2023
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Terry Yue Zhuo
Yujin Huang
Chunyang Chen
Zhenchang Xing
SILM
36
102
0
30 Jan 2023
Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts
Skyler Hallinan
Alisa Liu
Yejin Choi
Maarten Sap
22
36
0
20 Dec 2022
Chatbots in a Botnet World
Forrest McKee
David A. Noever
23
26
0
18 Dec 2022
Manifestations of Xenophobia in AI Systems
Nenad Tomašev
J. L. Maynard
Iason Gabriel
24
9
0
15 Dec 2022
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Zhexin Zhang
Jiale Cheng
Hao Sun
Jiawen Deng
Fei Mi
Yasheng Wang
Lifeng Shang
Minlie Huang
SILM
32
8
0
04 Dec 2022
Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez
Ian Ribeiro
SILM
51
402
0
17 Nov 2022
Deepfake Text Detection: Limitations and Opportunities
Jiameng Pu
Zain Sarwar
Sifat Muhammad Abdullah
A. Rehman
Yoonjin Kim
P. Bhattacharya
M. Javed
Bimal Viswanath
AAML
24
54
0
17 Oct 2022
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Evan Crothers
Nathalie Japkowicz
H. Viktor
DeLMO
41
107
0
13 Oct 2022
On the Impossible Safety of Large AI Models
El-Mahdi El-Mhamdi
Sadegh Farhadkhani
R. Guerraoui
Nirupam Gupta
L. Hoang
Rafael Pinot
Sébastien Rouault
John Stephan
30
31
0
30 Sep 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
506
0
28 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
231
446
0
23 Aug 2022
Findings of the RuATD Shared Task 2022 on Artificial Text Detection in Russian
T. Shamardina
Vladislav Mikhailov
Daniil Chernianskii
Alena Fenogenova
Marat Saidov
A. Valeeva
Tatiana Shavrina
I. Smurov
E. Tutubalina
Ekaterina Artemova
DeLMO
16
30
0
03 Jun 2022
Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy
Kathleen C. Fraser
S. Kiritchenko
Esma Balkir
117
37
0
25 May 2022
LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models
Mor Geva
Avi Caciularu
Guy Dar
Paul Roit
Shoval Sadde
Micah Shlain
Bar Tamir
Yoav Goldberg
KELM
27
27
0
26 Apr 2022
Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models
Philip G. Feldman
Aaron Dant
James R. Foulds
Shimei Pan
18
3
0
15 Apr 2022
Identifying and Measuring Token-Level Sentiment Bias in Pre-trained Language Models with Prompts
Apoorv Garg
Deval Srivastava
Zhiyang Xu
Lifu Huang
16
5
0
15 Apr 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva
Avi Caciularu
Ke Wang
Yoav Goldberg
KELM
69
336
0
28 Mar 2022
Data-Efficient Structured Pruning via Submodular Optimization
Marwa El Halabi
Suraj Srinivas
Simon Lacoste-Julien
20
18
0
09 Mar 2022
Sustainable Cloud Services for Verbal Interaction with Embodied Agents
Lucrezia Grassi
Carmine Tommaso Recchiuto
A. Sgorbissa
26
8
0
04 Mar 2022
An Equivalence Between Data Poisoning and Byzantine Gradient Attacks
Sadegh Farhadkhani
R. Guerraoui
L. Hoang
Oscar Villemaud
FedML
16
24
0
17 Feb 2022