2311.07587
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
8 November 2023
C. D. Freeman
Laura J. Culp
Aaron T Parisi
Maxwell Bileschi
Gamaleldin F. Elsayed
Alex Rizkowsky
Isabelle Simpson
A. Alemi
Azade Nova
Ben Adlam
Bernd Bohnet
Gaurav Mishra
Hanie Sedghi
Igor Mordatch
Izzeddin Gur
Jaehoon Lee
JD Co-Reyes
Jeffrey Pennington
Kelvin Xu
Kevin Swersky
Kshiteej Mahajan
Lechao Xiao
Rosanne Liu
Simon Kornblith
Noah Constant
Peter J. Liu
Roman Novak
Yundi Qian
Noah Fiedel
Jascha Narain Sohl-Dickstein
AAML
Papers citing "Frontier Language Models are not Robust to Adversarial Arithmetic, or 'What do I need to say so you agree 2+2=5?'"
6 / 6 papers shown
Title
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
313
241
0
20 Oct 2023
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
89
73
0
07 Aug 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
293
1,508
0
27 Jul 2023
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
177
667
0
07 Feb 2022
Categorical Reparameterization with Gumbel-Softmax
Eric Jang
S. Gu
Ben Poole
BDL
354
5,379
0
03 Nov 2016
Intriguing properties of neural networks
Christian Szegedy
Wojciech Zaremba
Ilya Sutskever
Joan Bruna
D. Erhan
Ian Goodfellow
Rob Fergus
AAML
282
14,963
1
21 Dec 2013