2411.17693
Cited By
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
26 November 2024
Jiaxin Wen, Vivek Hebbar, Caleb Larson, Aryan Bhatt, Ansh Radhakrishnan, Mrinank Sharma, Henry Sleight, Shi Feng, He He, Ethan Perez, Buck Shlegeris, Akbir Khan
AAML
Papers citing
"Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats"
3 / 3 papers shown
SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents
Jonathan Kutasov, Yuqi Sun, Paul Colognese, Teun van der Weij, Linda Petrini, ..., Xiang Deng, Henry Sleight, Tyler Tracy, Buck Shlegeris, Joe Benton
LLMAG
28
0
0
17 Jun 2025
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors
Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He
AAML
104
0
0
12 Jun 2025
A sketch of an AI control safety case
Tomek Korbak, Joshua Clymer, Benjamin Hilton, Buck Shlegeris, Geoffrey Irving
147
10
0
28 Jan 2025