arXiv:2506.14682
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
17 June 2025
Ads Dawson
Rob Mulla
Nick Landers
Shane Caldwell
ELM
Papers citing
"AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models"
5 / 5 papers shown
Title
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
Nicholas Carlini
Javier Rando
Edoardo Debenedetti
Milad Nasr
F. Tramèr
AAML
ELM
76
3
0
03 Mar 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
373
1,967
0
22 Jan 2025
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
Ziyang Li
Saikat Dutta
Mayur Naik
96
55
0
27 May 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie
Danyang Zhang
Jixuan Chen
Xiaochuan Li
Siheng Zhao
...
Shuyan Zhou
Silvio Savarese
Caiming Xiong
Victor Zhong
Tao Yu
104
173
0
11 Apr 2024
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
233
5,635
0
07 Jul 2021