arXiv 2506.14682
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
17 June 2025
Ads Dawson
Rob Mulla
Nick Landers
Shane Caldwell
ELM
Papers citing
"AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models"
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie
Danyang Zhang
Jixuan Chen
Xiaochuan Li
Siheng Zhao
...
Shuyan Zhou
Silvio Savarese
Caiming Xiong
Victor Zhong
Tao Yu
104
173
0
11 Apr 2024
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
233
5,635
0
07 Jul 2021