ResearchTrend.AI
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models


17 June 2025
Ads Dawson
Rob Mulla
Nick Landers
Shane Caldwell
    ELM

Papers citing "AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models"

2 / 2 papers shown
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie
Danyang Zhang
Jixuan Chen
Xiaochuan Li
Siheng Zhao
...
Shuyan Zhou
Silvio Savarese
Caiming Xiong
Victor Zhong
Tao Yu
104
173
0
11 Apr 2024
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
233
5,635
0
07 Jul 2021