ResearchTrend.AI
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models

17 June 2025
Ads Dawson
Rob Mulla
Nick Landers
Shane Caldwell
    ELM
arXiv: 2506.14682

Papers citing "AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models"

5 / 5 papers shown
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
Nicholas Carlini
Javier Rando
Edoardo Debenedetti
Milad Nasr
F. Tramèr
AAML, ELM
76
3
0
03 Mar 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
373
1,967
0
22 Jan 2025
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
Ziyang Li
Saikat Dutta
Mayur Naik
96
55
0
27 May 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie
Danyang Zhang
Jixuan Chen
Xiaochuan Li
Siheng Zhao
...
Shuyan Zhou
Silvio Savarese
Caiming Xiong
Victor Zhong
Tao Yu
104
173
0
11 Apr 2024
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
233
5,635
0
07 Jul 2021