BENCHAGENTS: Automated Benchmark Creation with Agent Interaction

29 October 2024

Papers citing "BENCHAGENTS: Automated Benchmark Creation with Agent Interaction"


Phi-4-reasoning Technical Report. Marah Abdin, Sahaj Agarwal, Ahmed Hassan Awadallah, Vidhisha Balachandran, Harkirat Singh Behl, ..., Vaishnavi Shrivastava, Vibhav Vineet, Yue Wu, Safoora Yousefi, Guoqing Zheng. 30 Apr 2025.
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models. José P. Pombal, Nuno M. Guerreiro, Ricardo Rei, André F. T. Martins. 01 Apr 2025.
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead. Vidhisha Balachandran, Jingya Chen, Lingjiao Chen, Shivam Garg, Neel Joshi, ..., John Langford, Besmira Nushi, Vibhav Vineet, Yue Wu, Safoora Yousefi. 31 Mar 2025.
Multi-agent Architecture Search via Agentic Supernet. Guibin Zhang, Luyang Niu, Junfeng Fang, Kaidi Wang, Lei Bai, Xinyu Wang. 06 Feb 2025.