Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.11443
Cited By
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
18 February 2024
Siyuan Wang
Zhuohan Long
Zhihao Fan
Zhongyu Wei
Xuanjing Huang
LLMAG
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation"
6 / 6 papers shown
Title
UFO2: The Desktop AgentOS
Chaoyun Zhang
He Huang
Chiming Ni
J. Mu
Si Qin
...
Minghua Ma
Jian-Guang Lou
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LLMAG
34
0
0
20 Apr 2025
SegSub: Evaluating Robustness to Knowledge Conflicts and Hallucinations in Vision-Language Models
Peter Carragher
Nikitha Rao
Abhinand Jha
R Raghav
Kathleen M. Carley
VLM
56
0
0
19 Feb 2025
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
Eunsu Kim
Juyoung Suk
Seungone Kim
Niklas Muennighoff
Dongkwan Kim
Alice H. Oh
ELM
83
1
0
31 Dec 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
107
136
0
03 Nov 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
250
1,073
0
05 Oct 2022
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
250
673
0
06 Jan 2021
1