
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition
Papers citing "ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition"
Title |
---|
WebArena: A Realistic Web Environment for Building Autonomous Agents. Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, ..., Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig |