arXiv:2502.20315
LangProBe: a Language Programs Benchmark
27 February 2025
Shangyin Tan
Lakshya A Agrawal
Arnav Singhvi
Liheng Lai
Michael J Ryan
Dan Klein
Omar Khattab
Koushik Sen
Matei A. Zaharia
Papers citing "LangProBe: a Language Programs Benchmark"
6 / 6 papers shown
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELM
ALM
82
45
0
16 Oct 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
104
576
0
06 Aug 2024
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
H. Trivedi
Tushar Khot
Mareike Hartmann
R. Manku
Vinty Dong
Edward Li
Shashank Gupta
Ashish Sabharwal
Niranjan Balasubramanian
VGen
LLMAG
48
30
0
26 Jul 2024
WebArena: A Realistic Web Environment for Building Autonomous Agents
Shuyan Zhou
Frank F. Xu
Hao Zhu
Xuhui Zhou
Robert Lo
...
Tianyue Ou
Yonatan Bisk
Daniel Fried
Uri Alon
Graham Neubig
LLMAG
66
420
0
25 Jul 2023
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning
Jingyuan Selena She
Christopher Potts
Sam Bowman
Atticus Geiger
40
13
0
30 May 2023
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval
Omar Khattab
Christopher Potts
Matei A. Zaharia
RALM
LRM
46
55
0
02 Jan 2021