LangProBe: a Language Programs Benchmark

27 February 2025
Shangyin Tan
Lakshya A Agrawal
Arnav Singhvi
Liheng Lai
Michael J Ryan
Dan Klein
Omar Khattab
Koushik Sen
Matei A. Zaharia

Papers citing "LangProBe: a Language Programs Benchmark"

6 / 6 papers shown
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELM
ALM
82
45
0
16 Oct 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
104
576
0
06 Aug 2024
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
H. Trivedi
Tushar Khot
Mareike Hartmann
R. Manku
Vinty Dong
Edward Li
Shashank Gupta
Ashish Sabharwal
Niranjan Balasubramanian
VGen
LLMAG
48
30
0
26 Jul 2024
WebArena: A Realistic Web Environment for Building Autonomous Agents
Shuyan Zhou
Frank F. Xu
Hao Zhu
Xuhui Zhou
Robert Lo
...
Tianyue Ou
Yonatan Bisk
Daniel Fried
Uri Alon
Graham Neubig
LLMAG
66
420
0
25 Jul 2023
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning
Jingyuan Selena She
Christopher Potts
Sam Bowman
Atticus Geiger
40
13
0
30 May 2023
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval
Omar Khattab
Christopher Potts
Matei A. Zaharia
RALM
LRM
46
55
0
02 Jan 2021