2404.06411
Cited By
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents
9 April 2024
Luca Gioacchini
G. Siracusano
D. Sanvito
Kiril Gashteovski
David Friede
Roberto Bifulco
Carolin (Haas) Lawrence
ELM
LLMAG
Papers citing
"AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents"
6 / 6 papers shown
Title
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering
Federico Errica
G. Siracusano
D. Sanvito
Roberto Bifulco
141
25
0
18 Jun 2024
Gorilla: Large Language Model Connected with Massive APIs
Shishir G. Patil
Tianjun Zhang
Xin Wang
Joseph E. Gonzalez
ELM
CLL
ALM
SyDa
81
552
0
24 May 2023
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents
Kranti Chalamalasetti
Jana Götze
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
ELM
ALM
LLMAG
64
35
0
22 May 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
420
2,849
0
06 Oct 2022
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Mohit Shridhar
Xingdi Yuan
Marc-Alexandre Côté
Yonatan Bisk
Adam Trischler
Matthew J. Hausknecht
LM&Ro
LLMAG
83
431
0
08 Oct 2020
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
321
5,801
0
21 Apr 2019