ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.06411
  4. Cited By
AgentQuest: A Modular Benchmark Framework to Measure Progress and
  Improve LLM Agents

AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents

9 April 2024
Luca Gioacchini
G. Siracusano
D. Sanvito
Kiril Gashteovski
David Friede
Roberto Bifulco
Carolin (Haas) Lawrence
    ELM
    LLMAG
ArXivPDFHTML

Papers citing "AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents"

6 / 6 papers shown
Title
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering
Federico Errica
G. Siracusano
D. Sanvito
Roberto Bifulco
141
25
0
18 Jun 2024
Gorilla: Large Language Model Connected with Massive APIs
Gorilla: Large Language Model Connected with Massive APIs
Shishir G. Patil
Tianjun Zhang
Xin Wang
Joseph E. Gonzalez
ELM
CLL
ALM
SyDa
81
552
0
24 May 2023
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as
  Conversational Agents
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents
Kranti Chalamalasetti
Jana Gotze
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
ELM
ALM
LLMAG
64
35
0
22 May 2023
ReAct: Synergizing Reasoning and Acting in Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
417
2,849
0
06 Oct 2022
ALFWorld: Aligning Text and Embodied Environments for Interactive
  Learning
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Mohit Shridhar
Xingdi Yuan
Marc-Alexandre Côté
Yonatan Bisk
Adam Trischler
Matthew J. Hausknecht
LM&Ro
LLMAG
83
431
0
08 Oct 2020
BERTScore: Evaluating Text Generation with BERT
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
321
5,801
0
21 Apr 2019
1