2401.16745
MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
30 January 2024
Wai-Chung Kwan
Xingshan Zeng
Yuxin Jiang
Yufei Wang
Liangyou Li
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
ELM
Papers citing
"MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"
11 / 11 papers shown
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
53
3
0
09 May 2025
Training a Generally Curious Agent
Fahim Tajwar
Yiding Jiang
Abitha Thankaraj
Sumaita Sadia Rahman
J. Zico Kolter
Jeff Schneider
Ruslan Salakhutdinov
126
1
0
24 Feb 2025
InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context
Bryan L. M. de Oliveira
Luana G. B. Martins
Bruno Brandão
Luckeciano C. Melo
ELM
261
1
0
17 Feb 2025
CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants
Lize Alberts
Benjamin Ellis
Andrei Lupu
Jakob Foerster
ELM
44
1
0
28 Oct 2024
TestAgent: A Framework for Domain-Adaptive Evaluation of LLMs via Dynamic Benchmark Construction and Exploratory Interaction
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
50
1
0
15 Oct 2024
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback
Heng Chang
Miao Zheng
Fan Yang
Bin Cui
Tengjiao Wang
Xin Wu
Guosheng Dong
Wentao Zhang
ALM
53
6
0
12 Oct 2024
Post-hoc Reward Calibration: A Case Study on Length Bias
Zeyu Huang
Zihan Qiu
Zili Wang
Edoardo M. Ponti
Ivan Titov
50
5
0
25 Sep 2024
The use of GPT-4o and Other Large Language Models for the Improvement and Design of Self-Assessment Scales for Measurement of Interpersonal Communication Skills
Goran Bubaš
LM&MA
LLMAG
AI4MH
44
0
0
21 Sep 2024
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models
Rishabh Maheshwary
Vikas Yadav
Hoang Nguyen
Khyati Mahajan
Sathwik Tejaswi Madhusudhan
51
3
0
24 Jun 2024
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
133
143
0
19 Sep 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
402
12,150
0
04 Mar 2022