ResearchTrend.AI
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

25 March 2016
Chia-Wei Liu
Ryan J. Lowe
Iulian Serban
Michael Noseworthy
Laurent Charlin
Joelle Pineau

Papers citing "How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation"

50 / 284 papers shown
Title
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
22
0
0
16 May 2025
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding
Yifeng Di
Tianyi Zhang
26
0
0
12 May 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
71
0
0
24 Feb 2025
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
Suvodip Dey
M. Desarkar
OffRL
46
0
0
20 Jan 2025
Measuring the Robustness of Reference-Free Dialogue Evaluation Systems
Justin Vasselli
Adam Nohejl
Taro Watanabe
AAML
54
0
0
12 Jan 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
68
2
0
03 Jan 2025
AutoSAM: Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems
Han Zhang
Mingyue Cheng
Qi Liu
Zichen Liu
Junzhe Jiang
Enhong Chen
AI4TS
55
3
0
03 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
131
73
0
25 Nov 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
66
23
0
10 Sep 2024
ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues
John Mendonça
Isabel Trancoso
A. Lavie
36
3
0
16 Jul 2024
Leveraging LLMs for Dialogue Quality Measurement
Jinghan Jia
A. Komma
Timothy Leffel
Xujun Peng
Ajay Nagesh
Tamer Soliman
Aram Galstyan
Anoop Kumar
44
5
0
25 Jun 2024
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch
Joshua Maynez
R. A. Hofer
Bhuwan Dhingra
Amir Globerson
William W. Cohen
46
8
0
06 Jun 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
29
66
0
30 May 2024
Apollonion: Profile-centric Dialog Agent
Shangyu Chen
Zibo Zhao
Yuanyuan Zhao
Xiang Li
LLMAG
42
1
0
10 Apr 2024
A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots
Richard Sutcliffe
45
3
0
31 Dec 2023
Partially Randomizing Transformer Weights for Dialogue Response Diversity
Jing Yang Lee
Kong Aik Lee
Woon-Seng Gan
27
0
0
18 Nov 2023
Aligning Language Models with Human Preferences via a Bayesian Approach
Jiashuo Wang
Haozhao Wang
Shichao Sun
Wenjie Li
ALM
42
22
0
09 Oct 2023
Learning Personalized Alignment for Evaluating Open-ended Text Generation
Danqing Wang
Kevin Kaichuang Yang
Hanlin Zhu
Xiaomeng Yang
Andrew Cohen
Lei Li
Yuandong Tian
ALM
LM&MA
23
8
0
05 Oct 2023
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Qingyue Wang
Y. Fu
Yanan Cao
Zhiliang Tian
Shi Wang
Dacheng Tao
LLMAG
KELM
RALM
70
25
0
29 Aug 2023
Three Ways of Using Large Language Models to Evaluate Chat
Ondřej Plátek
Vojtěch Hudeček
Patrícia Schmidtová
Mateusz Lango
Ondřej Dušek
ALM
21
6
0
12 Aug 2023
f-Divergence Minimization for Sequence-Level Knowledge Distillation
Yuqiao Wen
Zichao Li
Wenyu Du
Lili Mou
34
53
0
27 Jul 2023
Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues
Yue Feng
Yunlong Jiao
Animesh Prasad
Nikolaos Aletras
Emine Yilmaz
G. Kazai
30
5
0
26 May 2023
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. Andrew Schwartz
Joao Sedoc
22
2
0
24 May 2023
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy
David Schlangen
ELM
32
13
0
14 Apr 2023
CTRLStruct: Dialogue Structure Learning for Open-Domain Response Generation
Congchi Yin
Pijian Li
Z. Ren
37
11
0
02 Mar 2023
Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery
Tao Feng
Lizhen Qu
Gholamreza Haffari
CML
27
7
0
02 Mar 2023
Improving Open-Domain Dialogue Evaluation with a Causal Inference Model
Cat P. Le
Luke Dai
Michael Johnston
Yang Liu
M. Walker
R. Ghanadan
ELM
19
10
0
31 Jan 2023
Improving a sequence-to-sequence nlp model using a reinforcement learning policy algorithm
Jabri Ismail
Aboulbichr Ahmed
El ouaazizi Aziza
24
2
0
28 Dec 2022
CausalDialogue: Modeling Utterance-level Causality in Conversations
Yi-Lin Tuan
Alon Albalak
Wenda Xu
Michael Stephen Saxon
Connor Pryor
Lise Getoor
William Yang Wang
CML
37
2
0
20 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
60
100
0
19 Dec 2022
PAL: Persona-Augmented Emotional Support Conversation Generation
Jiale Cheng
Sahand Sabour
Hao Sun
Zhuang Chen
Minlie Huang
27
28
0
19 Dec 2022
Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems
Sarah E. Finch
James D. Finch
Jinho Choi
38
12
0
18 Dec 2022
PVGRU: Generating Diverse and Relevant Dialogue Responses via Pseudo-Variational Mechanism
Yongkang Liu
Shi Feng
Daling Wang
Yifei Zhang
Hinrich Schütze
31
6
0
18 Dec 2022
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
26
7
0
18 Dec 2022
A Survey on Natural Language Processing for Programming
Qingfu Zhu
Xianzhen Luo
Fang Liu
Cuiyun Gao
Wanxiang Che
25
2
0
12 Dec 2022
Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey
Yuxin Wang
Jieru Lin
Zhiwei Yu
Wei Hu
Börje F. Karlsson
20
17
0
09 Dec 2022
KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning
Xiao Yu
Qingyang Wu
Kun Qian
Zhou Yu
OffRL
21
11
0
30 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
45
2
0
26 Nov 2022
CDialog: A Multi-turn Covid-19 Conversation Dataset for Entity-Aware Dialog Generation
Deeksha Varshney
Aizan Zafar
Niranshu Kumar Behera
Asif Ekbal
29
6
0
16 Nov 2022
Multi-VQG: Generating Engaging Questions for Multiple Images
Min-Hsuan Yeh
Vicent Chen
Ting-Hao Huang
Lun-Wei Ku
CoGe
18
7
0
14 Nov 2022
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible Knowledge Selection
Lanrui Wang
JiangNan Li
Zheng Lin
Fandong Meng
Chenxu Yang
Weiping Wang
Jie Zhou
18
30
0
21 Oct 2022
Controllable Fake Document Infilling for Cyber Deception
Yibo Hu
Yu Lin
Eric Parolin
Latif Khan
Kevin W. Hamlen
35
8
0
18 Oct 2022
Dialogue Evaluation with Offline Reinforcement Learning
Nurul Lubis
Christian Geishauser
Hsien-Chin Lin
Carel van Niekerk
Michael Heck
Shutong Feng
Milica Gašić
OffRL
27
4
0
02 Sep 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
ALM
20
18
0
30 Aug 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Cyril Chhun
Pierre Colombo
Chloé Clavel
Fabian M. Suchanek
53
51
0
24 Aug 2022
CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation
Jinfeng Zhou
Chujie Zheng
Bo Wang
Zheng Zhang
Minlie Huang
29
29
0
18 Aug 2022
MulZDG: Multilingual Code-Switching Framework for Zero-shot Dialogue Generation
Yongkang Liu
Shi Feng
Daling Wang
Yifei Zhang
27
8
0
18 Aug 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
Longxuan Ma
Ziyu Zhuang
Weinan Zhang
Mingda Li
Ting Liu
29
4
0
17 Aug 2022
Grounding in social media: An approach to building a chit-chat dialogue model
Ritvik Choudhary
Daisuke Kawahara
21
4
0
12 Jun 2022
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
61
14
0
11 Jun 2022