RankME: Reliable Human Ratings for Natural Language Generation
15 March 2018 · arXiv:1803.05928
Jekaterina Novikova, Ondřej Dušek, Verena Rieser [ALM]

Papers citing "RankME: Reliable Human Ratings for Natural Language Generation" (34 papers shown)
The Viability of Crowdsourcing for RAG Evaluation (22 Apr 2025)
Lukas Gienapp, Tim Hagen, Maik Fröbe, Matthias Hagen, Benno Stein, Martin Potthast, Harrisen Scells

TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation (04 Oct 2024)
Jonathan Cook, Tim Rocktäschel, Jakob Foerster, Dennis Aumiller, Alex Wang [ALM]

DHP Benchmark: Are LLMs Good NLG Evaluators? (25 Aug 2024)
Yicheng Wang, Jiayi Yuan, Yu-Neng Chuang, Zhuoer Wang, Yingchi Liu, Mark Cusick, Param Kulkarni, Zhengping Ji, Yasser Ibrahim, Xia Hu [LM&MA, ELM]

AI-Assisted Human Evaluation of Machine Translation (18 Jun 2024)
Vilém Zouhar, Tom Kocmi, Mrinmaya Sachan

Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems (15 Apr 2024)
Clemencia Siro, Mohammad Aliannejadi, Maarten de Rijke

How Much Annotation is Needed to Compare Summarization Models? (28 Feb 2024)
Chantal Shaib, Joe Barrow, Alexa F. Siu, Byron C. Wallace, A. Nenkova

Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks (17 May 2023)
Anas Himmi, Ekhine Irurozki, Nathan Noiry, Stéphan Clémençon, Pierre Colombo

LENS: A Learnable Evaluation Metric for Text Simplification (19 Dec 2022)
Mounica Maddela, Yao Dou, David Heineman, Wei-ping Xu

NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries? (08 Nov 2022)
Saadia Gabriel, Hamid Palangi, Yejin Choi [AAML]

On the Effectiveness of Automated Metrics for Text Generation Systems (24 Oct 2022)
Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak

Risk-graded Safety for Handling Medical Queries in Conversational AI (02 Oct 2022)
Gavin Abercrombie, Verena Rieser [AI4MH]

The Glass Ceiling of Automatic Evaluation in Natural Language Generation (31 Aug 2022)
Pierre Colombo, Maxime Peyrard, Nathan Noiry, Robert West, Pablo Piantanida

Innovations in Neural Data-to-text Generation: A Survey (25 Jul 2022)
Mandar Sharma, Ajay K. Gogineni, Naren Ramakrishnan

The Authenticity Gap in Human Evaluation (24 May 2022)
Kawin Ethayarajh, Dan Jurafsky

Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges (18 Mar 2022)
Shikib Mehri, Jinho Choi, L. F. D'Haro, Jan Deriu, M. Eskénazi, ..., David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang

Achieving Reliable Human Assessment of Open-Domain Dialogue Systems (11 Mar 2022)
Tianbo Ji, Yvette Graham, Gareth J. F. Jones, Chenyang Lyu, Qun Liu [ALM]

Czech Grammar Error Correction with a Large and Diverse Corpus (14 Jan 2022)
Jakub Náplava, Milan Straka, Jana Straková, Alexandr Rosen

A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models (14 Jan 2022)
Hanqing Zhang, Haolin Song, Shaoyu Li, Ming Zhou, Dawei Song

Dynamic Human Evaluation for Relative Model Comparisons (15 Dec 2021)
Thórhildur Thorleiksdóttir, Cédric Renggli, Nora Hollenstein, Ce Zhang

Better than Average: Paired Evaluation of NLP Systems (20 Oct 2021)
Maxime Peyrard, Wei-Ye Zhao, Steffen Eger, Robert West [ELM]

AutoChart: A Dataset for Chart-to-Text Generation Task (16 Aug 2021)
Jiawen Zhu, Jinye Ran, Roy Ka-Wei Lee, Kenny Choo, Zhi Li

Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling (07 Jul 2021)
Emily Dinan, Gavin Abercrombie, A. S. Bergman, Shannon L. Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser

Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text (02 Jul 2021)
Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A. Smith, Yejin Choi [DeLMO]

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text (30 Jun 2021)
Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith [DeLMO]

A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems (08 Nov 2020)
Craig Thomson, Ehud Reiter

An Evaluation Protocol for Generative Conversational Systems (24 Oct 2020)
Seolhwa Lee, Heuiseok Lim, João Sedoc [ELM]

Local Knowledge Powered Conversational Agents (20 Oct 2020)
Sashank Santhanam, Ming-Yu Liu, Raul Puri, M. Shoeybi, M. Patwary, Bryan Catanzaro

Evaluation of Text Generation: A Survey (26 Jun 2020)
Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao [ELM, LM&MA]

Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions (22 Jun 2020)
Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, ..., Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson [LLMAG, AI4CE]

A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents (10 Sep 2019)
Amanda Cercas Curry, Verena Rieser

ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons (06 Sep 2019)
Margaret Li, Jason Weston, Stephen Roller

Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge (23 Jan 2019)
Ondřej Dušek, Jekaterina Novikova, Verena Rieser [ELM]

Findings of the E2E NLG Challenge (02 Oct 2018)
Ondřej Dušek, Jekaterina Novikova, Verena Rieser

Efficient Online Scalar Annotation with Bounded Support (04 Jun 2018)
Keisuke Sakaguchi, Benjamin Van Durme