ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2105.08920
  4. Cited By
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics

OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics

19 May 2021
Jian Guan
Zhexin Zhang
Zhuoer Feng
Zitao Liu
Wenbiao Ding
Xiaoxi Mao
Changjie Fan
Minlie Huang
ArXivPDFHTML

Papers citing "OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics"

20 / 20 papers shown
Title
Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
Grgur Kovač
Jérémy Perez
Rémy Portelas
Peter Ford Dominey
Pierre-Yves Oudeyer
35
0
0
04 Apr 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
60
3
0
21 Feb 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
4-LEGS: 4D Language Embedded Gaussian Splatting
4-LEGS: 4D Language Embedded Gaussian Splatting
Gal Fiebelman
Tamir Cohen
Ayellet Morgenstern
Peter Hedman
Hadar Averbuch-Elor
3DGS
46
3
0
14 Oct 2024
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang
Yufei Wang
Tiezheng YU
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
31
4
0
07 Oct 2024
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Masanari Ohi
Masahiro Kaneko
Ryuto Koike
Mengsay Loem
Naoaki Okazaki
40
4
0
25 Feb 2024
Zebra: Extending Context Window with Layerwise Grouped Local-Global
  Attention
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Kaiqiang Song
Xiaoyang Wang
Sangwoo Cho
Xiaoman Pan
Dong Yu
34
7
0
14 Dec 2023
Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts
Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts
Christina Chance
Da Yin
Dakuo Wang
Kai-Wei Chang
34
0
0
16 Oct 2023
Learning Personalized Alignment for Evaluating Open-ended Text
  Generation
Learning Personalized Alignment for Evaluating Open-ended Text Generation
Danqing Wang
Kevin Kaichuang Yang
Hanlin Zhu
Xiaomeng Yang
Andrew Cohen
Lei Li
Yuandong Tian
ALM
LM&MA
20
8
0
05 Oct 2023
Diversifying Question Generation over Knowledge Base via External Natural Questions
Diversifying Question Generation over Knowledge Base via External Natural Questions
Shasha Guo
Jing Zhang
Xirui Ke
Cuiping Li
Hong Chen
42
3
0
23 Sep 2023
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
62
446
0
07 Mar 2023
Open-world Story Generation with Structured Knowledge Enhancement: A
  Comprehensive Survey
Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey
Yuxin Wang
Jieru Lin
Zhiwei Yu
Wei Hu
Börje F. Karlsson
20
17
0
09 Dec 2022
StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning
StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning
Hong Chen
D. Vo
Hiroya Takamura
Yusuke Miyao
Hideki Nakayama
27
20
0
16 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
127
94
0
06 Oct 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation
  of Story Generation
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Cyril Chhun
Pierre Colombo
Chloé Clavel
Fabian M. Suchanek
53
50
0
24 Aug 2022
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating
  Controlled Text Generation
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation
Pei Ke
Hao Zhou
Yankai Lin
Peng Li
Jie Zhou
Xiaoyan Zhu
Minlie Huang
21
37
0
02 Apr 2022
Rethinking and Refining the Distinct Metric
Rethinking and Refining the Distinct Metric
Siyang Liu
Sahand Sabour
Yinhe Zheng
Pei Ke
Xiaoyan Zhu
Minlie Huang
36
11
0
28 Feb 2022
A Temporal Variational Model for Story Generation
A Temporal Variational Model for Story Generation
David Wilmot
Frank Keller
DRL
35
8
0
14 Sep 2021
LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text
  Understanding and Generation
LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation
Jian Guan
Zhuoer Feng
Yamei Chen
Ru He
Xiaoxi Mao
Changjie Fan
Minlie Huang
39
32
0
30 Aug 2021
Adversarial Evaluation of Dialogue Models
Adversarial Evaluation of Dialogue Models
Anjuli Kannan
Oriol Vinyals
AAML
ALM
141
76
0
27 Jan 2017
1