Unifying Human and Statistical Evaluation for Natural Language Generation

4 April 2019
Tatsunori B. Hashimoto
Hugh Zhang
Percy Liang

Papers citing "Unifying Human and Statistical Evaluation for Natural Language Generation"

50 / 69 papers shown
From Superficial to Deep: Integrating External Knowledge for Follow-up Question Generation Using Knowledge Graph and LLM
Jianyu Liu
Yi Huang
Sheng Bi
Junlan Feng
Guilin Qi
52
2
0
08 Apr 2025
Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
Guangsheng Bao
Yanbin Zhao
Juncai He
Yue Zhang
VLM
100
2
0
20 Feb 2025
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson
80
6
0
28 Jan 2025
Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting
Can Chen
Jun-Kun Wang
DeLMO
44
0
0
29 Oct 2024
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Sijin Chen
Omar Hagrass
Jason M. Klusowski
34
3
0
04 Oct 2024
Agents' Room: Narrative Generation through Multi-step Collaboration
Fantine Huot
Reinald Kim Amplayo
Jennimaria Palomaki
Alice Shoshana Jakobovits
Elizabeth Clark
Mirella Lapata
47
7
0
03 Oct 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
29
66
0
30 May 2024
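Zur Darstellung eines mehrstufigen Prototypbegriffs in der multilingualen automatischen Sprachgenerierung: vom Korpus über word embeddings bis hin zum automatischen Wörterbuch [On the representation of a multi-level prototype concept in multilingual automatic language generation: from the corpus through word embeddings to the automatic dictionary]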
M. J. Domínguez Vázquez
24
3
0
26 Dec 2023
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati
Minhwa Lee
Dheeraj Rajagopal
Dongyeop Kang
42
10
0
16 Nov 2023
MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies
Shiyue Zhang
Shijie Wu
Ozan Irsoy
Steven Lu
Joey Tianyi Zhou
Mark Dredze
David S. Rosenberg
27
9
0
26 May 2023
HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis
Christoforos Vasilatos
Manaar Alam
Talal Rahwan
Yasir Zaki
Michail Maniatakos
DeLMO
40
32
0
26 May 2023
DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection
Xiao Yu
Yuang Qi
Kejiang Chen
Guoqiang Chen
Xi Yang
Pengyuan Zhu
Xiuwei Shang
Weiming Zhang
Neng H. Yu
DeLMO
23
11
0
21 May 2023
ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems
Sarik Ghazarian
Yijia Shao
Rujun Han
Aram Galstyan
Nanyun Peng
27
7
0
12 May 2023
Tailoring Language Generation Models under Total Variation Distance
Haozhe Ji
Pei Ke
Zhipeng Hu
Rongsheng Zhang
Minlie Huang
28
18
0
26 Feb 2023
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
47
22
0
30 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
63
100
0
19 Dec 2022
LENS: A Learnable Evaluation Metric for Text Simplification
Mounica Maddela
Yao Dou
David Heineman
Wei Xu
29
63
0
19 Dec 2022
Implicit causality in GPT-2: a case study
H. Huynh
T. Lentz
Emiel van Miltenburg
LRM
27
3
0
08 Dec 2022
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
Mirac Suzgun
Luke Melas-Kyriazi
Dan Jurafsky
35
43
0
14 Nov 2022
Truncation Sampling as Language Model Desmoothing
John Hewitt
Christopher D. Manning
Percy Liang
BDL
46
76
0
27 Oct 2022
On the Effectiveness of Automated Metrics for Text Generation Systems
Pius von Däniken
Jan Deriu
Don Tuggener
Mark Cieliebak
33
3
0
24 Oct 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception
Keenan I. Jones
Enes Altuncu
V. N. Franqueira
Yi-Chia Wang
Shujun Li
DeLMO
44
3
0
11 Aug 2022
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
39
10
0
25 Jul 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
Huan Yee Koh
Jiaxin Ju
Ming Liu
Shirui Pan
83
122
0
03 Jul 2022
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
67
14
0
11 Jun 2022
Computational Storytelling and Emotions: A Survey
Yusuke Mori
Hiroaki Yamane
Yusuke Mukuta
Tatsuya Harada
48
2
0
23 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
43
5
0
13 May 2022
Vector Representations of Idioms in Conversational Systems
Tosin Adewumi
F. Liwicki
Marcus Liwicki
52
8
0
07 May 2022
Toward More Effective Human Evaluation for Machine Translation
Belén Saldías
George F. Foster
Markus Freitag
Qijun Tan
25
10
0
11 Apr 2022
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation
Pei Ke
Hao Zhou
Yankai Lin
Peng Li
Jie Zhou
Xiaoyan Zhu
Minlie Huang
23
38
0
02 Apr 2022
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation
Shashi Narayan
Gonçalo Simões
Yao-Min Zhao
Joshua Maynez
Dipanjan Das
Michael Collins
Mirella Lapata
29
30
0
28 Mar 2022
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
Hanqing Zhang
Haolin Song
Shaoyu Li
Ming Zhou
Dawei Song
57
215
0
14 Jan 2022
Dynamic Human Evaluation for Relative Model Comparisons
Thórhildur Thorleiksdóttir
Cédric Renggli
Nora Hollenstein
Ce Zhang
44
2
0
15 Dec 2021
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
48
2
0
11 Dec 2021
How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN
R. Thomas McCoy
P. Smolensky
Tal Linzen
Jianfeng Gao
Asli Celikyilmaz
SyDa
25
119
0
18 Nov 2021
Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees
Yaman Kumar Singla
Sriram Krishna
R. Shah
Changyou Chen
18
6
0
17 Nov 2021
Visual Intelligence through Human Interaction
Ranjay Krishna
Mitchell L. Gordon
Fei-Fei Li
Michael S. Bernstein
29
8
0
12 Nov 2021
HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models
Tanya Goyal
Nazneen Rajani
Wenhao Liu
Wojciech Kryściński
AI4CE
20
12
0
08 Oct 2021
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Mingkai Deng
Bowen Tan
Zhengzhong Liu
Eric Xing
Zhiting Hu
16
73
0
14 Sep 2021
Language Model Evaluation in Open-ended Text Generation
An Nguyen
44
3
0
08 Aug 2021
How to Evaluate Your Dialogue Models: A Review of Approaches
Xinmeng Li
Wansen Wu
Long Qin
Quanjun Yin
ELM
30
8
0
03 Aug 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou
Maxwell Forbes
Rik Koncel-Kedziorski
Noah A. Smith
Yejin Choi
DeLMO
17
128
0
02 Jul 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
54
398
0
30 Jun 2021
What Context Features Can Transformer Language Models Use?
J. O'Connor
Jacob Andreas
KELM
29
75
0
15 Jun 2021
Focus Attention: Promoting Faithfulness and Diversity in Summarization
Rahul Aralikatte
Shashi Narayan
Joshua Maynez
S. Rothe
Ryan T. McDonald
40
45
0
25 May 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
260
285
0
02 Feb 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
45
343
0
02 Feb 2021
GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation
Daniel Khashabi
Gabriel Stanovsky
Jonathan Bragg
Nicholas Lourie
Jungo Kasai
Yejin Choi
Noah A. Smith
Daniel S. Weld
29
20
0
17 Jan 2021
GLUCOSE: GeneraLized and COntextualized Story Explanations
N. Mostafazadeh
Aditya Kalyanpur
Lori Moon
David W. Buchanan
Lauren Berkowitz
Or Biran
Jennifer Chu-Carroll
32
121
0
16 Sep 2020
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
Jian Guan
Minlie Huang
29
69
0
16 Sep 2020