1904.02792
Unifying Human and Statistical Evaluation for Natural Language Generation
4 April 2019
Tatsunori B. Hashimoto
Hugh Zhang
Percy Liang
Papers citing "Unifying Human and Statistical Evaluation for Natural Language Generation"
50 / 69 papers shown
From Superficial to Deep: Integrating External Knowledge for Follow-up Question Generation Using Knowledge Graph and LLM
Jianyu Liu
Yi Huang
Sheng Bi
Junlan Feng
Guilin Qi
52
2
0
08 Apr 2025
Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
Guangsheng Bao
Yanbin Zhao
Juncai He
Yue Zhang
VLM
98
2
0
20 Feb 2025
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson
80
6
0
28 Jan 2025
Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting
Can Chen
Jun-Kun Wang
DeLMO
42
0
0
29 Oct 2024
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Sijin Chen
Omar Hagrass
Jason M. Klusowski
32
3
0
04 Oct 2024
Agents' Room: Narrative Generation through Multi-step Collaboration
Fantine Huot
Reinald Kim Amplayo
Jennimaria Palomaki
Alice Shoshana Jakobovits
Elizabeth Clark
Mirella Lapata
47
7
0
03 Oct 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
29
66
0
30 May 2024
On the Representation of a Multi-Level Prototype Concept in Multilingual Automatic Language Generation: From the Corpus via Word Embeddings to the Automatic Dictionary
M. J. Domínguez Vázquez
21
3
0
26 Dec 2023
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati
Minhwa Lee
Dheeraj Rajagopal
Dongyeop Kang
40
10
0
16 Nov 2023
MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies
Shiyue Zhang
Shijie Wu
Ozan Irsoy
Steven Lu
Joey Tianyi Zhou
Mark Dredze
David S. Rosenberg
25
9
0
26 May 2023
HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis
Christoforos Vasilatos
Manaar Alam
Talal Rahwan
Yasir Zaki
Michail Maniatakos
DeLMO
40
32
0
26 May 2023
DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection
Xiao Yu
Yuang Qi
Kejiang Chen
Guoqiang Chen
Xi Yang
Pengyuan Zhu
Xiuwei Shang
Weiming Zhang
Neng H. Yu
DeLMO
23
11
0
21 May 2023
ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems
Sarik Ghazarian
Yijia Shao
Rujun Han
Aram Galstyan
Nanyun Peng
27
7
0
12 May 2023
Tailoring Language Generation Models under Total Variation Distance
Haozhe Ji
Pei Ke
Zhipeng Hu
Rongsheng Zhang
Minlie Huang
28
18
0
26 Feb 2023
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
47
22
0
30 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
63
100
0
19 Dec 2022
LENS: A Learnable Evaluation Metric for Text Simplification
Mounica Maddela
Yao Dou
David Heineman
Wei Xu
29
63
0
19 Dec 2022
Implicit causality in GPT-2: a case study
H. Huynh
T. Lentz
Emiel van Miltenburg
LRM
27
3
0
08 Dec 2022
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
Mirac Suzgun
Luke Melas-Kyriazi
Dan Jurafsky
35
43
0
14 Nov 2022
Truncation Sampling as Language Model Desmoothing
John Hewitt
Christopher D. Manning
Percy Liang
BDL
44
76
0
27 Oct 2022
On the Effectiveness of Automated Metrics for Text Generation Systems
Pius von Däniken
Jan Deriu
Don Tuggener
Mark Cieliebak
33
3
0
24 Oct 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception
Keenan I. Jones
Enes Altuncu
V. N. Franqueira
Yi-Chia Wang
Shujun Li
DeLMO
42
3
0
11 Aug 2022
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
36
10
0
25 Jul 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
Huan Yee Koh
Jiaxin Ju
Ming Liu
Shirui Pan
83
122
0
03 Jul 2022
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
64
14
0
11 Jun 2022
Computational Storytelling and Emotions: A Survey
Yusuke Mori
Hiroaki Yamane
Yusuke Mukuta
Tatsuya Harada
48
2
0
23 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
43
5
0
13 May 2022
Vector Representations of Idioms in Conversational Systems
Tosin Adewumi
F. Liwicki
Marcus Liwicki
50
8
0
07 May 2022
Toward More Effective Human Evaluation for Machine Translation
Belén Saldías
George F. Foster
Markus Freitag
Qijun Tan
25
10
0
11 Apr 2022
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation
Pei Ke
Hao Zhou
Yankai Lin
Peng Li
Jie Zhou
Xiaoyan Zhu
Minlie Huang
21
38
0
02 Apr 2022
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation
Shashi Narayan
Gonçalo Simões
Yao-Min Zhao
Joshua Maynez
Dipanjan Das
Michael Collins
Mirella Lapata
29
30
0
28 Mar 2022
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
Hanqing Zhang
Haolin Song
Shaoyu Li
Ming Zhou
Dawei Song
57
215
0
14 Jan 2022
Dynamic Human Evaluation for Relative Model Comparisons
Thórhildur Thorleiksdóttir
Cédric Renggli
Nora Hollenstein
Ce Zhang
44
2
0
15 Dec 2021
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
48
2
0
11 Dec 2021
How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN
R. Thomas McCoy
P. Smolensky
Tal Linzen
Jianfeng Gao
Asli Celikyilmaz
SyDa
25
119
0
18 Nov 2021
Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees
Yaman Kumar Singla
Sriram Krishna
R. Shah
Changyou Chen
18
6
0
17 Nov 2021
Visual Intelligence through Human Interaction
Ranjay Krishna
Mitchell L. Gordon
Fei-Fei Li
Michael S. Bernstein
29
8
0
12 Nov 2021
HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models
Tanya Goyal
Nazneen Rajani
Wenhao Liu
Wojciech Kryściński
AI4CE
20
12
0
08 Oct 2021
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Mingkai Deng
Bowen Tan
Zhengzhong Liu
Eric Xing
Zhiting Hu
16
73
0
14 Sep 2021
Language Model Evaluation in Open-ended Text Generation
An Nguyen
44
3
0
08 Aug 2021
How to Evaluate Your Dialogue Models: A Review of Approaches
Xinmeng Li
Wansen Wu
Long Qin
Quanjun Yin
ELM
30
8
0
03 Aug 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou
Maxwell Forbes
Rik Koncel-Kedziorski
Noah A. Smith
Yejin Choi
DeLMO
17
128
0
02 Jul 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
54
398
0
30 Jun 2021
What Context Features Can Transformer Language Models Use?
J. O'Connor
Jacob Andreas
KELM
29
75
0
15 Jun 2021
Focus Attention: Promoting Faithfulness and Diversity in Summarization
Rahul Aralikatte
Shashi Narayan
Joshua Maynez
S. Rothe
Ryan T. McDonald
37
45
0
25 May 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
260
285
0
02 Feb 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
45
343
0
02 Feb 2021
GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation
Daniel Khashabi
Gabriel Stanovsky
Jonathan Bragg
Nicholas Lourie
Jungo Kasai
Yejin Choi
Noah A. Smith
Daniel S. Weld
26
20
0
17 Jan 2021
GLUCOSE: GeneraLized and COntextualized Story Explanations
N. Mostafazadeh
Aditya Kalyanpur
Lori Moon
David W. Buchanan
Lauren Berkowitz
Or Biran
Jennifer Chu-Carroll
32
121
0
16 Sep 2020
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
Jian Guan
Minlie Huang
29
69
0
16 Sep 2020