2107.00061
Cited By
v1
v2 (latest)
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
30 June 2021
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
Papers citing
"All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text"
50 / 224 papers shown
Title
The Career Interests of Large Language Models
Meng Hua
Yuan Cheng
Hengshu Zhu
116
0
0
11 Jul 2024
Paraphrase Types Elicit Prompt Engineering Capabilities
Jan Philip Wahle
Terry Ruas
Yang Xu
Bela Gipp
143
10
0
28 Jun 2024
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
Yalan Qin
Chongye Guo
Borong Zhang
Boyuan Chen
Josef Dai
...
Kaile Wang
Boxuan Li
Sirui Han
Yike Guo
Yaodong Yang
95
51
0
20 Jun 2024
Evaluation and Continual Improvement for an Enterprise AI Assistant
Akash Maharaj
Kun Qian
Uttaran Bhattacharya
Sally Fang
Horia Galatanu
...
Rachel Hanessian
Nishant Kapoor
Ken Russell
Shivakumar Vaithyanathan
Yunyao Li
48
4
0
15 Jun 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
Tianle Zhang
Langtian Ma
Yuchen Yan
Yuchen Zhang
Kai Wang
...
Wenqi Shao
Yang You
Yu Qiao
Ping Luo
Kaipeng Zhang
VGen
145
2
0
13 Jun 2024
The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches
Bhashithe Abeysinghe
Ruhan Circi
ELM
108
23
0
05 Jun 2024
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
Aparna Elangovan
Ling Liu
Lei Xu
S. Bodapati
Dan Roth
ELM
103
10
0
28 May 2024
Transformer and Hybrid Deep Learning Based Models for Machine-Generated Text Detection
Teodor-George Marchitan
Claudiu Creanga
Liviu P. Dinu
DeLMO
52
3
0
28 May 2024
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Jonas Becker
Jan Philip Wahle
Bela Gipp
Terry Ruas
120
11
0
24 May 2024
Your Large Language Models Are Leaving Fingerprints
Hope McGovern
Rickard Stureborg
Yoshi Suhara
Dimitris Alikaniotis
DeLMO
93
14
0
22 May 2024
Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation
Cyril Chhun
Fabian M. Suchanek
Chloé Clavel
LRM
114
18
0
22 May 2024
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs
Akhila Yerukola
Saujas Vaduguru
Daniel Fried
Maarten Sap
56
1
0
14 May 2024
RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors
Liam Dugan
Alyssa Hwang
Filip Trhlik
Josh Magnus Ludan
Andrew Zhu
Hainiu Xu
Daphne Ippolito
Christopher Callison-Burch
DeLMO
AAML
120
52
0
13 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
163
8
0
09 May 2024
Explainability for Transparent Conversational Information-Seeking
Weronika Lajewska
Damiano Spina
Johanne Trippas
K. Balog
68
7
0
06 May 2024
Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts
Tolga Buz
Benjamin Frost
Nikola Genchev
Moritz Schneider
Lucie-Aimée Kaffee
Gerard de Melo
DeLMO
91
9
0
02 May 2024
Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability
Meng Li
Haoran Jin
Ruixuan Huang
Zhihao Xu
Defu Lian
Zijia Lin
Di Zhang
Xiting Wang
LRM
26
4
0
29 Apr 2024
Towards Intent-based User Interfaces: Charting the Design Space of Intent-AI Interactions Across Task Types
Zijian Ding
113
6
0
28 Apr 2024
Text Quality-Based Pruning for Efficient Training of Language Models
Vasu Sharma
Karthik Padthe
Newsha Ardalani
Kushal Tirumala
Russell Howes
...
Po-Yao Huang
Shang-Wen Li
Armen Aghajanyan
Gargi Ghosh
Luke Zettlemoyer
120
6
0
26 Apr 2024
ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations
Tyler Loakman
Chenghua Lin
46
0
0
26 Apr 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
EGVM
253
22
0
25 Apr 2024
Snake Story: Exploring Game Mechanics for Mixed-Initiative Co-creative Storytelling Games
Daijin Yang
Erica Kleinman
G. M. Troiano
Elina Tochilnikova
Casper Harteveld
65
2
0
11 Apr 2024
Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
Mahjabin Nahar
Haeseung Seo
Eun-Ju Lee
Aiping Xiong
Dongwon Lee
HILM
93
12
0
04 Apr 2024
METAL: Towards Multilingual Meta-Evaluation
Rishav Hada
Varun Gumma
Mohamed Ahmed
Kalika Bali
Sunayana Sitaram
ELM
80
3
0
02 Apr 2024
MUGC: Machine Generated versus User Generated Content Detection
Yaqi Xie
Anjali Rawal
Yujing Cen
Dixuan Zhao
S. K. Narang
Shanu Sushmita
DeLMO
111
3
0
28 Mar 2024
EAGLE: A Domain Generalization Framework for AI-generated Text Detection
Amrita Bhattacharjee
Raha Moraffah
Joshua Garland
Huan Liu
DeLMO
79
7
0
23 Mar 2024
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
William James Bolton
Rafael Poyiadzi
Edward R. Morrell
Gabriela van Bergen Gonzalez Bueno
Lea Goetz
84
2
0
21 Mar 2024
A Design Space for Intelligent and Interactive Writing Assistants
Mina Lee
Katy Ilonka Gero
John Joon Young Chung
S. Buckingham Shum
Vipul Raheja
...
Joonsuk Park
Roy Pea
Eugenia H Rho
Shannon Zejiang Shen
Pao Siangliulue
101
99
0
21 Mar 2024
Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases
Tyler Loakman
Chen Tang
Chenghua Lin
98
4
0
20 Mar 2024
Emergence of Social Norms in Generative Agent Societies: Principles and Architecture
Siyue Ren
Zhiyao Cui
Ruiqi Song
Zhen Wang
Shuyue Hu
LLMAG
86
10
0
13 Mar 2024
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Weixin Liang
Zachary Izzo
Yaohui Zhang
Haley Lepp
Hancheng Cao
...
Haotian Ye
Sheng Liu
Zhi Huang
Daniel A. McFarland
James Y. Zou
DeLMO
161
103
0
11 Mar 2024
A Survey on Human-AI Teaming with Large Pre-Trained Models
Vanshika Vats
Marzia Binta Nizam
Minghao Liu
Ziyuan Wang
Richard Ho
...
Celeste Shen
Rachel Shen
Nafisa Hussain
Kesav Ravichandran
James Davis
LM&MA
124
9
0
07 Mar 2024
Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights
Zijie Zeng
Shiqi Liu
Lele Sha
Zhuang Li
Kaixun Yang
Sannyuya Liu
Dragan Gašević
Guanliang Chen
DeLMO
103
7
0
06 Mar 2024
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs
Tanmay Rajore
Nishanth Chandran
Sunayana Sitaram
Divya Gupta
Rahul Sharma
Kashish Mittal
Manohar Swaminathan
113
16
0
01 Mar 2024
Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate
Jimin Mun
Cathy Buerger
Jenny T Liang
Joshua Garland
Maarten Sap
77
12
0
29 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
286
22
0
28 Feb 2024
Humans or LLMs as the Judge? A Study on Judgement Biases
Guiming Hardy Chen
Shunian Chen
Ziche Liu
Feng Jiang
Benyou Wang
208
113
0
16 Feb 2024
Can AI and humans genuinely communicate?
Constant Bonard
70
1
0
14 Feb 2024
Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale
Freddy Heppell
M. Bakir
Kalina Bontcheva
DeLMO
73
1
0
13 Feb 2024
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
Marcio Fonseca
Shay B. Cohen
85
12
0
18 Jan 2024
Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
Kaitlyn Zhou
Jena D. Hwang
Xiang Ren
Maarten Sap
90
68
0
12 Jan 2024
Advancing GUI for Generative AI: Charting the Design Space of Human-AI Interactions through Task Creativity and Complexity
Zijian Ding
LLMAG
83
5
0
04 Jan 2024
Large Language Models for Conducting Advanced Text Analytics Information Systems Research
Benjamin Ampel
Chi-Heng Yang
Junjie Hu
Hsinchun Chen
116
8
0
27 Dec 2023
New Evaluation Metrics Capture Quality Degradation due to LLM Watermarking
Karanpartap Singh
James Zou
WaLM
166
9
0
04 Dec 2023
I Know You Did Not Write That! A Sampling Based Watermarking Method for Identifying Machine Generated Text
Kaan Efe Keles
Ömer Kaan Gürbüz
Mucahid Kutlu
WaLM
41
2
0
29 Nov 2023
Reducing Privacy Risks in Online Self-Disclosures with Language Models
Yao Dou
Isadora Krsek
Tarek Naous
Anubha Kabra
Sauvik Das
Alan Ritter
Wei Xu
103
30
0
16 Nov 2023
AI-generated text boundary detection with RoFT
Laida Kushnareva
T. Gaintseva
German Magai
S. Barannikov
Dmitry Abulkhanov
Kristian Kuznetsov
Eduard Tulchinskii
Irina Piontkovskaya
Sergey I. Nikolenko
DeLMO
65
7
0
14 Nov 2023
Evaluation of GPT-4 for chest X-ray impression generation: A reader study on performance and perception
Sebastian Ziegelmayer
Alexander W. Marka
Nicolas Lenhart
Nadja Nehls
S. Reischl
Felix Harder
Andreas Sauter
Marcus R. Makowski
Markus Graf
J. Gawlitza
MedIm
LM&MA
43
14
0
12 Nov 2023
The Iron(ic) Melting Pot: Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation
Tyler Loakman
Aaron Maladry
Chenghua Lin
56
10
0
09 Nov 2023
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra
Eve Fleisig
Kyunghyun Cho
Adam Lopez
LRM
56
8
0
08 Nov 2023