ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

19 May 2023
Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank
UQLM
ArXiv (abs) · PDF · HTML · GitHub (5★)

Papers citing "What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability"

27 papers
Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics
Silvia Casola, Yang Liu, Siyao Peng, Oliver Kraus, Albert Gatt, Barbara Plank
17 Jun 2025
Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation
Beiduo Chen, Yang Liu, Anna Korhonen, Barbara Plank
LRM
29 May 2025
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza
29 May 2025
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
Xiaoou Liu, Tiejin Chen, Longchao Da, Chacha Chen, Zhen Lin, Hua Wei
HILM
20 Mar 2025
A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI
Beiduo Chen, Siyao Peng, Anna Korhonen, Barbara Plank
18 Dec 2024
Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs
Yanzhu Guo, Simone Conia, Zelin Zhou, Min Li, Saloni Potdar, Henry Xiao
21 Oct 2024
Large-scale cloze evaluation reveals that token prediction tasks are neither lexically nor semantically aligned
Cassandra L. Jacobs, Loïc Grobol, Alvin Tsang
15 Oct 2024
"I Never Said That": A dataset, taxonomy and baselines on response clarity classification
Konstantinos Thomas, Giorgos Filandrianos, Maria Lymperaiou, Chrysoula Zerva, Giorgos Stamou
20 Sep 2024
STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond
Nils Dycke, Matej Zečević, Ilia Kuznetsov, Beatrix Suess, Kristian Kersting, Iryna Gurevych
LRM
09 Sep 2024
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle
05 Jul 2024
Compare without Despair: Reliable Preference Evaluation with Generation Separability
Sayan Ghosh, Tejas Srinivasan, Swabha Swayamdipta
02 Jul 2024
The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories
Xi Yu Huang, Krishnapriya Vishnubhotla, Frank Rudzicz
24 Jun 2024
Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?
G. Yona, Roee Aharoni, Mor Geva
HILM
27 May 2024
Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods
Polina Tsvilodub, Hening Wang, Sharon Grosch, Michael Franke
01 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
28 Feb 2024
Predict the Next Word: Humans exhibit uncertainty in this task and language models _____
Evgenia Ilia, Wilker Aziz
27 Feb 2024
LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models
Ivar Frisch, Mario Giulianelli
05 Feb 2024
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
Marcio Fonseca, Shay B. Cohen
18 Jan 2024
Benchmarking Large Language Model Volatility
Boyang Yu
26 Nov 2023
Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue
Aron Molnar, Jaap Jumelet, Mario Giulianelli, Arabella J. Sinclair
21 Nov 2023
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text
Yanzhu Guo, Guokan Shang, Michalis Vazirgiannis, Chloé Clavel
16 Nov 2023
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang
16 Nov 2023
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez
LRM
08 Nov 2023
Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution
Jaap Jumelet, Willem H. Zuidema
23 Oct 2023
Information Value: Measuring Utterance Predictability as Distance from Plausible Alternatives
Mario Giulianelli, Sarenne Wallbridge, Raquel Fernández
20 Oct 2023
Uncertainty in Natural Language Generation: From Theory to Applications
Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
UQLM
28 Jul 2023
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Zhen Lin, Shubhendu Trivedi, Jimeng Sun
HILM
30 May 2023