ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

19 May 2023
Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank
UQLM
ArXiv (abs) · PDF · HTML · GitHub (5★)

Papers citing "What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability"

27 papers
Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics
Silvia Casola, Yang Liu, Siyao Peng, Oliver Kraus, Albert Gatt, Barbara Plank
17 Jun 2025
Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation
Beiduo Chen, Yang Liu, Anna Korhonen, Barbara Plank
LRM
29 May 2025
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza
29 May 2025
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
Xiaoou Liu, Tiejin Chen, Longchao Da, Chacha Chen, Zhen Lin, Hua Wei
HILM
20 Mar 2025
A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI
Beiduo Chen, Siyao Peng, Anna Korhonen, Barbara Plank
18 Dec 2024
Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs
Yanzhu Guo, Simone Conia, Zelin Zhou, Min Li, Saloni Potdar, Henry Xiao
21 Oct 2024
Large-scale cloze evaluation reveals that token prediction tasks are neither lexically nor semantically aligned
Cassandra L. Jacobs, Loïc Grobol, Alvin Tsang
15 Oct 2024
"I Never Said That": A dataset, taxonomy and baselines on response clarity classification
Konstantinos Thomas, Giorgos Filandrianos, Maria Lymperaiou, Chrysoula Zerva, Giorgos Stamou
20 Sep 2024
STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond
Nils Dycke, Matej Zečević, Ilia Kuznetsov, Beatrix Suess, Kristian Kersting, Iryna Gurevych
LRM
09 Sep 2024
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle
05 Jul 2024
Compare without Despair: Reliable Preference Evaluation with Generation Separability
Sayan Ghosh, Tejas Srinivasan, Swabha Swayamdipta
02 Jul 2024
The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories
Xi Yu Huang, Krishnapriya Vishnubhotla, Frank Rudzicz
24 Jun 2024
Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?
G. Yona, Roee Aharoni, Mor Geva
HILM
27 May 2024
Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods
Polina Tsvilodub, Hening Wang, Sharon Grosch, Michael Franke
01 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
28 Feb 2024
Predict the Next Word: Humans exhibit uncertainty in this task and language models _____
Evgenia Ilia, Wilker Aziz
27 Feb 2024
LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models
Ivar Frisch, Mario Giulianelli
05 Feb 2024
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
Marcio Fonseca, Shay B. Cohen
18 Jan 2024
Benchmarking Large Language Model Volatility
Boyang Yu
26 Nov 2023
Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue
Aron Molnar, Jaap Jumelet, Mario Giulianelli, Arabella J. Sinclair
21 Nov 2023
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text
Yanzhu Guo, Guokan Shang, Michalis Vazirgiannis, Chloé Clavel
16 Nov 2023
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang
16 Nov 2023
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez
LRM
08 Nov 2023
Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution
Jaap Jumelet, Willem H. Zuidema
23 Oct 2023
Information Value: Measuring Utterance Predictability as Distance from Plausible Alternatives
Mario Giulianelli, Sarenne Wallbridge, Raquel Fernández
20 Oct 2023
Uncertainty in Natural Language Generation: From Theory to Applications
Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
UQLM
28 Jul 2023
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Zhen Lin, Shubhendu Trivedi, Jimeng Sun
HILM
30 May 2023