2212.07981
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
15 December 2022
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
Ruilin Han
Simeng Han
Shafiq R. Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
Papers citing "Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation"
50 / 118 papers shown
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
42
1
0
09 May 2025
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents
Takyoung Kim
Janvijay Singh
Shuhaib Mehri
Emre Can Acikgoz
Sagnik Mukherjee
Nimet Beyza Bozdag
Sumuk Shashidhar
Gökhan Tür
Dilek Hakkani-Tür
LLMAG
27
0
0
02 May 2025
Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Xiuying Chen
Tairan Wang
Juexiao Zhou
Zirui Song
Xin Gao
X. Zhang
MedIm
42
1
0
24 Apr 2025
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization
Adithya Pratapa
Teruko Mitamura
RALM
34
0
0
17 Apr 2025
LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA
Xanh Ho
Jiahao Huang
Florian Boudin
Akiko Aizawa
ELM
36
0
0
16 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
44
0
0
10 Apr 2025
PreSumm: Predicting Summarization Performance Without Summarizing
Steven Koniaev
Ori Ernst
Jackie Chi Kit Cheung
31
0
0
07 Apr 2025
Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck
Adrian Bulat
Yassine Ouali
Georgios Tzimiropoulos
152
0
0
27 Mar 2025
LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment
Varsha Embar
Ritvik Shrivastava
Vinay Damodaran
Travis Mehlinger
Yu-Chung Hsiao
Karthik Raghunathan
34
0
0
24 Mar 2025
SciClaims: An End-to-End Generative System for Biomedical Claim Analysis
Raúl Ortega
José Manuel Gómez-Pérez
63
0
0
24 Mar 2025
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
Juntai Cao
Xiang Zhang
Raymond Li
Chuyuan Li
Shafiq R. Joty
Giuseppe Carenini
59
1
0
27 Feb 2025
BRIDO: Bringing Democratic Order to Abstractive Summarization
Junhyun Lee
Harshith Goka
Hyeonmok Ko
HILM
49
0
0
25 Feb 2025
Evaluating the Effectiveness of Large Language Models in Automated News Article Summarization
Lionel Richy Panlap Houamegni
Fatih Gedikli
39
0
0
24 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
57
3
0
21 Feb 2025
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
Adithya Pratapa
Teruko Mitamura
94
1
0
10 Feb 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
49
5
0
28 Jan 2025
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Mohit Bansal
81
0
0
10 Dec 2024
Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown
Lifu Tu
Rui Meng
Shafiq R. Joty
Yingbo Zhou
Semih Yavuz
HILM
67
0
0
24 Nov 2024
SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers
Shruti Singh
Nandan Sarkar
Arman Cohan
29
0
0
08 Nov 2024
On Positional Bias of Faithfulness for Long-form Summarization
David Wan
Jesse Vig
Mohit Bansal
Shafiq R. Joty
HILM
50
3
0
31 Oct 2024
Optimizing the role of human evaluation in LLM-based spoken document summarization systems
Margaret Kroll
Kelsey Kraus
19
2
0
23 Oct 2024
DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph
Maitreya Prafulla Chitale
Uday Bindal
Rajakrishnan Rajkumar
Rahul Mishra
24
0
0
18 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeshkpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
67
1
0
17 Oct 2024
ReIFE: Re-evaluating Instruction-Following Evaluation
Yixin Liu
Kejian Shi
Alexander R. Fabbri
Yilun Zhao
Peifeng Wang
Chien-Sheng Wu
Shafiq Joty
Arman Cohan
22
6
0
09 Oct 2024
Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics
Théo Gigant
Camille Guinaudeau
Marc Decombas
Frédéric Dufaux
45
1
0
08 Oct 2024
Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization
Lei Xu
Mohammed Asad Karim
Saket Dingliwal
Aparna Elangovan
29
0
0
03 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
Xiang Dai
Sarvnaz Karimi
Biaoyan Fang
33
0
0
29 Sep 2024
NovAScore: A New Automated Metric for Evaluating Document Level Novelty
Lin Ai
Ziwei Gong
Harshsaiprasad Deshpande
Alexander Johnson
Emmy Phung
Ahmad Emami
Julia Hirschberg
18
1
0
14 Sep 2024
When Context Leads but Parametric Memory Follows in Large Language Models
Yufei Tao
Adam Hiatt
Erik Haake
Antonie J. Jetter
Ameeta Agrawal
KELM
38
0
0
13 Sep 2024
Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy
Priyanka Mandikal
RALM
VLM
45
0
0
21 Aug 2024
Localizing and Mitigating Errors in Long-form Question Answering
Rachneet Sachdeva
Yixiao Song
Mohit Iyyer
Iryna Gurevych
HILM
44
1
0
16 Jul 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
RALM
48
41
0
01 Jul 2024
Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
Anisha Gunjal
Greg Durrett
HILM
46
13
0
28 Jun 2024
Scalable and Domain-General Abstractive Proposition Segmentation
Mohammad Javad Hosseini
Yang Gao
Tim Baumgärtner
Alex Fabrikant
Reinald Kim Amplayo
33
0
0
28 Jun 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
Christoph Leiter
Steffen Eger
34
8
0
26 Jun 2024
PlagBench: Exploring the Duality of Large Language Models in Plagiarism Generation and Detection
Jooyoung Lee
Toshini Agrawal
Adaku Uchendu
Thai V. Le
Jinghui Chen
Dongwon Lee
31
1
0
24 Jun 2024
Verifiable Generation with Subsentence-Level Fine-Grained Citations
Shuyang Cao
Lu Wang
31
6
0
10 Jun 2024
Flexible and Adaptable Summarization via Expertise Separation
Xiuying Chen
Mingzhe Li
Shen Gao
Xin Cheng
Qingqing Zhu
Rui Yan
Xin Gao
Xiangliang Zhang
MoE
36
3
0
08 Jun 2024
StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond
Pengyuan Lyu
Yulin Li
Hao Zhou
Weihong Ma
Xingyu Wan
...
Liang Wu
Chengquan Zhang
Kun Yao
Errui Ding
Jingdong Wang
36
7
0
31 May 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Minghan Li
Xilun Chen
Ari Holtzman
Beidi Chen
Jimmy Lin
Wen-tau Yih
Xi Victoria Lin
RALM
BDL
108
10
0
29 May 2024
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
Aparna Elangovan
Ling Liu
Lei Xu
S. Bodapati
Dan Roth
ELM
22
9
0
28 May 2024
OLAPH: Improving Factuality in Biomedical Long-form Question Answering
Minbyul Jeong
Hyeon Hwang
Chanwoong Yoon
Taewhoo Lee
Jaewoo Kang
MedIm
HILM
LM&MA
38
12
0
21 May 2024
Large Language Models are Inconsistent and Biased Evaluators
Rickard Stureborg
Dimitris Alikaniotis
Yoshi Suhara
ALM
37
50
0
02 May 2024
FLAME: Factuality-Aware Alignment for Large Language Models
Sheng-Chieh Lin
Luyu Gao
Barlas Oğuz
Wenhan Xiong
Jimmy Lin
Wen-tau Yih
Xilun Chen
HILM
36
14
0
02 May 2024
FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document
Joonho Yang
Seunghyun Yoon
Byeongjeong Kim
Hwanhee Lee
HILM
26
3
0
17 Apr 2024
Auctions with LLM Summaries
Kumar Avinava Dubey
Zhe Feng
Rahul Kidambi
Aranyak Mehta
Di Wang
30
10
0
11 Apr 2024
On the Role of Summary Content Units in Text Summarization Evaluation
Marcel Nawrath
Agnieszka Nowak
Tristan Ratz
Danilo C. Walenta
Juri Opitz
...
Sebastian Gehrmann
Saad Mahamood
Miruna Clinciu
Khyathi Raghavi Chandu
Yufang Hou
ELM
21
5
0
02 Apr 2024
Towards a Robust Retrieval-Based Summarization System
Shengjie Liu
Jing Wu
Jingyuan Bao
Wenyi Wang
N. Hovakimyan
Christopher G. Healey
RALM
25
9
0
29 Mar 2024
CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists
Yukyung Lee
Joonghoon Kim
Jaehee Kim
Hyowon Cho
Pilsung Kang
Najoung Kim
ELM
47
4
0
27 Mar 2024
SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation
Dongqi Pu
Yifan Wang
Jia E. Loy
Vera Demberg
29
6
0
26 Mar 2024