ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Measuring Attribution in Natural Language Generation Models

23 December 2021
Hannah Rashkin
Vitaly Nikolaev
Matthew Lamm
Lora Aroyo
Michael Collins
Dipanjan Das
Slav Petrov
Gaurav Singh Tomar
Iulia Turc
David Reitter
arXiv:2112.12870

Papers citing "Measuring Attribution in Natural Language Generation Models"

39 / 139 papers shown
PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions
Anthony Chen
Panupong Pasupat
Sameer Singh
Hongrae Lee
Kelvin Guu
32
40
0
24 May 2023
Allies: Prompting Large Language Model with Beam Search
Hao Sun
Xiao Liu
Yeyun Gong
Yan Zhang
Daxin Jiang
Linjun Yang
Nan Duan
RALM
28
5
0
24 May 2023
Enabling Large Language Models to Generate Text with Citations
Tianyu Gao
Howard Yen
Jiatong Yu
Danqi Chen
LM&MA
HILM
29
311
0
24 May 2023
Evaluating and Modeling Attribution for Cross-Lingual Question Answering
Benjamin Muller
John Wieting
J. Clark
Tom Kwiatkowski
Sebastian Ruder
Livio Baldini Soares
Roee Aharoni
Jonathan Herzig
Xinyi Wang
HILM
19
16
0
23 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
56
606
0
23 May 2023
LM vs LM: Detecting Factual Errors via Cross Examination
Roi Cohen
May Hamri
Mor Geva
Amir Globerson
HILM
32
120
0
22 May 2023
"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data
Orion Weller
Marc Marone
Nathaniel Weir
Dawn J Lawrie
Daniel Khashabi
Benjamin Van Durme
HILM
75
44
0
22 May 2023
SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Elizabeth Clark
Shruti Rijhwani
Sebastian Gehrmann
Joshua Maynez
Roee Aharoni
Vitaly Nikolaev
Thibault Sellam
Aditya Siddhant
Dipanjan Das
Ankur P. Parikh
26
38
0
22 May 2023
Pointwise Mutual Information Based Metric and Decoding Strategy for Faithful Generation in Document Grounded Dialogs
Yatin Nandwani
Vineet Kumar
Dinesh Raghu
Sachindra Joshi
Luis A. Lastras
27
6
0
20 May 2023
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Junyi Li
Xiaoxue Cheng
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
HILM
VLM
20
232
0
19 May 2023
QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations
Chaitanya Malaviya
Peter Shaw
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
11
14
0
19 May 2023
Attributable and Scalable Opinion Summarization
Tom Hosking
Hao Tang
Mirella Lapata
30
8
0
19 May 2023
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Zorik Gekhman
Jonathan Herzig
Roee Aharoni
Chen Elkind
Idan Szpektor
HILM
ELM
29
71
0
18 May 2023
Evaluating Open-Domain Question Answering in the Era of Large Language Models
Ehsan Kamalloo
Nouha Dziri
C. Clarke
Davood Rafiei
ELM
14
99
0
11 May 2023
Automatic Evaluation of Attribution by Large Language Models
Xiang Yue
Boshi Wang
Ziru Chen
Kai Zhang
Yu-Chuan Su
Huan Sun
ALM
LRM
HILM
33
54
0
10 May 2023
Evaluating Verifiability in Generative Search Engines
Nelson F. Liu
Tianyi Zhang
Percy Liang
HILM
31
233
0
19 Apr 2023
Elastic Weight Removal for Faithful and Abstractive Dialogue Generation
Nico Daheim
Nouha Dziri
Mrinmaya Sachan
Iryna Gurevych
E. Ponti
MoMe
34
30
0
30 Mar 2023
WiCE: Real-World Entailment for Claims in Wikipedia
Ryo Kamoi
Tanya Goyal
Juan Diego Rodriguez
Greg Durrett
38
80
0
02 Mar 2023
Dr ChatGPT, tell me what I want to hear: How prompt knowledge impacts health answer correctness
Guido Zuccon
Bevan Koopman
KELM
AI4MH
MedIm
12
41
0
23 Feb 2023
Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models
Renat Aksitov
Chung-Ching Chang
David Reitter
Siamak Shakeri
Yun-hsuan Sung
RALM
19
16
0
11 Feb 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna
Erin Bransom
Bailey Kuehl
Mohit Iyyer
Pradeep Dasigi
Arman Cohan
Kyle Lo
19
89
0
30 Jan 2023
Diving Deep into Modes of Fact Hallucinations in Dialogue Systems
Souvik Das
Sougata Saha
R. Srihari
HILM
15
30
0
11 Jan 2023
mFACE: Multilingual Summarization with Factual Consistency Evaluation
Roee Aharoni
Shashi Narayan
Joshua Maynez
Jonathan Herzig
Elizabeth Clark
Mirella Lapata
HILM
27
43
0
20 Dec 2022
Statistical Dataset Evaluation: Reliability, Difficulty, and Validity
Chengwen Wang
Qingxiu Dong
Xiaochen Wang
Haitao Wang
Zhifang Sui
XAI
29
3
0
19 Dec 2022
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Bernd Bohnet
Vinh Q. Tran
Pat Verga
Roee Aharoni
D. Andor
...
Michael Collins
Dipanjan Das
Donald Metzler
Slav Petrov
Kellie Webster
43
59
0
15 Dec 2022
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
Ella Neeman
Roee Aharoni
Or Honovich
Leshem Choshen
Idan Szpektor
Omri Abend
KELM
CML
18
77
0
10 Nov 2022
TaTa: A Multilingual Table-to-Text Dataset for African Languages
Sebastian Gehrmann
Sebastian Ruder
Vitaly Nikolaev
Jan A. Botha
Michael Chavinda
Ankur P. Parikh
Clara E. Rivera
LMTD
24
10
0
31 Oct 2022
RARR: Researching and Revising What Language Models Say, Using Language Models
Luyu Gao
Zhuyun Dai
Panupong Pasupat
Anthony Chen
Arun Tejasvi Chaganty
...
Vincent Zhao
Ni Lao
Hongrae Lee
Da-Cheng Juan
Kelvin Guu
HILM
KELM
41
256
0
17 Oct 2022
FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation
Sebastian Hofstätter
Jiecao Chen
K. Raman
Hamed Zamani
RALM
57
77
0
28 Sep 2022
FaithDial: A Faithful Benchmark for Information-Seeking Dialogue
Nouha Dziri
Ehsan Kamalloo
Sivan Milton
Osmar Zaiane
Mo Yu
E. Ponti
Siva Reddy
HILM
26
87
0
22 Apr 2022
On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?
Nouha Dziri
Sivan Milton
Mo Yu
Osmar Zaiane
Siva Reddy
HILM
13
188
0
17 Apr 2022
TRUE: Re-evaluating Factual Consistency Evaluation
Or Honovich
Roee Aharoni
Jonathan Herzig
Hagai Taitelbaum
Doron Kukliansy
Vered Cohen
Thomas Scialom
Idan Szpektor
Avinatan Hassidim
Yossi Matias
HILM
29
3
0
11 Apr 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM
AI4CE
63
183
0
14 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
367
8,495
0
28 Jan 2022
LaMDA: Language Models for Dialog Applications
R. Thoppilan
Daniel De Freitas
Jamie Hall
Noam M. Shazeer
Apoorv Kulshreshtha
...
Blaise Aguera-Arcas
Claire Cui
M. Croak
Ed H. Chi
Quoc Le
ALM
13
1,557
0
20 Jan 2022
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin
David Reitter
Gaurav Singh Tomar
Dipanjan Das
167
101
0
14 Jul 2021
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri
Hannah Rashkin
Tal Linzen
David Reitter
ALM
192
79
0
30 Apr 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
254
285
0
02 Feb 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
282
2,015
0
28 Jul 2020