2309.07852
Cited By
ExpertQA: Expert-Curated Questions and Attributed Answers
14 September 2023
Chaitanya Malaviya
Subin Lee
Sihao Chen
Elizabeth Sieber
Mark Yatskar
Dan Roth
ELM
HILM
Papers citing "ExpertQA: Expert-Curated Questions and Attributed Answers"
50 / 51 papers shown
Title
EnronQA: Towards Personalized RAG over Private Documents
Michael J. Ryan
Danmei Xu
Chris Nivera
Daniel Campos
SILM
67
0
0
01 May 2025
EvalAgent: Discovering Implicit Evaluation Criteria from the Web
Manya Wadhwa
Zayne Sprague
Chaitanya Malaviya
Philippe Laban
Junyi Jessy Li
Greg Durrett
34
0
0
21 Apr 2025
Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs
Yingjian Chen
Feiyang Li
Xingyu Song
Tianxiao Li
Zixin Xu
Xiujie Chen
Issey Sukeda
Irene Z Li
28
0
0
15 Apr 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
N. Sebe
Luc Van Gool
Xuming Hu
3DV
42
5
0
23 Mar 2025
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Takyoung Kim
Kyungjae Lee
Y. Jang
Ji Yong Cho
Gangwoo Kim
Minseok Cho
Moontae Lee
156
0
0
28 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models
Ben Malin
Tatiana Kalganova
Nikolaos Boulgouris
HILM
59
2
0
03 Jan 2025
Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations
Chaitanya Malaviya
Joseph Chee Chang
Dan Roth
Mohit Iyyer
Mark Yatskar
Kyle Lo
ELM
45
4
0
11 Nov 2024
FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation
Farima Fatahi Bayat
Lechen Zhang
Sheza Munir
Lu Wang
HILM
52
3
0
29 Oct 2024
Enhancing Answer Attribution for Faithful Text Generation with Large Language Models
Juraj Vladika
Luca Mülln
Florian Matthes
30
0
0
22 Oct 2024
Neurosymbolic AI approach to Attribution in Large Language Models
Deepa Tilwani
R. Venkataramanan
Amit P. Sheth
40
1
0
30 Sep 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Haoran Que
Feiyu Duan
Liqun He
Yutao Mou
Wangchunshu Zhou
...
Ge Zhang
Junran Peng
Zhaoxiang Zhang
Songyang Zhang
Kai Chen
LM&MA
ELM
VLM
51
11
0
24 Sep 2024
Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology
Aidan Gilson
Xuguang Ai
Thilaka Arunachalam
Ziyou Chen
Ki Xiong Cheong
...
Zhiyong Lu
Hua Xu
Ron A. Adelman
Yih-Chung Tham
Qingyu Chen
RALM
39
1
0
20 Sep 2024
Claim Verification in the Age of Large Language Models: A Survey
A. Dmonte
Roland Oruche
Marcos Zampieri
Prasad Calyam
Isabelle Augenstein
49
8
0
26 Aug 2024
Zero-shot Factual Consistency Evaluation Across Domains
Raunak Agarwal
HILM
47
0
0
07 Aug 2024
CiteME: Can Language Models Accurately Cite Scientific Claims?
Ori Press
Andreas Hochlehnert
Ameya Prabhu
Vishaal Udandarao
Ofir Press
Matthias Bethge
47
13
0
10 Jul 2024
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
Ronak Pradeep
Nandan Thakur
Sahel Sharifymoghaddam
Eric Zhang
Ryan Nguyen
Daniel Campos
Nick Craswell
Jimmy Lin
38
12
0
24 Jun 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Minghan Li
Xilun Chen
Ari Holtzman
Beidi Chen
Jimmy Lin
Wen-tau Yih
Xi Lin
RALM
BDL
108
10
0
29 May 2024
Generative AI Search Engines as Arbiters of Public Knowledge: An Audit of Bias and Authority
Alice Li
Luanne Sinnamon
31
3
0
22 May 2024
DOLOMITES: Domain-Specific Long-Form Methodical Tasks
Chaitanya Malaviya
Priyanka Agrawal
Kuzman Ganchev
Pranesh Srinivasan
Fantine Huot
Jonathan Berant
Mark Yatskar
Dipanjan Das
Mirella Lapata
Chris Alberti
40
6
0
09 May 2024
FLAME: Factuality-Aware Alignment for Large Language Models
Sheng-Chieh Lin
Luyu Gao
Barlas Oğuz
Wenhan Xiong
Jimmy Lin
Wen-tau Yih
Xilun Chen
HILM
38
14
0
02 May 2024
Assessing The Potential Of Mid-Sized Language Models For Clinical QA
Elliot Bolton
Betty Xiong
Vijaytha Muralidharan
J. Schamroth
Vivek Muralidharan
Christopher D. Manning
R. Daneshjou
AI4MH
ELM
LM&MA
29
4
0
24 Apr 2024
RAR-b: Reasoning as Retrieval Benchmark
Chenghao Xiao
G Thomas Hudson
Noura Al Moubayed
LRM
RALM
36
8
0
09 Apr 2024
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang
Marc Marone
Tianjian Li
Benjamin Van Durme
Daniel Khashabi
93
9
0
05 Apr 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
44
20
0
04 Apr 2024
CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems
Sara Rosenthal
Avirup Sil
Radu Florian
Salim Roukos
48
11
0
02 Apr 2024
Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai
Zexuan Zhong
Danqi Chen
Pang Wei Koh
Luke Zettlemoyer
Hanna Hajishirzi
Wen-tau Yih
KELM
RALM
49
53
0
05 Mar 2024
Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models
Xinran Zhao
Hongming Zhang
Xiaoman Pan
Wenlin Yao
Dong Yu
Tongshuang Wu
Jianshu Chen
HILM
LRM
29
4
0
27 Feb 2024
AttributionBench: How Hard is Automatic Attribution Evaluation?
Yifei Li
Xiang Yue
Zeyi Liao
Huan Sun
HILM
32
13
0
23 Feb 2024
Verif.ai: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers
Milos Kosprdic
Adela Ljajić
Bojana Bašaragin
Darija Medvecki
Nikola Milosevic
11
3
0
09 Feb 2024
How well do LLMs cite relevant medical references? An evaluation framework and analyses
Kevin Wu
Eric Wu
Ally Cassasola
Angela Zhang
Kevin Wei
Teresa Nguyen
Sith Riantawan
Patricia Shi Riantawan
Daniel E. Ho
James Zou
LM&MA
ELM
AI4MH
31
26
0
03 Feb 2024
Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs
Nan Hu
Jiaoyan Chen
Yike Wu
Guilin Qi
Sheng Bi
Tongtong Wu
Jeff Z. Pan
HILM
37
8
0
26 Jan 2024
Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation
Rui Yang
Qingcheng Zeng
Keen You
Yujie Qiao
Lucas Huang
...
Dragomir R. Radev
Zhiyong Lu
Hua Xu
Qingyu Chen
Irene Z Li
ELM
LM&MA
28
3
0
28 Nov 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
31
471
0
20 Nov 2023
Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains
Joshua Clymer
Garrett Baker
Rohan Subramani
Sam Wang
22
6
0
13 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
44
732
0
09 Nov 2023
Integrating UMLS Knowledge into Large Language Models for Medical Question Answering
Rui Yang
Edison Marrese-Taylor
Yuhe Ke
Lechao Cheng
Qingyu Chen
Irene Z Li
ELM
AI4MH
LM&MA
17
15
0
04 Oct 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen
Sihao Chen
Linfeng Song
Lifeng Jin
Baolin Peng
Haitao Mi
Daniel Khashabi
Dong Yu
34
21
0
28 Sep 2023
HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking
Juraj Vladika
Phillip Schneider
Florian Matthes
18
10
0
15 Sep 2023
Generating Benchmarks for Factuality Evaluation of Language Models
Dor Muhlgay
Ori Ram
Inbal Magar
Yoav Levine
Nir Ratner
Yonatan Belinkov
Omri Abend
Kevin Leyton-Brown
Amnon Shashua
Y. Shoham
HILM
27
91
0
13 Jul 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
59
606
0
23 May 2023
"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data
Orion Weller
Marc Marone
Nathaniel Weir
Dawn J Lawrie
Daniel Khashabi
Benjamin Van Durme
HILM
78
44
0
22 May 2023
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
Debadutta Dash
Rahul Thapa
Juan M. Banda
Akshay Swaminathan
Morgan Cheatham
...
Garret K. Morris
H. Magon
M. Lungren
Eric Horvitz
N. Shah
ELM
LM&MA
AI4MH
68
51
0
26 Apr 2023
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
218
301
0
26 Apr 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark J. F. Gales
HILM
LRM
152
396
0
15 Mar 2023
Rethinking with Retrieval: Faithful Large Language Model Inference
Hangfeng He
Hongming Zhang
Dan Roth
KELM
LRM
141
158
0
31 Dec 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
246
259
0
21 Mar 2022
Truthful AI: Developing and governing AI that does not lie
Owain Evans
Owen Cotton-Barratt
Lukas Finnveden
Adam Bales
Avital Balwit
Peter Wills
Luca Righetti
William Saunders
HILM
236
110
0
13 Oct 2021
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri
Hannah Rashkin
Tal Linzen
David Reitter
ALM
195
79
0
30 Apr 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
231
305
0
27 Apr 2021
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
W. Dolan
HILM
228
144
0
18 Apr 2021