ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.16877
  4. Cited By
Evaluating Correctness and Faithfulness of Instruction-Following Models
  for Question Answering

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

31 July 2023
Vaibhav Adlakha
Parishad BehnamGhader
Xing Han Lù
Nicholas Meade
Siva Reddy
ArXivPDFHTML

Papers citing "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"

30 / 30 papers shown
Title
Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation
Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation
Chengwei Qin
Wenxuan Zhou
Karthik Abinav Sankararaman
Nanshu Wang
Tengyu Xu
...
Aditya Tayade
Sinong Wang
Chenyu You
Han Fang
Hao Ma
HILM
LRM
7
0
0
18 May 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
66
1
0
01 Apr 2025
Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy
Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy
Rishabh Uapadhyay
Marco Viviani
69
0
0
07 Feb 2025
LLM as HPC Expert: Extending RAG Architecture for HPC Data
Yusuke Miyashita
Patrick Kin Man Tung
Johan Barthélemy
46
0
0
28 Jan 2025
Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment
Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment
Melissa Kazemi Rad
Huy Nghiem
Andy Luo
Sahil Wadhwa
Mohammad Sorower
Stephen Rawls
AAML
93
2
0
22 Jan 2025
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
Robert Friel
Masha Belyi
Atindriyo Sanyal
82
19
0
17 Jan 2025
What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context
What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context
Zhiyuan Chang
Mingyang Li
Xiaojun Jia
Junjie Wang
Yuekai Huang
Qing Wang
Yihao Huang
Yang Liu
186
0
0
17 Dec 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
77
2
0
01 Nov 2024
4-LEGS: 4D Language Embedded Gaussian Splatting
4-LEGS: 4D Language Embedded Gaussian Splatting
Gal Fiebelman
Tamir Cohen
Ayellet Morgenstern
Peter Hedman
Hadar Averbuch-Elor
3DGS
46
3
0
14 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min Lin
47
8
0
09 Oct 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming
Senthil Purushwalkam
Shrey Pandit
Zixuan Ke
Xuan-Phi Nguyen
Caiming Xiong
Chenyu You
HILM
114
16
0
30 Sep 2024
Enabling Real-Time Conversations with Minimal Training Costs
Enabling Real-Time Conversations with Minimal Training Costs
Wang Xu
Shuo Wang
Weilin Zhao
Xu Han
Yukun Yan
Yudi Zhang
Zhe Tao
Zhiyuan Liu
Wanxiang Che
34
4
0
18 Sep 2024
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
Maor Ivgi
Ori Yoran
Jonathan Berant
Mor Geva
HILM
66
8
0
08 Jul 2024
Evaluating the Retrieval Component in LLM-Based Question Answering
  Systems
Evaluating the Retrieval Component in LLM-Based Question Answering Systems
Ashkan Alinejad
Krtin Kumar
Ali Vahdat
64
5
0
10 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial
  Actions across X Community Notes and Wikipedia edits
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Philip Torr
João F. Henriques
Jakob N. Foerster
35
1
0
05 Jun 2024
Large Language Models Meet NLP: A Survey
Large Language Models Meet NLP: A Survey
Libo Qin
Qiguang Chen
Xiachong Feng
Yang Wu
Yongheng Zhang
Hai-Tao Zheng
Min Li
Wanxiang Che
Philip S. Yu
ALM
LM&MA
ELM
LRM
52
48
0
21 May 2024
Fast Adversarial Attacks on Language Models In One GPU Minute
Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan
Shoumik Saha
Gaurang Sriramanan
Priyatham Kattakinda
Atoosa Malemir Chegini
S. Feizi
MIALM
43
34
0
23 Feb 2024
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on
  Agriculture
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
M. A. D. L. Balaguer
Vinamra Benara
Renato Luiz de Freitas Cunha
Roberto de M. Estevao Filho
Todd Hendry
...
Morris Sharp
B. Silva
Swati Sharma
Vijay Aski
Ranveer Chandra
FaML
38
82
0
16 Jan 2024
LawBench: Benchmarking Legal Knowledge of Large Language Models
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei
Xiaoyu Shen
D. Zhu
Fengzhe Zhou
Zhuo Han
Songyang Zhang
Kai-xiang Chen
Zongwen Shen
Jidong Ge
ELM
AILaw
36
36
0
28 Sep 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
48
77
0
13 Sep 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large
  Language Models
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
A. Luu
Wei Bi
Freda Shi
Shuming Shi
RALM
LRM
HILM
48
522
0
03 Sep 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
229
574
0
03 May 2023
Instruction Tuning with GPT-4
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
165
579
0
06 Apr 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
369
12,003
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
215
1,661
0
15 Oct 2021
Hindsight: Posterior-guided training of retrievers for improved
  open-ended generation
Hindsight: Posterior-guided training of retrievers for improved open-ended generation
Ashwin Paranjape
Omar Khattab
Christopher Potts
Matei A. Zaharia
Christopher D. Manning
RALM
72
43
0
14 Oct 2021
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable
  Features
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin
David Reitter
Gaurav Singh Tomar
Dipanjan Das
167
101
0
14 Jul 2021
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri
Hannah Rashkin
Tal Linzen
David Reitter
ALM
195
79
0
30 Apr 2021
Reducing conversational agents' overconfidence through linguistic
  calibration
Reducing conversational agents' overconfidence through linguistic calibration
Sabrina J. Mielke
Arthur Szlam
Emily Dinan
Y-Lan Boureau
209
154
0
30 Dec 2020
Answering Open-Domain Questions of Varying Reasoning Steps from Text
Answering Open-Domain Questions of Varying Reasoning Steps from Text
Peng Qi
Haejun Lee
OghenetegiriTGSido
Christopher D. Manning
KELM
RALM
LRM
191
55
0
23 Oct 2020
1