Improving Model Factuality with Fine-grained Critique-based Evaluator

24 October 2024
Yiqing Xie
Wenxuan Zhou
Pradyot Prakash
Di Jin
Yuning Mao
Quintin Fettes
Arya Talebzadeh
Sinong Wang
Han Fang
Carolyn Rose
Daniel Fried
Hejia Zhang
    HILM

Papers citing "Improving Model Factuality with Fine-grained Critique-based Evaluator"

49 / 49 papers shown
Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
157
3
0
26 Feb 2025
You need to MIMIC to get FAME: Solving Meeting Transcript Scarcity with a Multi-Agent Conversations
Frederic Kirstein
Muneeb Khan
Jan Philip Wahle
Terry Ruas
Bela Gipp
24
0
0
18 Feb 2025
Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking
Xiaoxue Cheng
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
HILM, LRM
86
0
0
02 Jan 2025
The Superalignment of Superhuman Intelligence with Large Language Models
Minlie Huang
Yingkang Wang
Shiyao Cui
Pei Ke
J. Tang
176
1
0
15 Dec 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM, AILaw
326
112
0
25 Nov 2024
Understanding Finetuning for Factual Knowledge Extraction
Gaurav R. Ghosal
Tatsunori Hashimoto
Aditi Raghunathan
75
18
0
20 Jun 2024
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Zorik Gekhman
G. Yona
Roee Aharoni
Matan Eyal
Amir Feder
Roi Reichart
Jonathan Herzig
127
135
0
09 May 2024
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
Liyan Tang
Philippe Laban
Greg Durrett
HILM, SyDa
86
103
0
16 Apr 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Katie Kang
Eric Wallace
Claire Tomlin
Aviral Kumar
Sergey Levine
HILM, LRM
103
57
0
08 Mar 2024
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement
Wenda Xu
Guanglei Zhu
Xuandong Zhao
Liangming Pan
Lei Li
Wenjie Wang
98
63
0
18 Feb 2024
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
Xiaoying Zhang
Baolin Peng
Ye Tian
Jingyan Zhou
Lifeng Jin
Linfeng Song
Haitao Mi
Helen Meng
HILM
82
52
0
14 Feb 2024
Fine-grained Hallucination Detection and Editing for Language Models
Abhika Mishra
Akari Asai
Vidhisha Balachandran
Yizhong Wang
Graham Neubig
Yulia Tsvetkov
Hannaneh Hajishirzi
HILM
103
87
0
12 Jan 2024
RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models
Cheng Niu
Yuanhao Wu
Juno Zhu
Siliang Xu
Kashun Shum
Randy Zhong
Juntong Song
Tong Zhang
HILM
76
109
0
31 Dec 2023
DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation
Yiqing Xie
Sheng Zhang
Hao Cheng
Pengfei Liu
Zelalem Gero
Cliff Wong
Tristan Naumann
Hoifung Poon
Carolyn Rose
MedIm
52
5
0
16 Nov 2023
Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification
Haoqiang Kang
Juntong Ni
Huaxiu Yao
HILM, LRM
101
37
0
15 Nov 2023
Fine-tuning Language Models for Factuality
Katherine Tian
Eric Mitchell
Huaxiu Yao
Christopher D. Manning
Chelsea Finn
KELM, HILM, SyDa
85
185
0
14 Nov 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM, LM&MA, ELM
98
240
0
12 Oct 2023
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Dongfu Jiang
Yishan Li
Ge Zhang
Wenhao Huang
Bill Yuchen Lin
Wenhu Chen
ALM
80
69
0
01 Oct 2023
Shepherd: A Critic for Language Model Generation
Tianlu Wang
Ping Yu
Xiaoqing Ellen Tan
Sean O'Brien
Ramakanth Pasunuru
Jane Dwivedi-Yu
O. Yu. Golovneva
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ALM
84
87
0
08 Aug 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Yujia Qin
Shi Liang
Yining Ye
Kunlun Zhu
Lan Yan
...
Jie Zhou
Mark B. Gerstein
Dahai Li
Zhiyuan Liu
Maosong Sun
CLL, ALM, LLMAG, ELM, LM&MA
192
709
0
31 Jul 2023
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
Ethan Chern
Steffi Chern
Shiqi Chen
Weizhe Yuan
Kehua Feng
Chunting Zhou
Junxian He
Graham Neubig
Pengfei Liu
HILM
75
207
0
25 Jul 2023
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
Qiaoyu Tang
Ziliang Deng
Hongyu Lin
Xianpei Han
Qiao Liang
Boxi Cao
Le Sun
CLL, SyDa
128
202
0
08 Jun 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li
Oam Patel
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
KELM, HILM
130
583
0
06 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
389
4,169
0
29 May 2023
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
Weijia Shi
Xiaochuang Han
M. Lewis
Yulia Tsvetkov
Luke Zettlemoyer
Scott Yih
HILM
75
213
0
24 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM, ALM
153
703
0
23 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
KELM, LRM
140
396
0
19 May 2023
Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
...
Bodhisattwa Prasad Majumder
Katherine Hermann
Sean Welleck
Amir Yazdanbakhsh
Peter Clark
ReLM, LRM, DiffM
199
1,682
0
30 Mar 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa, RALM
176
1,772
0
09 Feb 2023
Generating Sequences by Learning to Self-Correct
Sean Welleck
Ximing Lu
Peter West
Faeze Brahman
T. Shen
Daniel Khashabi
Yejin Choi
LRM
100
237
0
31 Oct 2022
Large Language Models Can Self-Improve
Jiaxin Huang
S. Gu
Le Hou
Yuexin Wu
Xuezhi Wang
Hongkun Yu
Jiawei Han
ReLM, AI4MH, LRM
209
616
0
20 Oct 2022
Self-critiquing models for assisting human evaluators
William Saunders
Catherine Yeh
Jeff Wu
Steven Bills
Long Ouyang
Jonathan Ward
Jan Leike
ALM, ELM
116
306
0
12 Jun 2022
FaithDial: A Faithful Benchmark for Information-Seeking Dialogue
Nouha Dziri
Ehsan Kamalloo
Sivan Milton
Osmar Zaiane
Mo Yu
Edoardo Ponti
Siva Reddy
HILM
142
91
0
22 Apr 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM, RALM
310
267
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
900
13,228
0
04 Mar 2022
Large Dual Encoders Are Generalizable Retrievers
Jianmo Ni
Chen Qu
Jing Lu
Zhuyun Dai
Gustavo Hernández Ábrego
...
Vincent Zhao
Yi Luan
Keith B. Hall
Ming-Wei Chang
Yinfei Yang
DML
172
463
0
15 Dec 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
151
1,944
0
08 Sep 2021
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri
Hannah Rashkin
Tal Linzen
David Reitter
ALM
298
83
0
30 Apr 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
286
311
0
27 Apr 2021
$Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering
Or Honovich
Leshem Choshen
Roee Aharoni
Ella Neeman
Idan Szpektor
Omri Abend
HILM
91
142
0
16 Apr 2021
Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence
Tal Schuster
Adam Fisch
Regina Barzilay
104
238
0
15 Mar 2021
On Faithfulness and Factuality in Abstractive Summarization
Joshua Maynez
Shashi Narayan
Bernd Bohnet
Ryan T. McDonald
HILM
93
1,041
0
02 May 2020
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
Alex Jinpeng Wang
Kyunghyun Cho
M. Lewis
HILM
90
485
0
08 Apr 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
491
1,770
0
18 Sep 2019
Wizard of Wikipedia: Knowledge-Powered Conversational agents
Emily Dinan
Stephen Roller
Kurt Shuster
Angela Fan
Michael Auli
Jason Weston
RALM, KELM
153
951
0
03 Nov 2018
A Dataset for Document Grounded Conversations
Kangyan Zhou
Shrimai Prabhumoye
A. Black
84
241
0
19 Sep 2018
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
Shashi Narayan
Shay B. Cohen
Mirella Lapata
AILaw
161
1,686
0
27 Aug 2018
Get To The Point: Summarization with Pointer-Generator Networks
A. See
Peter J. Liu
Christopher D. Manning
3DPC
316
4,031
0
14 Apr 2017
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Payal Bajaj
Daniel Fernando Campos
Nick Craswell
Li Deng
Jianfeng Gao
...
Mir Rosenberg
Xia Song
Alina Stoica
Saurabh Tiwary
Tong Wang
RALM
186
2,748
0
28 Nov 2016