arXiv: 2305.14540
LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

23 May 2023
Philippe Laban
Wojciech Kryściński
Divyansh Agarwal
Alexander R. Fabbri
Caiming Xiong
Shafiq R. Joty
Chien-Sheng Wu
    ALM
    HILM

Papers citing "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"

29 / 29 papers shown
Title
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Yuan Gao
Dokyun Lee
Gordon Burtch
Sina Fazelpour
LRM
56
7
0
25 Oct 2024
Rationale-Aware Answer Verification by Pairwise Self-Evaluation
Akira Kawabata
Saku Sugawara
LRM
31
3
0
07 Oct 2024
SQLucid: Grounding Natural Language Database Queries with Interactive Explanations
Yuan Tian
Jonathan K. Kummerfeld
Toby Jia-Jun Li
Tianyi Zhang
AAML
29
2
0
10 Sep 2024
LoraMap: Harnessing the Power of LoRA Connections
Hyeryun Park
Jeongwon Kwak
Dongsuk Jang
Sumin Park
Jinwook Choi
MoMe
28
0
0
29 Aug 2024
Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data
Zili Wang
Marco Balduini
Federico De Santis
Andrea Proia
Arsenio Leo
Marco Brambilla
Shiming Xiang
24
2
0
03 Aug 2024
Automatic Generation of Model and Data Cards: A Step Towards Responsible AI
Jiarui Liu
Wenkai Li
Zhijing Jin
Mona T. Diab
SyDa
55
3
0
10 May 2024
Less is More for Improving Automatic Evaluation of Factual Consistency
Tong Wang
Ninad Kulkarni
Yanjun Qi
ALM
41
2
0
09 Apr 2024
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali
Richard Anarfi
C. Barberan
Jia He
PILM
67
24
0
19 Mar 2024
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
Jio Oh
Soyeon Kim
Junseok Seo
Jindong Wang
Ruochen Xu
Xing Xie
Steven Euijong Whang
36
1
0
08 Mar 2024
How Far Are We from Intelligent Visual Deductive Reasoning?
Yizhe Zhang
Richard He Bai
Ruixiang Zhang
Jiatao Gu
Shuangfei Zhai
J. Susskind
Navdeep Jaitly
ReLM
LRM
44
13
0
07 Mar 2024
Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
Chirag Agarwal
Sree Harsha Tanneru
Himabindu Lakkaraju
LRM
37
35
0
07 Feb 2024
Integration of cognitive tasks into artificial general intelligence test for large models
Youzhi Qu
Chen Wei
Penghui Du
Wenxin Che
Chi Zhang
...
Bin Hu
Kai Du
Haiyan Wu
Jia Liu
Quanying Liu
ELM
34
7
0
04 Feb 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
57
56
0
11 Jan 2024
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Shafiq R. Joty
ELM
CLL
AI4MH
LRM
ALM
85
27
0
28 Nov 2023
AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
Haoyi Qiu
Kung-Hsiang Huang
Jingnong Qu
Nanyun Peng
HILM
28
6
0
16 Nov 2023
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
Yixin Liu
Alexander R. Fabbri
Jiawen Chen
Yilun Zhao
Simeng Han
Shafiq R. Joty
Pengfei Liu
Dragomir R. Radev
Chien-Sheng Wu
Arman Cohan
ELM
46
57
0
15 Nov 2023
Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment
Philippe Laban
Lidiya Murakhovs'ka
Caiming Xiong
Chien-Sheng Wu
LRM
26
19
0
14 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
39
722
0
09 Nov 2023
Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs
Xue-Yong Fu
Md Tahmid Rahman Laskar
Cheng-Hsiung Chen
TN ShashiBhushan
HILM
ELM
68
18
0
01 Nov 2023
Salespeople vs SalesBot: Exploring the Role of Educational Value in Conversational Recommender Systems
Lidiya Murakhovs'ka
Philippe Laban
Tian Xie
Caiming Xiong
Chien-Sheng Wu
25
6
0
26 Oct 2023
Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games
Yizhe Zhang
Jiarui Lu
Navdeep Jaitly
LRM
ELM
16
9
0
02 Oct 2023
Beyond the Chat: Executable and Verifiable Text-Editing with LLMs
Philippe Laban
Jesse Vig
Marti A. Hearst
Caiming Xiong
Chien-Sheng Wu
KELM
34
27
0
27 Sep 2023
Art or Artifice? Large Language Models and the False Promise of Creativity
Tuhin Chakrabarty
Philippe Laban
Divyansh Agarwal
Smaranda Muresan
Chien-Sheng Wu
21
116
0
25 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
37
66
0
21 Sep 2023
FaNS: a Facet-based Narrative Similarity Metric
Mousumi Akter
Shubhra (Santu) Karmaker
17
1
0
09 Sep 2023
ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts
Rajdeep Mukherjee
Abhinav Bohra
Akash Banerjee
Soumya Sharma
Manjunath Hegde
...
Shivani Shrivastava
Koustuv Dasgupta
Niloy Ganguly
Saptarshi Ghosh
Pawan Goyal
RALM
41
44
0
22 Oct 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
33
5
0
13 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
361
8,495
0
28 Jan 2022
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
228
305
0
27 Apr 2021