ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4

7 April 2023
Hanmeng Liu
Ruoxi Ning
Zhiyang Teng
Jian Liu
Qiji Zhou
Yuexin Zhang
    ELM
    ReLM
    LRM
arXiv: 2304.03439

Papers citing "Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4"

50 / 161 papers shown
ALCM: Autonomous LLM-Augmented Causal Discovery Framework
Elahe Khatibi
Mahyar Abbasian
Zhongqi Yang
Iman Azimi
Amir M. Rahmani
67
12
0
02 May 2024
Can a Hallucinating Model help in Reducing Human "Hallucination"?
Sowmya S. Sundaram
Balaji Alwar
HILM
LRM
39
0
0
01 May 2024
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models
Leonardo Ranaldi
André Freitas
LRM
ReLM
42
8
0
01 May 2024
Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions
Jordan Meadows
Tamsin James
André Freitas
ReLM
LRM
AI4CE
41
1
0
29 Apr 2024
Enhancing Pre-Trained Generative Language Models with Question Attended Span Extraction on Machine Reading Comprehension
Lin Ai
Zheng Hui
Zizhou Liu
Julia Hirschberg
36
1
0
27 Apr 2024
Large Language Models as Test Case Generators: Performance Evaluation and Enhancement
Ke-Shen Li
Yuan Yuan
LLMAG
30
12
0
20 Apr 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
55
6
0
12 Apr 2024
Accuracy of a Large Language Model in Distinguishing Anti- And Pro-vaccination Messages on Social Media: The Case of Human Papillomavirus Vaccination
Soojong Kim
Kwanho Kim
Claire Wonjeong Jo
LM&MA
27
6
0
10 Apr 2024
Assisting humans in complex comparisons: automated information comparison at scale
Truman Yuen
Graham A. Watt
Y. Lawryshyn
41
0
0
05 Apr 2024
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Philipp Mondorf
Barbara Plank
ELM
LRM
LM&MA
39
37
0
02 Apr 2024
A Theory for Length Generalization in Learning to Reason
Changnan Xiao
Bing Liu
LRM
47
9
0
31 Mar 2024
MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model
Yike Wu
Jiatao Zhang
Nan Hu
LanLing Tang
Guilin Qi
Jun Shao
Jie Ren
Wei Song
62
10
0
27 Mar 2024
Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making
Shuai Ma
Qiaoyi Chen
Xinru Wang
Chengbo Zheng
Zhenhui Peng
Ming Yin
Xiaojuan Ma
ELM
36
20
0
25 Mar 2024
BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English
H. M. Q. H. Sheikh Shafayat
Rishav Hada
Isaac Cowhey
Rifki Afina
Jerry Tworek
Lorie De Leon
35
3
0
16 Mar 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
36
296
0
12 Mar 2024
A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health
Alexander Marrapese
Basem Suleiman
Imdad Ullah
Juno Kim
AI4MH
LM&MA
32
3
0
08 Mar 2024
PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion
Zekai Zhang
Yiduo Guo
Yaobo Liang
Dongyan Zhao
Nan Duan
46
3
0
06 Mar 2024
VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing
Zhiyuan Chang
Mingyang Li
Junjie Wang
Cheng Li
Qing Wang
22
0
0
05 Mar 2024
Predicting Learning Performance with Large Language Models: A Study in Adult Literacy
Liang Zhang
Jionghao Lin
Conrad Borchers
John Sabatini
John Hollander
Meng Cao
Xiangen Hu
44
8
0
04 Mar 2024
Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering
Armin Toroghi
Willis Guo
Mohammad Mahdi Torabi pour
Scott Sanner
LRM
31
8
0
03 Mar 2024
Navigating Hallucinations for Reasoning of Unintentional Activities
Shresth Grover
Vibhav Vineet
Yogesh S Rawat
LRM
52
1
0
29 Feb 2024
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Juyeon Kim
Jeongeun Lee
Yoonho Chang
Chanyeol Choi
Junseong Kim
Jy-yong Sohn
KELM
LRM
58
2
0
27 Feb 2024
QASE Enhanced PLMs: Improved Control in Text Generation for MRC
Lin Ai
Zheng Hui
Zizhou Liu
Julia Hirschberg
34
0
0
26 Feb 2024
Puzzle Solving using Reasoning of Large Language Models: A Survey
Panagiotis Giadikiaroglou
Maria Lymperaiou
Giorgos Filandrianos
Giorgos Stamou
ELM
ReLM
LRM
21
27
0
17 Feb 2024
When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models
Hai-Tao Zheng
Qingyu Zhou
Yuanzhen Luo
Shirong Ma
Yangning Li
Hai-Tao Zheng
Xuming Hu
Philip S. Yu
LRM
52
14
0
16 Feb 2024
Exploring the Potential of Large Language Models in Artistic Creation: Collaboration and Reflection on Creative Programming
Anqi Wang
Zhizhuo Yin
Yulu Hu
Yuanyuan Mao
Pan Hui
36
10
0
15 Feb 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Simone Balloccu
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
SILM
ELM
PILM
35
159
0
06 Feb 2024
Can Large Language Models Learn Independent Causal Mechanisms?
Gaël Gendron
Bao Trung Nguyen
A. Peng
Michael Witbrock
Gillian Dobbie
LRM
30
4
0
04 Feb 2024
Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective
Qun Ma
Xiao Xue
Deyu Zhou
Xiangning Yu
Donghua Liu
...
Yifan Shen
Peilin Ji
Juanjuan Li
Gang Wang
Wanpeng Ma
AI4CE
LM&Ro
LLMAG
21
7
0
01 Feb 2024
Prospects for inconsistency detection using large language models and sheaves
Steve Huntsman
Michael Robinson
Ludmilla Huntsman
42
4
0
30 Jan 2024
Visualization Generation with Large Language Models: An Evaluation
Guozheng Li
Xinyu Wang
Gerile Aodeng
Shunyuan Zheng
Yu Zhang
Chuangxin Ou
Song Wang
Chi Harold Liu
31
28
0
20 Jan 2024
Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance
Tinghui Ouyang
AprilPyone Maungmaung
Koichi Konishi
Yoshiki Seo
Isao Echizen
AI4MH
28
5
0
15 Jan 2024
Distortions in Judged Spatial Relations in Large Language Models
N. Fulman
Abdulkadir Memduhoğlu
Alexander Zipf
25
9
0
08 Jan 2024
Evaluating AI Vocational Skills Through Professional Testing
David Noever
Matt Ciolino
ELM
32
0
0
17 Dec 2023
DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content
Wentao Wang
Xuanyao Huang
Tianyang Wang
Swalpa Kumar Roy
EGVM
48
0
0
16 Dec 2023
Large Language Models are Complex Table Parsers
Bowen Zhao
Changkai Ji
Yuejie Zhang
Wen He
Yingwen Wang
Qing Wang
Rui Feng
Xiaobo Zhang
LMTD
RALM
25
22
0
13 Dec 2023
How should the advent of large language models affect the practice of science?
Marcel Binz
Stephan Alaniz
Adina Roskies
B. Aczel
Carl T. Bergstrom
...
Emily M. Bender
M. Marelli
Matthew M. Botvinick
Zeynep Akata
Eric Schulz
39
9
0
05 Dec 2023
Conditions for Length Generalization in Learning Reasoning Skills
Changnan Xiao
Bing Liu
LRM
40
7
0
22 Nov 2023
Causal Structure Learning Supervised by Large Language Model
Taiyu Ban
Lyuzhou Chen
Derui Lyu
Xiangyu Wang
Huanhuan Chen
74
12
0
20 Nov 2023
Complementary Advantages of ChatGPTs and Human Readers in Reasoning: Evidence from English Text Reading Comprehension
Tongquan Zhou
Yao Zhang
Siyi Cao
Yulu Li
Tao Wang
AI4MH
LRM
35
2
0
17 Nov 2023
Exploring the Potential of Large Language Models in Computational Argumentation
Guizhen Chen
Liying Cheng
Anh Tuan Luu
Lidong Bing
LLMAG
LRM
29
23
0
15 Nov 2023
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks
Hao Peng
Xiaozhi Wang
Jianhui Chen
Weikai Li
Y. Qi
...
Zhili Wu
Kaisheng Zeng
Bin Xu
Lei Hou
Juanzi Li
34
28
0
15 Nov 2023
Empowering Multi-step Reasoning across Languages via Tree-of-Thoughts
Leonardo Ranaldi
Giulia Pucci
Federico Ranaldi
Elena Sofia Ruzzetti
Fabio Massimo Zanzotto
LRM
32
12
0
14 Nov 2023
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
Pengyu Cheng
Yifan Yang
Jian Li
Yong Dai
Tianhao Hu
Peixin Cao
Nan Du
Xiaolong Li
28
28
0
14 Nov 2023
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning
Ruixin Hong
Hongming Zhang
Xinyu Pang
Dong Yu
Changshui Zhang
LRM
52
24
0
14 Nov 2023
Language Models can be Logical Solvers
Jiazhan Feng
Ruochen Xu
Junheng Hao
Hiteshi Sharma
Yelong Shen
Dongyan Zhao
Weizhu Chen
ReLM
LRM
ELM
53
23
0
10 Nov 2023
Measuring Five Accountable Talk Moves to Improve Instruction at Scale
Ashlee Kupor
Candice Morgan
Dorottya Demszky
10
7
0
02 Nov 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
Christoph Leiter
Juri Opitz
Daniel Deutsch
Yang Gao
Rotem Dror
Steffen Eger
ALM
LRM
ELM
40
31
0
30 Oct 2023
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method
Yukun Zhao
Lingyong Yan
Weiwei Sun
Guoliang Xing
Chong Meng
Shuaiqiang Wang
Zhicong Cheng
Zhaochun Ren
Dawei Yin
33
37
0
27 Oct 2023
From Transcripts to Insights: Uncovering Corporate Risks Using Generative AI
Alex G. Kim
Maximilian Muhn
Valeri V. Nikolaev
45
9
0
26 Oct 2023