arXiv:2304.03439
Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4
7 April 2023
Hanmeng Liu
Ruoxi Ning
Zhiyang Teng
Jian Liu
Qiji Zhou
Yuexin Zhang
ELM
ReLM
LRM
Papers citing "Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4"
50 / 161 papers shown
Beyond Single-Point Judgment: Distribution Alignment for LLM-as-a-Judge
Luyu Chen
Zeyu Zhang
Haoran Tan
Quanyu Dai
Hao-ran Yang
Zhenhua Dong
Xu Chen
7
0
0
18 May 2025
Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment
Paul Tschisgale
Holger Maus
Fabian Kieser
Ben Kroehs
Stefan Petersen
Peter Wulff
ELM
LRM
41
0
0
14 May 2025
Knowledge Acquisition on Mass-shooting Events via LLMs for AI-Driven Justice
Benign John Ihugba
Afsana Nasrin
Ling Wu
Lin Li
Lijun Qian
Xishuang Dong
38
0
0
17 Apr 2025
Rethinking Reflection in Pre-Training
Essential AI
Darsh J Shah
Peter Rushton
Somanshu Singla
Mohit Parmar
...
Philip Monk
Platon Mazarakis
Ritvik Kapila
Saurabh Srivastava
Tim Romanski
ReLM
LRM
49
4
0
05 Apr 2025
SANDWiCH: Semantical Analysis of Neighbours for Disambiguating Words in Context ad Hoc
Daniel Guzman-Olivares
Lara Quijano-Sanchez
Federico Liberatore
41
0
0
07 Mar 2025
Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation
Qianxi He
Qianyu He
Jiaqing Liang
Yanghua Xiao
Weikang Zhou
Zeye Sun
Fei Yu
LRM
74
0
0
27 Feb 2025
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks
Jing Yang
Max Glockner
Anderson de Rezende Rocha
Iryna Gurevych
LRM
73
1
0
07 Feb 2025
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment
Cheryl Li
Tianyuan Xu
Yiwen Guo
LRM
229
2
0
05 Feb 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
52
6
0
28 Jan 2025
WisdomBot: Tuning Large Language Models with Artificial Intelligence Knowledge
Jingyuan Chen
Tao Wu
Wei Ji
Fei Wu
46
0
0
22 Jan 2025
Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning
Qiming Bao
Gaël Gendron
A. Peng
Wanjun Zhong
N. Tan
Yang Chen
Michael Witbrock
Qingbin Liu
LRM
ELM
73
2
0
20 Jan 2025
When SAM2 Meets Video Shadow and Mirror Detection
Leiping Jie
VLM
45
0
0
26 Dec 2024
Chumor 2.0: Towards Benchmarking Chinese Humor Understanding
Ruiqi He
Yushu He
Longju Bai
Jiarui Liu
Zhenjie Sun
Zenghao Tang
He Wang
Hanchen Xia
Rada Mihalcea
Naihao Deng
83
1
0
23 Dec 2024
Explainable CTR Prediction via LLM Reasoning
Xiaohan Yu
Li Zhang
C. L. Philip Chen
OffRL
LRM
69
1
0
03 Dec 2024
MLLM-Search: A Zero-Shot Approach to Finding People using Multimodal Large Language Models
Angus Fung
A. H. Tan
Haitong Wang
B. Benhabib
G. Nejat
LM&Ro
126
1
0
27 Nov 2024
Bitcoin Research with a Transaction Graph Dataset
Hugo Schnoering
Michalis Vazirgiannis
GNN
40
0
0
15 Nov 2024
Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach
Qingchuan Li
Jiatong Li
Tongxuan Liu
Yuting Zeng
Mingyue Cheng
Weizhe Huang
Qi Liu
LRM
AI4CE
57
2
0
29 Oct 2024
A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs
Ryosuke Sonoda
Ramya Srinivasan
61
1
0
22 Oct 2024
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA
Maharshi Gor
Hal Daumé III
Dinesh Manocha
Jordan Boyd-Graber
ELM
AI4MH
LRM
28
1
0
09 Oct 2024
fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models
Weijia Xu
Nebojsa Jojic
Nicolas Le Roux
21
0
0
07 Oct 2024
Adaptive Question Answering: Enhancing Language Model Proficiency for Addressing Knowledge Conflicts with Source Citations
Sagi Shaier
Ari Kobren
Philip Ogren
HILM
37
6
0
05 Oct 2024
Structured List-Grounded Question Answering
Mujeen Sung
Song Feng
James Gung
Raphael Shu
Yi Zhang
Saab Mansour
RALM
32
0
0
04 Oct 2024
CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs
Kangsheng Wang
Xiao Zhang
Hao Liu
Songde Han
Huimin Ma
Tianyu Hu
LRM
59
5
0
02 Oct 2024
Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks
Xingxuan Li
Weiwen Xu
Ruochen Zhao
Fangkai Jiao
Chenyu You
Lidong Bing
LRM
69
8
0
02 Oct 2024
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
Tongxuan Liu
Wenjiang Xu
Weizhe Huang
Yuting Zeng
Jiaxing Wang
Hailong Yang
Jing Li
LRM
ReLM
52
6
0
26 Sep 2024
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Hung-Ting Su
Ya-Ching Hsu
Xudong Lin
Xiang Qian Shi
Yulei Niu
Han-Yuan Hsu
Hung-yi Lee
Winston H. Hsu
LRM
36
0
0
22 Sep 2024
GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion
Tongxuan Liu
Xingyu Wang
Weizhe Huang
Wenjiang Xu
Yuting Zeng
Lei Jiang
Hailong Yang
Jing Li
LLMAG
44
8
0
21 Sep 2024
CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Causal Significance and Consistency
Kangsheng Wang
Xiao Zhang
Zizheng Guo
Tianyu Hu
Huimin Ma
LRM
48
7
0
20 Sep 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
Jin Jiang
Yuchen Yan
Yang Liu
Yonggang Jin
Shuai Peng
Hao Fei
Xunliang Cai
Yixin Cao
Liangcai Gao
Zhi Tang
LRM
52
5
0
19 Sep 2024
Linguini: A benchmark for language-agnostic linguistic reasoning
Eduardo Sánchez
Belen Alastruey
C. Ropers
Pontus Stenetorp
Mikel Artetxe
Marta R. Costa-jussá
ReLM
ELM
LRM
42
6
0
18 Sep 2024
Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue
Jonathan Ivey
Shivani Kumar
Jiayu Liu
Hua Shen
Sushrita Rakshit
...
Dustin Wright
Abraham Israeli
Anders Giovanni Møller
Lechen Zhang
David Jurgens
47
3
0
12 Sep 2024
Learning to Ask: When LLM Agents Meet Unclear Instruction
Wenxuan Wang
Juluan Shi
Chaozheng Wang
Cheryl Lee
Youliang Yuan
Jen-tse Huang
Wenxiang Jiao
Michael R. Lyu
LLMAG
36
8
0
31 Aug 2024
Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
Kexin Chen
Yi Liu
Donghai Hong
Jiaying Chen
Wenhai Wang
44
2
0
18 Aug 2024
How Well Do LLMs Identify Cultural Unity in Diversity?
Jialin Li
Junli Wang
Junjie Hu
Ming Jiang
43
4
0
09 Aug 2024
Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models
Nunzio Lorè
Alireza Ilami
Babak Heydari
LRM
43
1
0
05 Aug 2024
Strong and weak alignment of large language models with human values
Mehdi Khamassi
Marceau Nahon
Raja Chatila
ALM
45
10
0
05 Aug 2024
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
Yuyan Chen
Qiang Fu
Yichen Yuan
Zhihao Wen
Ge Fan
Dayiheng Liu
Dongmei Zhang
Zhixu Li
Yanghua Xiao
HILM
52
70
0
04 Jul 2024
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning
Mingqian He
Yongliang Shen
Wenqi Zhang
Zeqi Tan
Weiming Lu
LRM
40
5
0
29 Jun 2024
It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension
Sagi Shaier
Lawrence E Hunter
K. Wense
44
3
0
24 Jun 2024
Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
Andrea Posada
Daniel Rueckert
Felix Meissen
Philip Muller
LM&MA
ELM
37
0
0
24 Jun 2024
Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba
Ruiqi He
Yushu He
Longju Bai
Jiarui Liu
Zhenjie Sun
Zenghao Tang
He Wang
Hanchen Xia
Naihao Deng
38
0
0
18 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
57
11
0
12 Jun 2024
Feature contamination: Neural networks learn uncorrelated features and fail to generalize
Tianren Zhang
Chujie Zhao
Guanyu Chen
Yizhou Jiang
Feng Chen
OOD
MLT
OODD
77
3
0
05 Jun 2024
A Survey of Useful LLM Evaluation
Ji-Lun Peng
Sijia Cheng
Egil Diau
Yung-Yu Shih
Po-Heng Chen
Yen-Ting Lin
Yun-Nung Chen
LLMAG
ELM
34
12
0
03 Jun 2024
PathReasoner: Modeling Reasoning Path with Equivalent Extension for Logical Question Answering
Fangzhi Xu
Qika Lin
Tianzhe Zhao
Jiawei Han
Jun Liu
LRM
38
1
0
29 May 2024
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubić
Federico Soldá
Aurelio Sulser
Davide Scaramuzza
LRM
BDL
52
5
0
26 May 2024
Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection
Han Zhang
A. B. Sediq
Ali Afana
Melike Erol-Kantarci
44
7
0
17 May 2024
Large Language Models for Education: A Survey
Hanyi Xu
Wensheng Gan
Zhenlian Qi
Jiayang Wu
Philip S. Yu
AI4Ed
ELM
62
15
0
12 May 2024
Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models
Yitian Li
Jidong Tian
Hao He
Yaohui Jin
LRM
ReLM
34
0
0
09 May 2024
CRCL at SemEval-2024 Task 2: Simple prompt optimizations
Clément Brutti-Mairesse
L. Verlingue
46
2
0
03 May 2024