2210.12023
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models
21 October 2022
Alessandro Stolfo
Zhijing Jin
Kumar Shridhar
Bernhard Schölkopf
Mrinmaya Sachan
ELM
OOD
LRM
Papers citing "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models"
50 / 50 papers shown
Title
Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
Kun Zhang
Le Wu
Kui Yu
Guangyi Lv
Dacao Zhang
AAML
ELM
21
0
0
08 Jun 2025
NextQuill: Causal Preference Modeling for Enhancing LLM Personalization
Xiaoyan Zhao
Juntao You
Yang Zhang
Wenjie Wang
Hong Cheng
Fuli Feng
See-Kiong Ng
Tat-Seng Chua
58
2
0
03 Jun 2025
Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation
Siddhant Bhambri
Upasana Biswas
Subbarao Kambhampati
137
1
0
20 May 2025
PROPHET: An Inferable Future Forecasting Benchmark with Causal Intervened Likelihood Estimation
Zhengwei Tao
Zhi Jin
Bincheng Li
Xiaoying Bai
Haiyan Zhao
Chengfeng Dou
Xiancai Chen
Jia Li
Linyu Li
Chongyang Tao
AI4TS
73
1
0
02 Apr 2025
Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning
Xinghao Chen
Zhijing Sun
Wenjin Guo
Miaoran Zhang
Yanjun Chen
...
Hui Su
Yijie Pan
Dietrich Klakow
Wenjie Li
Xiaoyu Shen
LRM
182
8
0
25 Feb 2025
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
Andreas Opedal
Haruki Shirakami
Bernhard Schölkopf
Abulhair Saparov
Mrinmaya Sachan
LRM
140
3
0
17 Feb 2025
CodeSCM: Causal Analysis for Multi-Modal Code Generation
Mukur Gupta
Noopur Bhatt
Suman Jana
117
1
0
07 Feb 2025
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
Po-Hsuan Huang
Jeng-Lin Li
Chin-Po Chen
Ming-Ching Chang
Wei-Chao Chen
LRM
139
1
0
04 Dec 2024
COLD: Causal reasOning in cLosed Daily activities
Abhinav Joshi
A. Ahmad
Ashutosh Modi
LRM
ReLM
125
3
0
29 Nov 2024
Number Cookbook: Number Understanding of Language Models and How to Improve It
Haotong Yang
Yi Hu
Shijia Kang
Zhouchen Lin
Muhan Zhang
LRM
108
8
0
06 Nov 2024
Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation
Zehao Wang
Minye Wu
Yixin Cao
Yubo Ma
Meiqi Chen
Tinne Tuytelaars
66
2
0
25 Sep 2024
Models Can and Should Embrace the Communicative Nature of Human-Generated Math
Sasha Boguraev
Ben Lipkin
Leonie Weissweiler
Kyle Mahowald
106
1
0
25 Sep 2024
Causal Inference with Large Language Model: A Survey
Jing Ma
CML
LRM
253
9
0
15 Sep 2024
Probing Causality Manipulation of Large Language Models
Chenyang Zhang
Haibo Tong
Bin Zhang
Dongyu Zhang
LRM
68
0
0
26 Aug 2024
How Well Do LLMs Identify Cultural Unity in Diversity?
Jialin Li
Junli Wang
Junjie Hu
Ming Jiang
76
4
0
09 Aug 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Derek F. Wong
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM
LRM
110
31
0
11 Jul 2024
A Survey of Useful LLM Evaluation
Ji-Lun Peng
Sijia Cheng
Egil Diau
Yung-Yu Shih
Po-Heng Chen
Yen-Ting Lin
Yun-Nung Chen
LLMAG
ELM
84
15
0
03 Jun 2024
D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models
Duygu Altinok
AI4MH
LRM
LM&MA
ELM
41
2
0
07 May 2024
FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering
Wei Zhou
Mohsen Mesgar
Heike Adel
Annemarie Friedrich
LMTD
85
9
0
29 Apr 2024
Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions
Jordan Meadows
Tamsin James
André Freitas
ReLM
LRM
AI4CE
70
1
0
29 Apr 2024
Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs
Zhenlan Ji
Daoyuan Wu
Pingchuan Ma
Zongjie Li
Shuai Wang
LLMAG
84
6
0
27 Apr 2024
How often are errors in natural language reasoning due to paraphrastic variability?
Neha Srikanth
Marine Carpuat
Rachel Rudinger
LRM
72
2
0
17 Apr 2024
SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials
Mael Jullien
Marco Valentino
André Freitas
LM&MA
74
44
0
07 Apr 2024
Estimating the Causal Effects of Natural Logic Features in Transformer-Based NLI Models
Julia Rozanova
Marco Valentino
André Freitas
CML
68
1
0
03 Apr 2024
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Philipp Mondorf
Barbara Plank
ELM
LRM
LM&MA
167
52
0
02 Apr 2024
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Meiqi Chen
Yixin Cao
Yan Zhang
Chaochao Lu
107
16
0
27 Mar 2024
Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment
Congzhi Zhang
Linhai Zhang
Jialong Wu
Deyu Zhou
Guoqiang Xu
CML
AI4CE
LRM
107
21
0
05 Mar 2024
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition
Zhaorun Chen
Zhuokai Zhao
Zhihong Zhu
Ruiqi Zhang
Xiang Li
Bhiksha Raj
Huaxiu Yao
LRM
100
29
0
18 Feb 2024
Large Language Models for Mathematical Reasoning: Progresses and Challenges
Janice Ahn
Rishu Verma
Renze Lou
Di Liu
Rui Zhang
Wenpeng Yin
LRM
143
146
0
31 Jan 2024
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
Andreas Opedal
Alessandro Stolfo
Haruki Shirakami
Ying Jiao
Ryan Cotterell
Bernhard Schölkopf
Abulhair Saparov
Mrinmaya Sachan
LRM
121
16
0
31 Jan 2024
CLadder: Assessing Causal Reasoning in Language Models
Zhijing Jin
Yuen Chen
Felix Leeb
Luigi Gresele
Ojasv Kamal
...
Kevin Blin
Fernando Gonzalez Adauto
Max Kleiman-Weiner
Mrinmaya Sachan
Bernhard Schölkopf
ReLM
ELM
LRM
112
79
0
07 Dec 2023
WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models
Youssef Benchekroun
Megi Dervishi
Mark Ibrahim
Jean-Baptiste Gaya
Xavier Martinet
Grégoire Mialon
Thomas Scialom
Emmanuel Dupoux
Dieuwke Hupkes
Pascal Vincent
LRM
63
8
0
27 Nov 2023
Multimodal Multi-Hop Question Answering Through a Conversation Between Tools and Efficiently Finetuned Large Language Models
Hossein Rajabzadeh
Suyuchen Wang
Hyock Ju Kwon
Bang Liu
KELM
53
3
0
16 Sep 2023
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Ziyi Tang
Ruilin Wang
Weixing Chen
Keze Wang
Yang Liu
Tianshui Chen
Liang Lin
LRM
51
0
0
23 Aug 2023
World Models for Math Story Problems
Andreas Opedal
Niklas Stoehr
Abulhair Saparov
Mrinmaya Sachan
ReLM
112
13
0
07 Jun 2023
Large Language Models Are Not Strong Abstract Reasoners
Gaël Gendron
Qiming Bao
Michael Witbrock
Gillian Dobbie
ELM
LRM
110
37
0
31 May 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM
KELM
LRM
101
54
0
24 May 2023
Robust Prompt Optimization for Large Language Models Against Distribution Shifts
Moxin Li
Wenjie Wang
Fuli Feng
Yixin Cao
Jizhi Zhang
Tat-Seng Chua
OffRL
150
20
0
23 May 2023
A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers
Jordan Meadows
Marco Valentino
Damien Teney
André Freitas
120
8
0
21 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
136
6
0
21 May 2023
Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility
Wen-song Ye
Mingfeng Ou
Tianyi Li
Yipeng Chen
Xuetao Ma
...
Sai Wu
Jie Fu
Gang Chen
Haobo Wang
Jiaqi Zhao
96
38
0
15 May 2023
Estimating the Causal Effects of Natural Logic Features in Neural NLI Models
Julia Rozanova
Marco Valentino
André Freitas
CML
69
4
0
15 May 2023
Measuring Consistency in Text-based Financial Forecasting Models
Linyi Yang
Yingpeng Ma
Yue Zhang
59
4
0
15 May 2023
Autonomous GIS: the next-generation AI-powered GIS
Zhenlong Li
H. Ning
SyDa
LLMAG
79
85
0
10 May 2023
NLI4CT: Multi-Evidence Natural Language Inference for Clinical Trial Reports
Mael Jullien
Marco Valentino
H. Frost
Paul O'Regan
Dónal Landers
André Freitas
53
30
0
05 May 2023
Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks
Yixuan Weng
Minjun Zhu
Fei Xia
Bin Li
Shizhu He
Kang Liu
Jun Zhao
80
6
0
04 Apr 2023
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
111
109
0
20 Mar 2023
Distilling Reasoning Capabilities into Smaller Language Models
Kumar Shridhar
Alessandro Stolfo
Mrinmaya Sachan
LRM
ReLM
125
176
0
01 Dec 2022
Automatic Generation of Socratic Subquestions for Teaching Math Word Problems
Kumar Shridhar
Jakub Macina
Mennatallah El-Assady
Tanmay Sinha
Manu Kapur
Mrinmaya Sachan
AIMat
98
48
0
23 Nov 2022
Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution
Emily McMilin
25
0
0
30 Sep 2022