2406.02106
MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
4 June 2024
Weiqi Wang
Yangqiu Song
LRM
Papers citing "MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset"
50 / 61 papers shown
EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association
Weiqi Wang
Limeng Cui
Xin Liu
Sreyashi Nag
Wenju Xu
...
Y. Gao
Haiyang Zhang
Qi He
Shuiwang Ji
Yangqiu Song
98
3
0
21 May 2025
Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations
Chunyang Li
Weiqi Wang
Tianshi Zheng
Yangqiu Song
LRM
68
6
0
22 Feb 2025
On the Role of Entity and Event Level Conceptualization in Generalizable Reasoning: A Survey of Tasks, Methods, Applications, and Future Directions
Weiqi Wang
Tianqing Fang
Haochen Shi
Baixuan Xu
Wenxuan Ding
...
Wei Fan
Jiaxin Bai
Haoran Li
Xin Liu
Yangqiu Song
LRM
66
3
0
16 Jun 2024
MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding
Baixuan Xu
Weiqi Wang
Haochen Shi
Wenxuan Ding
Huihao Jing
Tianqing Fang
Jiaxin Bai
Long Chen
Yangqiu Song
69
10
0
15 Jun 2024
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce
Wenxuan Ding
Weiqi Wang
Sze Heng Douglas Kwok
Minghao Liu
Tianqing Fang
Jiaxin Bai
Junxian He
Yangqiu Song
RALM
56
8
0
14 Jun 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B. Tenenbaum
Chuang Gan
71
180
0
15 May 2024
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
Ruixin Yang
Dheeraj Rajagopal
S. Hayati
Bin Hu
Dongyeop Kang
LLMAG
89
7
0
14 Apr 2024
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
86
479
0
20 Mar 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM
LLMAG
158
460
0
13 Mar 2024
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
Feihong Lu
Weiqi Wang
Yangyifei Luo
Ziqin Zhu
Qingyun Sun
...
Haochen Shi
Shiqi Gao
Qian Li
Yangqiu Song
Jianxin Li
VLM
83
3
0
28 Feb 2024
Understanding the planning of LLM agents: A survey
Xu Huang
Weiwen Liu
Xiaolong Chen
Xingmei Wang
Hao Wang
Defu Lian
Yasheng Wang
Ruiming Tang
Enhong Chen
LLMAG
LM&Ro
75
146
0
05 Feb 2024
Large Language Models for Mathematical Reasoning: Progresses and Challenges
Janice Ahn
Rishu Verma
Renze Lou
Di Liu
Rui Zhang
Wenpeng Yin
LRM
62
130
0
31 Jan 2024
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks
Wenyue Hua
Jiang Guo
Mingwen Dong
He Zhu
Patrick Ng
Zhiguo Wang
KELM
81
20
0
31 Jan 2024
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning
Weiqi Wang
Tianqing Fang
Chunyang Li
Haochen Shi
Wenxuan Ding
...
Jiaxin Bai
Xin Liu
Cheng Jiayang
Chunkit Chan
Yangqiu Song
LRM
33
29
0
14 Jan 2024
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions
Yihan Chen
Benfeng Xu
Quan Wang
Yi Liu
Zhendong Mao
ALM
ELM
47
27
0
01 Jan 2024
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV
RALM
85
1,658
1
18 Dec 2023
The Falcon Series of Open Language Models
Ebtesam Almazrouei
Hamza Alobeidli
Abdulaziz Alshamsi
Alessandro Cappelli
Ruxandra-Aimée Cojocaru
...
Quentin Malartic
Daniele Mazzotta
Badreddine Noune
B. Pannier
Guilherme Penedo
AI4TS
ALM
124
420
0
28 Nov 2023
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
Zhaowei Wang
Haochen Shi
Weiqi Wang
Tianqing Fang
Hongming Zhang
Sehyun Choi
Xin Liu
Yangqiu Song
43
20
0
15 Nov 2023
Large Language Models are Temporal and Causal Reasoners for Video Question Answering
Dohwan Ko
Ji Soo Lee
Wooyoung Kang
Byungseok Roh
Hyunwoo J. Kim
LRM
55
35
0
24 Oct 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
60
77
0
13 Sep 2023
Benchmarking Large Language Models in Retrieval-Augmented Generation
Jiawei Chen
Hongyu Lin
Xianpei Han
Le Sun
3DV
RALM
27
282
0
04 Sep 2023
Reasoning in Large Language Models Through Symbolic Math Word Problems
Vedant Gaur
Nikunj Saunshi
ReLM
LRM
52
27
0
03 Aug 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
197
11,484
0
18 Jul 2023
Benchmarking Large Language Model Capabilities for Conditional Generation
Joshua Maynez
Priyanka Agrawal
Sebastian Gehrmann
ELM
LM&MA
58
31
0
29 Jun 2023
Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models
Qingyu Tan
Hwee Tou Ng
Lidong Bing
LRM
78
25
0
15 Jun 2023
The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code
Xiao Liu
Da Yin
Chen Zhang
Yansong Feng
Dongyan Zhao
ELM
ReLM
ReCod
LRM
59
20
0
30 May 2023
Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints
Jiaxin Bai
Xin Liu
Weiqi Wang
Chen Luo
Yangqiu Song
NAI
40
31
0
30 May 2023
Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios
Jiaxuan Li
Lang-Chi Yu
Allyson Ettinger
LRM
ELM
18
24
0
26 May 2023
CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering
Weiqi Wang
Tianqing Fang
Wenxuan Ding
Baixuan Xu
Xin Liu
Yangqiu Song
Antoine Bosselut
ReLM
LRM
38
41
0
24 May 2023
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Junyi Li
Xiaoxue Cheng
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
HILM
VLM
56
241
0
19 May 2023
Natural Language Decomposition and Interpretation of Complex Utterances
Harsh Jhamtani
Hao Fang
Patrick Xia
Eran Levy
Jacob Andreas
Benjamin Van Durme
ReLM
44
7
0
15 May 2023
Distilling Script Knowledge from Large Language Models for Constrained Language Planning
Siyu Yuan
Jiangjie Chen
Ziquan Fu
Xuyang Ge
Soham Shah
C. R. Jankowski
Yanghua Xiao
Deqing Yang
61
54
0
09 May 2023
CAT: A Contextualized Conceptualization and Instantiation Framework for Commonsense Reasoning
Weiqi Wang
Tianqing Fang
Baixuan Xu
Chun Yi Louis Bo
Yangqiu Song
Lei Chen
ReLM
LRM
42
37
0
08 May 2023
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements
Jiacheng Liu
Wenya Wang
Dianzhuo Wang
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
VLM
70
50
0
05 May 2023
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models
Ning Bian
Xianpei Han
Le Sun
Hongyu Lin
Yaojie Lu
Ben He
Shanshan Jiang
Bin Dong
KELM
ELM
AI4MH
LRM
41
78
0
29 Mar 2023
MathPrompter: Mathematical Reasoning using Large Language Models
Shima Imani
Liang Du
H. Shrivastava
KELM
ReLM
LRM
34
203
0
04 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
593
12,840
0
27 Feb 2023
Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Chengwei Qin
Aston Zhang
Zhuosheng Zhang
Jiaao Chen
Michihiro Yasunaga
Diyi Yang
LM&MA
AI4MH
LRM
ELM
117
689
0
08 Feb 2023
Large Language Models Can Be Easily Distracted by Irrelevant Context
Freda Shi
Xinyun Chen
Kanishka Misra
Nathan Scales
David Dohan
Ed H. Chi
Nathanael Scharli
Denny Zhou
ReLM
RALM
LRM
65
564
0
31 Jan 2023
Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning
Yunhu Ye
Binyuan Hui
Min Yang
Binhua Li
Fei Huang
Yongbin Li
LMTD
ReLM
LRM
59
147
0
31 Jan 2023
Reasoning with Language Model Prompting: A Survey
Shuofei Qiao
Yixin Ou
Ningyu Zhang
Xiang Chen
Yunzhi Yao
Shumin Deng
Chuanqi Tan
Fei Huang
Huajun Chen
ReLM
ELM
LRM
91
319
0
19 Dec 2022
Language Models as Agent Models
Jacob Andreas
LLMAG
44
135
0
03 Dec 2022
FolkScope: Intention Knowledge Graph Construction for E-commerce Commonsense Discovery
Changlong Yu
Weiqi Wang
Xin Liu
Jiaxin Bai
Yangqiu Song
Zheng Li
Yifan Gao
Tianyu Cao
Bing Yin
75
23
0
15 Nov 2022
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
Karthik Valmeekam
Matthew Marquez
Alberto Olmo
S. Sreedharan
Subbarao Kambhampati
ReLM
LRM
44
215
0
21 Jun 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
449
3,486
0
21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
562
9,009
0
28 Jan 2022
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Pengcheng He
Jianfeng Gao
Weizhu Chen
102
1,155
0
18 Nov 2021
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Peter West
Chandrasekhar Bhagavatula
Jack Hessel
Jena D. Hwang
Liwei Jiang
Ronan Le Bras
Ximing Lu
Sean Welleck
Yejin Choi
SyDa
82
331
0
14 Oct 2021
Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset
Tianqing Fang
Weiqi Wang
Sehyun Choi
Shibo Hao
Hongming Zhang
Yangqiu Song
Bin He
42
31
0
16 Sep 2021
DISCOS: Bridging the Gap between Discourse Knowledge and Commonsense Knowledge
Tianqing Fang
Hongming Zhang
Weiqi Wang
Yangqiu Song
Bin He
65
46
0
01 Jan 2021