ResearchTrend.AI
CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

14 January 2022
Alon Talmor
Ori Yoran
Ronan Le Bras
Chandrasekhar Bhagavatula
Yoav Goldberg
Yejin Choi
Jonathan Berant
    ELM

Papers citing "CommonsenseQA 2.0: Exposing the Limits of AI through Gamification"

50 / 105 papers shown
Causal Parrots: Large Language Models May Talk Causality But Are Not Causal
Matej Zečević
Moritz Willig
Devendra Singh Dhami
Kristian Kersting
LRM
30
102
0
24 Aug 2023
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Tim Hartill
N. Tan
Michael Witbrock
Patricia J. Riddle
ReLM
KELM
LRM
34
2
0
02 Aug 2023
CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models
Xingbo Wang
Renfei Huang
Zhihua Jin
Tianqing Fang
Huamin Qu
VLM
ReLM
LRM
40
1
0
23 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
46
99
0
20 Jul 2023
Toward Grounded Commonsense Reasoning
Minae Kwon
Hengyuan Hu
Vivek Myers
Siddharth Karamcheti
Anca Dragan
Dorsa Sadigh
LM&Ro
ReLM
LRM
42
9
0
14 Jun 2023
Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases
Yuval Reif
Roy Schwartz
28
7
0
30 May 2023
UFO: Unified Fact Obtaining for Commonsense Question Answering
Zhifeng Li
Yifan Fan
Bowei Zou
Yu Hong
HILM
LRM
32
1
0
25 May 2023
Getting MoRE out of Mixture of Language Model Reasoning Experts
Chenglei Si
Weijia Shi
Chen Zhao
Luke Zettlemoyer
Jordan L. Boyd-Graber
LRM
26
24
0
24 May 2023
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate
Boshi Wang
Xiang Yue
Huan Sun
ELM
LRM
46
60
0
22 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
41
6
0
21 May 2023
Evaluation of medium-large Language Models at zero-shot closed book generative question answering
René Peinl
Johannes Wirth
ELM
26
7
0
19 May 2023
KEPR: Knowledge Enhancement and Plausibility Ranking for Generative Commonsense Question Answering
Zhifeng Li
Bowei Zou
Yifan Fan
Yu Hong
24
3
0
15 May 2023
Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales
Brihi Joshi
Ziyi Liu
Sahana Ramnath
Aaron Chan
Zhewei Tong
Shaoliang Nie
Qifan Wang
Yejin Choi
Xiang Ren
HAI
LRM
34
29
0
11 May 2023
Decker: Double Check with Heterogeneous Knowledge for Commonsense Fact Verification
Anni Zou
Zhuosheng Zhang
Hai Zhao
HILM
37
6
0
10 May 2023
CAT: A Contextualized Conceptualization and Instantiation Framework for Commonsense Reasoning
Weiqi Wang
Tianqing Fang
Baixuan Xu
Chun Yi Louis Bo
Yangqiu Song
Lei Chen
ReLM
LRM
25
34
0
08 May 2023
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements
Jiacheng Liu
Wenya Wang
Dianzhuo Wang
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
VLM
43
48
0
05 May 2023
Chain-of-Skills: A Configurable Model for Open-domain Question Answering
Kaixin Ma
Hao Cheng
Yu Zhang
Xiaodong Liu
Eric Nyberg
Jianfeng Gao
LRM
28
16
0
04 May 2023
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs
Jinyang Li
Binyuan Hui
Ge Qu
Jiaxi Yang
Binhua Li
...
Guoliang Li
Kevin C. C. Chang
Fei Huang
Reynold Cheng
Yongbin Li
LMTD
59
363
0
04 May 2023
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text
Yunxin Li
Baotian Hu
Yuxin Ding
Lin Ma
Hao Fei
25
5
0
03 May 2023
LINGO : Visually Debiasing Natural Language Instructions to Support Task Diversity
Anjana Arunkumar
Shubham Sharma
Rakhi Agrawal
Sriramakrishnan Chandrasekaran
Chris Bryan
34
0
0
12 Apr 2023
Natural Language Reasoning, A Survey
Fei Yu
Hongbo Zhang
Prayag Tiwari
Benyou Wang
ReLM
LRM
49
53
0
26 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
121
66
0
13 Mar 2023
Commonsense Reasoning for Conversational AI: A Survey of the State of the Art
Christopher Richardson
Larry Heck
LRM
30
8
0
15 Feb 2023
Benchmarks for Automated Commonsense Reasoning: A Survey
E. Davis
ELM
LRM
24
58
0
09 Feb 2023
Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow
Anjana Arunkumar
Swaroop Mishra
Bhavdeep Singh Sachdeva
Chitta Baral
Chris Bryan
30
0
0
09 Feb 2023
LaMPP: Language Models as Probabilistic Priors for Perception and Action
Belinda Z. Li
William Chen
Pratyusha Sharma
Jacob Andreas
24
15
0
03 Feb 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
99
35
0
01 Jan 2023
GPT-Neo for commonsense reasoning -- a theoretical and practical lens
Rohan Kashyap
Vivek Kashyap
Narendra C.P
ReLM
ELM
LRM
38
7
0
28 Nov 2022
A Simple, Yet Effective Approach to Finding Biases in Code Generation
Spyridon Mouselinos
Mateusz Malinowski
Henryk Michalewski
18
7
0
31 Oct 2022
Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering
Deepanway Ghosal
Navonil Majumder
Rada Mihalcea
Soujanya Poria
58
10
0
29 Oct 2022
Retrieval Augmentation for Commonsense Reasoning: A Unified Approach
W. Yu
Chenguang Zhu
Zhihan Zhang
Shuohang Wang
Zhuosheng Zhang
Yuwei Fang
Meng Jiang
LRM
ReLM
15
19
0
23 Oct 2022
Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge
Kaixin Ma
Hao Cheng
Xiaodong Liu
Eric Nyberg
Jianfeng Gao
LRM
144
15
0
22 Oct 2022
Task Compass: Scaling Multi-task Pre-training with Task Prefix
Zhuosheng Zhang
Shuohang Wang
Yichong Xu
Yuwei Fang
W. Yu
Yang Liu
Han Zhao
Chenguang Zhu
Michael Zeng
SSL
LRM
25
16
0
12 Oct 2022
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM
LRM
118
79
0
11 Oct 2022
Unpacking Large Language Models with Conceptual Consistency
Pritish Sahu
Michael Cogswell
Yunye Gong
Ajay Divakaran
LRM
87
16
0
29 Sep 2022
Elaboration-Generating Commonsense Question Answering at Scale
Wenya Wang
Vivek Srikumar
Hannaneh Hajishirzi
Noah A. Smith
ELM
LRM
37
15
0
02 Sep 2022
On Reality and the Limits of Language Data: Aligning LLMs with Human Norms
Nigel Collier
Fangyu Liu
Ehsan Shareghi
16
3
0
25 Aug 2022
Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning
Letian Peng
Z. Li
Hai Zhao
ReLM
LRM
18
1
0
23 Aug 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
Yonatan Bitton
Nitzan Bitton-Guetta
Ron Yosef
Yuval Elovici
Joey Tianyi Zhou
Gabriel Stanovsky
Roy Schwartz
25
19
0
25 Jul 2022
Hidden Schema Networks
Ramses J. Sanchez
L. Conrads
Pascal Welke
K. Cvejoski
C. Ojeda
NAI
MILM
19
3
0
08 Jul 2022
DFM: Dialogue Foundation Model for Universal Large-Scale Dialogue-Oriented Task Learning
Zhi Chen
Jijia Bao
Lu Chen
Yuncong Liu
Da Ma
...
Xinhsuai Dong
Fujiang Ge
Qingliang Miao
Jian-Guang Lou
Kai Yu
ALM
AI4CE
48
3
0
25 May 2022
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
Jaehun Jung
Lianhui Qin
Sean Welleck
Faeze Brahman
Chandra Bhagavatula
Ronan Le Bras
Yejin Choi
ReLM
LRM
229
190
0
24 May 2022
UL2: Unifying Language Learning Paradigms
Yi Tay
Mostafa Dehghani
Vinh Q. Tran
Xavier Garcia
Jason W. Wei
...
Tal Schuster
H. Zheng
Denny Zhou
N. Houlsby
Donald Metzler
AI4CE
59
297
0
10 May 2022
Great Truths are Always Simple: A Rather Simple Knowledge Encoder for Enhancing the Commonsense Reasoning Capacity of Pre-Trained Models
Jinhao Jiang
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
24
15
0
04 May 2022
Inferring Implicit Relations in Complex Questions with Language Models
Uri Katz
Mor Geva
Jonathan Berant
ReLM
LRM
24
11
0
28 Apr 2022
Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning
Swarnadeep Saha
Prateek Yadav
Joey Tianyi Zhou
11
9
0
11 Apr 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
398
8,559
0
28 Jan 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
79
212
0
16 Jan 2022
CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models
Jorg Frohberg
Frank Binder
SLR
6
27
0
22 Dec 2021
Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention
Yichong Xu
Chenguang Zhu
Shuohang Wang
Siqi Sun
Hao Cheng
Xiaodong Liu
Jianfeng Gao
Pengcheng He
Michael Zeng
Xuedong Huang
LRM
254
55
0
06 Dec 2021