1909.00277
Cited By
v1
v2 (latest)
Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning
31 August 2019
Lifu Huang
Ronan Le Bras
Chandra Bhagavatula
Yejin Choi
AIMat
RALM
LRM
Papers citing
"Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning"
50 / 296 papers shown
Title
Localizing Task Information for Improved Model Merging and Compression
Ke Wang
Nikolaos Dimitriadis
Guillermo Ortiz-Jimenez
François Fleuret
Pascal Frossard
MoMe
94
60
0
13 May 2024
AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts
Zefang Liu
Jiahua Luo
MoE
KELM
101
13
0
01 May 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLL
KELM
LRM
181
88
0
25 Apr 2024
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Shibo Hao
Yi Gu
Haotian Luo
Tianyang Liu
Xiyan Shao
...
Haodi Ma
Adithya Samavedhi
Qiyue Gao
Zhen Wang
Zhiting Hu
LRM
ELM
158
29
0
08 Apr 2024
MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models
Peng Ding
Jiading Fang
Peng Li
Kangrui Wang
Xiaochen Zhou
Mo Yu
Jing Li
Matthew R. Walter
Hongyuan Mei
RALM
ELM
99
6
0
29 Mar 2024
Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models
Adian Liusie
Yassir Fathullah
Mark Gales
65
5
0
20 Mar 2024
GraphERE: Jointly Multiple Event-Event Relation Extraction via Graph-Enhanced Event Embeddings
Haochen Li
Di Geng
36
0
0
19 Mar 2024
BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English
H. M. Q. H. Sheikh Shafayat
Rishav Hada
Isaac Cowhey
Rifki Afina
Jerry Tworek
Lorie De Leon
69
3
0
16 Mar 2024
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean
Eunsu Kim
Juyoung Suk
Philhoon Oh
Haneul Yoo
James Thorne
Alice Oh
ELM
151
23
0
11 Mar 2024
PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset
Arda Uzunoğlu
Abdalfatah Rashid Safa
Gözde Gül Şahin
LRM
88
2
0
05 Mar 2024
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
Jiyoung Lee
Minwoo Kim
Seungho Kim
Junghwan Kim
Seunghyun Won
Hwaran Lee
Edward Choi
ALM
149
17
0
21 Feb 2024
GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models
Sayantan Adak
Daivik Agrawal
Animesh Mukherjee
Somak Aditya
86
3
0
20 Feb 2024
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic
Fajri Koto
Haonan Li
Sara Shatnawi
Jad Doughman
Abdelrahman Boda Sadallah
...
Neha Sengupta
Shady Shehata
Nizar Habash
Preslav Nakov
Timothy Baldwin
ELM
LRM
163
44
0
20 Feb 2024
FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
Junru Lu
Siyu An
Min Zhang
Yulan He
Di Yin
Xing Sun
134
2
0
19 Feb 2024
Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Seunghyeok Hong
Sangwon Baek
Sangdae Nam
Guijin Son
Seungone Kim
ELM
LRM
125
17
0
18 Feb 2024
Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification
Soumya Sanyal
Tianyi Xiao
Jiacheng Liu
Wenya Wang
Xiang Ren
LRM
ReLM
131
12
0
06 Feb 2024
Distractor Generation for Multiple-Choice Questions: A Survey of Methods, Datasets, and Evaluation
Elaf Alhazmi
Quan Z. Sheng
W. Zhang
Munazza Zaib
A. Alhazmi
AI4Ed
94
1
0
02 Feb 2024
Desiderata for the Context Use of Question Answering Systems
Sagi Shaier
Lawrence E Hunter
Katharina von der Wense
131
5
0
31 Jan 2024
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
N. Corrêa
Sophia Falk
Shiza Fatimah
Aniket Sen
N. D. Oliveira
101
9
0
30 Jan 2024
Towards Optimizing the Costs of LLM Usage
Shivanshu Shekhar
Tanishq Dubey
Koyel Mukherjee
Apoorv Saxena
Atharv Tyagi
Nishanth Kotla
83
22
0
29 Jan 2024
Benchmarking LLMs via Uncertainty Quantification
Fanghua Ye
Mingming Yang
Jianhui Pang
Longyue Wang
Derek F. Wong
Emine Yilmaz
Shuming Shi
Zhaopeng Tu
ELM
271
59
0
23 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
120
88
0
10 Jan 2024
The Critique of Critique
Shichao Sun
Junlong Li
Weizhe Yuan
Ruifeng Yuan
Wenjie Li
Pengfei Liu
ELM
96
0
0
09 Jan 2024
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Yuqing Wang
Yun Zhao
VLM
ReLM
LRM
146
24
0
29 Dec 2023
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Collin Burns
Pavel Izmailov
Jan Hendrik Kirchner
Bowen Baker
Leo Gao
...
Adrien Ecoffet
Manas Joglekar
Jan Leike
Ilya Sutskever
Jeff Wu
ELM
168
299
0
14 Dec 2023
Neural Reasoning About Agents' Goals, Preferences, and Actions
Matteo Bortoletto
Lei Shi
Andreas Bulling
81
5
0
12 Dec 2023
Merging by Matching Models in Task Parameter Subspaces
Derek Tam
Mohit Bansal
Colin Raffel
MoMe
119
12
0
07 Dec 2023
Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension
Akira Kawabata
Saku Sugawara
ELM
70
7
0
30 Nov 2023
CLOMO: Counterfactual Logical Modification with Large Language Models
Yinya Huang
Ruixin Hong
Hongming Zhang
Wei Shao
Zhicheng Yang
Dong Yu
Changshui Zhang
Xiaodan Liang
Linqi Song
LRM
74
9
0
29 Nov 2023
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs
Andries P. Smit
Paul Duckworth
Nathan Grinsztajn
Thomas D. Barrett
Arnu Pretorius
105
27
0
29 Nov 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Shafiq Joty
ELM
CLL
AI4MH
LRM
ALM
156
28
0
28 Nov 2023
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Clifton A. Poth
Hannah Sterz
Indraneil Paul
Sukannya Purkayastha
Leon Arne Engländer
Timo Imhof
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
Jonas Pfeiffer
101
53
0
18 Nov 2023
Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities
Alex Wilf
Sihyun Shawn Lee
Paul Pu Liang
Louis-Philippe Morency
LRM
105
47
0
16 Nov 2023
Digital Socrates: Evaluating LLMs through Explanation Critiques
Yuling Gu
Oyvind Tafjord
Peter Clark
ELM
LRM
96
2
0
16 Nov 2023
MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation
Xiaozhi Wang
Hao Peng
Yong Guan
Kaisheng Zeng
Jianhui Chen
...
Yankai Lin
Zhiyuan Liu
Ruobing Xie
Jie Zhou
Juanzi Li
106
5
0
15 Nov 2023
Mirror: A Universal Framework for Various Information Extraction Tasks
Tong Zhu
Junfei Ren
Zijian Yu
Mengsong Wu
Guoliang Zhang
Xiaoye Qu
Wenliang Chen
Zhefeng Wang
Baoxing Huai
Min Zhang
89
14
0
09 Nov 2023
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Lixing Zhu
Runcong Zhao
Lin Gui
Yulan He
88
5
0
28 Oct 2023
Multi-grained Evidence Inference for Multi-choice Reading Comprehension
Yilin Zhao
Hai Zhao
Sufeng Duan
64
2
0
27 Oct 2023
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks
Mete Ismayilzada
Debjit Paul
Syrielle Montariol
Mor Geva
Antoine Bosselut
LRM
96
5
0
23 Oct 2023
QADYNAMICS: Training Dynamics-Driven Synthetic QA Diagnostic for Zero-Shot Commonsense Question Answering
Haochen Shi
Weiqi Wang
Tianqing Fang
Baixuan Xu
Wenxuan Ding
Xin Liu
Yangqiu Song
116
7
0
17 Oct 2023
Instructive Dialogue Summarization with Query Aggregations
Bin Wang
Zhengyuan Liu
Nancy F. Chen
105
3
0
17 Oct 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Fajri Koto
Nurul Aisyah
Haonan Li
Timothy Baldwin
AI4Ed
LRM
ELM
114
46
0
07 Oct 2023
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models
Man Luo
Shrinidhi Kumbhar
Ming Shen
Mihir Parmar
Neeraj Varshney
Pratyay Banerjee
Somak Aditya
Chitta Baral
ReLM
ELM
LRM
137
31
0
02 Oct 2023
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Dongfu Jiang
Yishan Li
Ge Zhang
Wenhao Huang
Bill Yuchen Lin
Wenhu Chen
ALM
132
69
0
01 Oct 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
126
15
0
29 Sep 2023
NLPBench: Evaluating Large Language Models on Solving NLP Problems
Linxin Song
Jieyu Zhang
Lechao Cheng
Pengyuan Zhou
Dinesh Manocha
Irene Li
ELM
LM&MA
LRM
111
12
0
27 Sep 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yuan Liu
Bing Qin
Ting Liu
LRM
AI4CE
137
175
0
27 Sep 2023
Are Large Language Models Really Robust to Word-Level Perturbations?
Haoyu Wang
Guozheng Ma
Cong Yu
Ning Gui
Linrui Zhang
...
Sen Zhang
Li Shen
Xueqian Wang
Peilin Zhao
Dacheng Tao
KELM
111
24
0
20 Sep 2023
UniPCM: Universal Pre-trained Conversation Model with Task-aware Automatic Prompt
Yucheng Cai
Wentao Ma
Yuchuan Wu
Shuzheng Si
Yuan Shao
Zhijian Ou
Yongbin Li
120
3
0
20 Sep 2023
SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning
Bin Wang
Zhengyuan Liu
Xin Huang
Fangkai Jiao
Yang Ding
Ai Ti Aw
Nancy F. Chen
LRM
112
75
0
09 Sep 2023