BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

24 May 2019
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova

Papers citing "BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions"

50 / 1,041 papers shown
Title
PROST: Physical Reasoning of Objects through Space and Time
Stéphane Aroca-Ouellette
Cory Paik
A. Roncone
Katharina Kann
LRM
19
47
0
07 Jun 2021
A Cluster-based Approach for Improving Isotropy in Contextual Embedding Space
S. Rajaee
Mohammad Taher Pilehvar
18
41
0
02 Jun 2021
COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences
Shikhar Singh
Nuan Wen
Yu Hou
Pegah Alipoormolabashi
Te-Lin Wu
Xuezhe Ma
Nanyun Peng
LRM
23
57
0
02 Jun 2021
Comparing Test Sets with Item Response Theory
Clara Vania
Phu Mon Htut
William Huang
Dhara Mungra
Richard Yuanzhe Pang
Jason Phang
Haokun Liu
Kyunghyun Cho
Sam Bowman
27
39
0
01 Jun 2021
The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
Peter Hase
Harry Xie
Mohit Bansal
OODD
LRM
FAtt
23
91
0
01 Jun 2021
Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence
Andrew Halterman
Katherine A. Keith
Sheikh Muhammad Sarwar
Brendan O'Connor
27
27
0
27 May 2021
True Few-Shot Learning with Language Models
Ethan Perez
Douwe Kiela
Kyunghyun Cho
21
427
0
24 May 2021
KLUE: Korean Language Understanding Evaluation
Sungjoon Park
Jihyung Moon
Sungdong Kim
Won Ik Cho
Jiyoon Han
...
Seonghyun Kim
Lucy Park
Alice Oh
Jung-Woo Ha
Kyunghyun Cho
ELM
VLM
29
191
0
20 May 2021
Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning
Benjamin Minixhofer
Milan Gritta
Ignacio Iacobacci
AI4CE
11
5
0
08 May 2021
Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality
Adithya V Ganesan
Matthew Matero
Aravind Reddy Ravula
Huy-Hien Vu
H. Andrew Schwartz
30
35
0
07 May 2021
Entailment as Few-Shot Learner
Sinong Wang
Han Fang
Madian Khabsa
Hanzi Mao
Hao Ma
35
183
0
29 Apr 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye
Bill Yuchen Lin
Xiang Ren
223
180
0
18 Apr 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
43
425
0
18 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,858
0
18 Apr 2021
Competency Problems: On Finding and Removing Artifacts in Language Data
Matt Gardner
William Merrill
Jesse Dodge
Matthew E. Peters
Alexis Ross
Sameer Singh
Noah A. Smith
171
107
0
17 Apr 2021
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
Ari Holtzman
Peter West
Vered Shwartz
Yejin Choi
Luke Zettlemoyer
LRM
24
230
0
16 Apr 2021
What to Pre-Train on? Efficient Intermediate Task Selection
Clifton A. Poth
Jonas Pfeiffer
Andreas Rücklé
Iryna Gurevych
19
94
0
16 Apr 2021
Multivalent Entailment Graphs for Question Answering
Nick McKenna
Liane Guillou
Mohammad Javad Hosseini
Sander Bijl de Vroe
Mark Johnson
Mark Steedman
NAI
35
14
0
16 Apr 2021
Sequence tagging for biomedical extractive question answering
Wonjin Yoon
Richard Jackson
Aron Lagerberg
Jaewoo Kang
MedIm
12
26
0
15 Apr 2021
Does Putting a Linguist in the Loop Improve NLU Data Collection?
Alicia Parrish
William Huang
Omar Agha
Soo-hwan Lee
Nikita Nangia
Alex Warstadt
Karmanya Aggarwal
Emily Allaway
Tal Linzen
Samuel R. Bowman
30
40
0
15 Apr 2021
TWEAC: Transformer with Extendable QA Agent Classifiers
Gregor Geigle
Nils Reimers
Andreas Rücklé
Iryna Gurevych
ViT
21
22
0
14 Apr 2021
Structural analysis of an all-purpose question answering model
Vincent Micheli
Quentin Heinrich
François Fleuret
Wacim Belblidia
18
3
0
13 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
32
150
0
13 Apr 2021
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning
Roshanak Mirzaee
Hossein Rajaby Faghihi
Qiang Ning
Parisa Kordjamshidi
20
77
0
12 Apr 2021
Achieving Model Robustness through Discrete Adversarial Training
Maor Ivgi
Jonathan Berant
AAML
19
27
0
11 Apr 2021
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
Ruiqi Zhong
Kristy Lee
Zheng-Wei Zhang
Dan Klein
39
166
0
10 Apr 2021
Connecting Attributions and QA Model Behavior on Realistic Counterfactuals
Xi Ye
Rohan Nair
Greg Durrett
21
24
0
09 Apr 2021
AmbiFC: Fact-Checking Ambiguous Claims with Evidence
Max Glockner
Ieva Staliunaite
James Thorne
Gisela Vallejo
Andreas Vlachos
Iryna Gurevych
32
22
0
01 Apr 2021
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark
Nicholas Lourie
Ronan Le Bras
Chandra Bhagavatula
Yejin Choi
LRM
30
137
0
24 Mar 2021
Improving and Simplifying Pattern Exploiting Training
Derek Tam
Rakesh R Menon
Mohit Bansal
Shashank Srivastava
Colin Raffel
18
149
0
22 Mar 2021
GPT Understands, Too
Xiao Liu
Yanan Zheng
Zhengxiao Du
Ming Ding
Yujie Qian
Zhilin Yang
Jie Tang
VLM
84
1,146
0
18 Mar 2021
How Many Data Points is a Prompt Worth?
Teven Le Scao
Alexander M. Rush
VLM
66
296
0
15 Mar 2021
DOCENT: Learning Self-Supervised Entity Representations from Large Document Collections
Yury Zemlyanskiy
Sudeep Gandhe
Ruining He
Bhargav Kanagal
Anirudh Ravula
Juraj Gottweis
Fei Sha
Ilya Eckstein
SSL
31
11
0
26 Feb 2021
Muppet: Massive Multi-task Representations with Pre-Finetuning
Armen Aghajanyan
Anchit Gupta
Akshat Shrivastava
Xilun Chen
Luke Zettlemoyer
Sonal Gupta
33
266
0
26 Jan 2021
English Machine Reading Comprehension Datasets: A Survey
Daria Dzendzik
Carl Vogel
Jennifer Foster
RALM
AIMat
27
49
0
25 Jan 2021
Unanswerable Questions about Images and Texts
E. Davis
45
12
0
25 Jan 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
259
678
0
06 Jan 2021
Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering
Fengbin Zhu
Wenqiang Lei
Chao Wang
Jianming Zheng
Soujanya Poria
Tat-Seng Chua
RALM
213
252
0
04 Jan 2021
FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation
Kushal Lakhotia
Bhargavi Paranjape
Asish Ghoshal
Wen-tau Yih
Yashar Mehdad
Srini Iyer
25
27
0
31 Dec 2020
Learning from Mistakes: Using Mis-predictions as Harm Alerts in Language Pre-Training
Chen Xing
Wenhao Liu
Caiming Xiong
29
0
0
16 Dec 2020
Reference Knowledgeable Network for Machine Reading Comprehension
Yilin Zhao
Zhuosheng Zhang
Hai Zhao
18
5
0
07 Dec 2020
Learning from Task Descriptions
Orion Weller
Nicholas Lourie
Matt Gardner
Matthew E. Peters
47
89
0
16 Nov 2020
When Do You Need Billions of Words of Pretraining Data?
Yian Zhang
Alex Warstadt
Haau-Sing Li
Samuel R. Bowman
29
136
0
10 Nov 2020
RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark
Tatiana Shavrina
Alena Fenogenova
Anton A. Emelyanov
Denis Shevelev
Ekaterina Artemova
Valentin Malykh
Vladislav Mikhailov
Maria Tikhonova
Andrey Chertok
Andrey Evlampiev
VLM
ELM
27
81
0
29 Oct 2020
Measuring Association Between Labels and Free-Text Rationales
Sarah Wiegreffe
Ana Marasović
Noah A. Smith
282
170
0
24 Oct 2020
Optimal Subarchitecture Extraction For BERT
Adrian de Wynter
Daniel J. Perry
MQ
51
18
0
20 Oct 2020
Evaluating and Characterizing Human Rationales
Samuel Carton
Anirudh Rathore
Chenhao Tan
22
48
0
09 Oct 2020
MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics
Anthony Chen
Gabriel Stanovsky
Sameer Singh
Matt Gardner
19
50
0
07 Oct 2020
"I'd rather just go to bed": Understanding Indirect Answers
Annie Louis
Dan Roth
Filip Radlinski
11
43
0
07 Oct 2020
A Review on Fact Extraction and Verification
Giannis Bekoulis
Christina Papagiannopoulou
Nikos Deligiannis
33
42
0
06 Oct 2020