Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.00293
Cited By
Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension
2 February 2020
Max Bartolo
A. Roberts
Johannes Welbl
Sebastian Riedel
Pontus Stenetorp
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension"
50 / 51 papers shown
Title
Green AI: Exploring Carbon Footprints, Mitigation Strategies, and Trade Offs in Large Language Model Training
Vivian Liu
Yiqiao Yin
40
11
0
01 Apr 2024
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
Jessica Quaye
Alicia Parrish
Oana Inel
Charvi Rastogi
Hannah Rose Kirk
...
Nathan Clement
Rafael Mosquera
Juan Ciro
Vijay Janapa Reddi
Lora Aroyo
45
7
0
14 Feb 2024
Desiderata for the Context Use of Question Answering Systems
Sagi Shaier
Lawrence E Hunter
K. Wense
28
4
0
31 Jan 2024
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation
Yoo Yeon Sung
Ishani Mondal
Jordan L. Boyd-Graber
30
0
0
20 Jan 2024
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Ruida Wang
Wangchunshu Zhou
Mrinmaya Sachan
27
32
0
20 Oct 2023
Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning
Lucas Weber
Elia Bruni
Dieuwke Hupkes
32
25
0
20 Oct 2023
No Offense Taken: Eliciting Offensiveness from Language Models
Anugya Srivastava
Rahul Ahuja
Rohith Mukku
16
3
0
02 Oct 2023
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Tim Hartill
N. Tan
Michael Witbrock
Patricia J. Riddle
ReLM
KELM
LRM
34
2
0
02 Aug 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
44
75
0
07 Jun 2023
Entailment as Robust Self-Learner
Jiaxin Ge
Hongyin Luo
Yoon Kim
James R. Glass
42
3
0
26 May 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Yangqiu Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Jindong Wang
Jennifer Foster
Yue Zhang
OOD
37
2
0
23 May 2023
On the Limitations of Simulating Active Learning
Katerina Margatina
Nikolaos Aletras
31
11
0
21 May 2023
Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?
Neeraj Varshney
Mihir Parmar
Nisarg Patel
Divij Handa
Sayantan Sarkar
Man Luo
Chitta Baral
LRM
34
4
0
20 May 2023
A Matter of Annotation: An Empirical Study on In Situ and Self-Recall Activity Annotations from Wearable Sensors
Alexander Hoelzemann
Kristof Van Laerhoven
19
6
0
15 May 2023
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models
Lukávs Mikula
Michal vStefánik
Marek Petrovivc
Petr Sojka
41
3
0
11 May 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
30
42
0
31 Mar 2023
Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension
Son Quoc Tran
Phong Nguyen-Thuan Do
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
39
0
0
16 Mar 2023
Learning to Initialize: Can Meta Learning Improve Cross-task Generalization in Prompt Tuning?
Chengwei Qin
Q. Li
Ruochen Zhao
Chenyu You
VLM
LRM
23
15
0
16 Feb 2023
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
58
100
0
19 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
22
367
0
19 Dec 2022
RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question
Alireza Mohammadshahi
Thomas Scialom
Majid Yazdani
Pouya Yanki
Angela Fan
James Henderson
Marzieh Saeidi
31
20
0
02 Nov 2022
IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension
Rifki Afina Putri
Alice Oh
33
9
0
25 Oct 2022
Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence
Hung-Ting Chen
Michael J.Q. Zhang
Eunsol Choi
RALM
HILM
47
92
0
25 Oct 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
146
24
0
10 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
129
95
0
06 Oct 2022
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Seonghyeon Ye
Joel Jang
Doyoung Kim
Yongrae Jo
Minjoon Seo
VLM
39
2
0
06 Oct 2022
Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios
Mana Ashida
Saku Sugawara
70
6
0
16 Sep 2022
A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension
Xanh Ho
Johannes Mario Meissner
Saku Sugawara
Akiko Aizawa
OffRL
35
4
0
05 Sep 2022
Collecting high-quality adversarial data for machine reading comprehension tasks with humans and models in the loop
Damian Y. Romero Diaz
M. Aniol
John M. Culnan
32
0
0
28 Jun 2022
Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts
Qinyuan Ye
Juan Zha
Xiang Ren
MoE
18
12
0
25 May 2022
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Yizhong Wang
Swaroop Mishra
Pegah Alipoormolabashi
Yeganeh Kordi
Amirreza Mirzaei
...
Chitta Baral
Yejin Choi
Noah A. Smith
Hannaneh Hajishirzi
Daniel Khashabi
ELM
64
790
0
16 Apr 2022
Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting
Gabriel Orlanski
LRM
27
2
0
29 Mar 2022
What Makes Reading Comprehension Questions Difficult?
Saku Sugawara
Nikita Nangia
Alex Warstadt
Sam Bowman
ELM
RALM
20
13
0
12 Mar 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
47
212
0
16 Feb 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
82
212
0
16 Jan 2022
CommonsenseQA 2.0: Exposing the Limits of AI through Gamification
Alon Talmor
Ori Yoran
Ronan Le Bras
Chandrasekhar Bhagavatula
Yoav Goldberg
Yejin Choi
Jonathan Berant
ELM
33
141
0
14 Jan 2022
Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants
Max Bartolo
Tristan Thrush
Sebastian Riedel
Pontus Stenetorp
Robin Jia
Douwe Kiela
24
33
0
16 Dec 2021
QuALITY: Question Answering with Long Input Texts, Yes!
Richard Yuanzhe Pang
Alicia Parrish
Nitish Joshi
Nikita Nangia
Jason Phang
...
Vishakh Padmakumar
Johnny Ma
Jana Thompson
He He
Sam Bowman
RALM
30
141
0
16 Dec 2021
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang
Haohan Wang
Diyi Yang
139
130
0
15 Dec 2021
Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair
Jason Phang
Angelica Chen
William Huang
Samuel R. Bowman
AAML
28
13
0
16 Nov 2021
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
Wei Ping
Chejian Xu
Shuohang Wang
Zhe Gan
Yu Cheng
Jianfeng Gao
Ahmed Hassan Awadallah
Yangqiu Song
VLM
ELM
AAML
33
216
0
04 Nov 2021
Retrieval-guided Counterfactual Generation for QA
Bhargavi Paranjape
Matthew Lamm
Ian Tenney
33
31
0
14 Oct 2021
Distantly-Supervised Evidence Retrieval Enables Question Answering without Evidence Annotation
Chen Zhao
Chenyan Xiong
Jordan L. Boyd-Graber
Hal Daumé
RALM
29
9
0
10 Oct 2021
On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study
Divyansh Kaushik
Douwe Kiela
Zachary Chase Lipton
Wen-tau Yih
AAML
11
36
0
02 Jun 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye
Bill Yuchen Lin
Xiang Ren
223
180
0
18 Apr 2021
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation
Max Bartolo
Tristan Thrush
Robin Jia
Sebastian Riedel
Pontus Stenetorp
Douwe Kiela
AAML
28
103
0
18 Apr 2021
Cooperative Self-training of Machine Reading Comprehension
Hongyin Luo
Shang-Wen Li
Ming Gao
Seunghak Yu
James R. Glass
SyDa
RALM
20
11
0
12 Mar 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
259
678
0
06 Jan 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis
Christopher Potts
Zhengxuan Wu
Atticus Geiger
Douwe Kiela
230
77
0
30 Dec 2020
Elastic weight consolidation for better bias inoculation
James Thorne
Andreas Vlachos
22
11
0
29 Apr 2020
1
2
Next