Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets

21 August 2019
Mor Geva
Yoav Goldberg
Jonathan Berant

Papers citing "Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets"

50 / 63 papers shown
LLM-Human Pipeline for Cultural Context Grounding of Conversations
Rajkumar Pujari
Dan Goldwasser
28
1
0
17 Oct 2024
Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets
Tommaso Giorgi
Lorenzo Cima
T. Fagni
M. Avvenuti
S. Cresci
40
9
0
10 Oct 2024
Are We Done with MMLU?
Aryo Pradipta Gema
Joshua Ong Jun Leang
Giwon Hong
Alessio Devoto
Alberto Carlo Maria Mancino
...
R. McHardy
Joshua Harris
Jean Kaddour
Emile van Krieken
Pasquale Minervini
ELM
52
30
0
06 Jun 2024
Annotation Sensitivity: Training Data Collection Methods Affect Model Performance
Christoph Kern
Stephanie Eckman
Jacob Beck
Rob Chew
Bolei Ma
Frauke Kreuter
24
9
0
23 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
34
9
0
09 Nov 2023
Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning
Lucas Weber
Elia Bruni
Dieuwke Hupkes
30
24
0
20 Oct 2023
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Tim Hartill
N. Tan
Michael Witbrock
Patricia J. Riddle
ReLM
KELM
LRM
27
2
0
02 Aug 2023
Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues?
Bo-Ru Lu
Nikita Haduong
Chia-Hsuan Lee
Zeqiu Wu
Hao Cheng
Paul Koester
J. Utke
Tao Yu
Noah A. Smith
Mari Ostendorf
SyDa
47
2
0
13 Jul 2023
TalkUp: Paving the Way for Understanding Empowering Language
Lucille Njoo
Chan Young Park
Octavia Stappart
Marvin Thielk
Yi Chu
Yulia Tsvetkov
16
3
0
23 May 2023
EASE: An Easily-Customized Annotation System Powered by Efficiency Enhancement Mechanisms
Naihao Deng
Yikai Liu
Mingye Chen
Winston Wu
Siyang Liu
Yulong Chen
Yue Zhang
Rada Mihalcea
26
0
0
23 May 2023
Beyond Labels: Empowering Human Annotators with Natural Language Explanations through a Novel Active-Learning Architecture
Bingsheng Yao
Ishan Jindal
Lucian Popa
Yannis Katsis
Sayan Ghosh
...
Yuxuan Lu
Shashank Srivastava
Yunyao Li
James A. Hendler
Dakuo Wang
32
10
0
22 May 2023
What's the Meaning of Superhuman Performance in Today's NLU?
Simone Tedeschi
Johan Bos
T. Declerck
Jan Hajic
Daniel Hershcovich
...
Simon Krek
Steven Schockaert
Rico Sennrich
Ekaterina Shutova
Roberto Navigli
ELM
LM&MA
VLM
ReLM
LRM
34
26
0
15 May 2023
MisRoBÆRTa: Transformers versus Misinformation
Ciprian-Octavian Truică
Elena Simona Apostol
19
37
0
16 Apr 2023
LINGO : Visually Debiasing Natural Language Instructions to Support Task Diversity
Anjana Arunkumar
Shubham Sharma
Rakhi Agrawal
Sriramakrishnan Chandrasekaran
Chris Bryan
26
0
0
12 Apr 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
Samantha Robertson
Zijie J. Wang
Dominik Moritz
Mary Beth Kery
Fred Hohman
30
15
0
12 Apr 2023
Investigating Multi-source Active Learning for Natural Language Inference
Ard Snijders
Douwe Kiela
Katerina Margatina
24
7
0
14 Feb 2023
DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift
Defu Cao
Yousef El-Laham
Loc Trinh
Svitlana Vyetrenko
Y. Liu
26
11
0
17 Nov 2022
TestAug: A Framework for Augmenting Capability-based NLP Tests
Guanqun Yang
Mirazul Haque
Qiaochu Song
Wei Yang
Xueqing Liu
ELM
28
0
0
14 Oct 2022
HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response
Selim Fekih
Nicolò Tamagnone
Benjamin Minixhofer
R. Shrestha
Ximena Contla
Ewan Oglethorpe
Navid Rekabsaz
11
6
0
10 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
114
93
0
06 Oct 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger
Haitham Seelawi
Debora Nozza
Zeerak Talat
Bertie Vidgen
28
65
0
20 Jun 2022
Resolving the Human Subjects Status of Machine Learning's Crowdworkers
Divyansh Kaushik
Zachary Chase Lipton
A. London
25
2
0
08 Jun 2022
Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning
Oscar Sainz
Itziar Gonzalez-Dios
Oier López de Lacalle
Bonan Min
Eneko Agirre
21
49
0
03 May 2022
Image Retrieval from Contextual Descriptions
Benno Krojer
Vaibhav Adlakha
Vibhav Vineet
Yash Goyal
E. Ponti
Siva Reddy
11
29
0
29 Mar 2022
Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
Yuxiang Wu
Matt Gardner
Pontus Stenetorp
Pradeep Dasigi
26
67
0
24 Mar 2022
Less is More: Summary of Long Instructions is Better for Program Synthesis
Kirby Kuznia
Swaroop Mishra
Mihir Parmar
Chitta Baral
AIMat
28
22
0
16 Mar 2022
Large-Scale Hate Speech Detection with Cross-Domain Transfer
Cagri Toraman
Furkan Şahinuç
E. Yilmaz
24
58
0
02 Mar 2022
A Survey on Programmatic Weak Supervision
Jieyu Zhang
Cheng-Yu Hsieh
Yue Yu
Chao Zhang
Alexander Ratner
24
91
0
11 Feb 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
39
212
0
16 Jan 2022
Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants
Max Bartolo
Tristan Thrush
Sebastian Riedel
Pontus Stenetorp
Robin Jia
Douwe Kiela
19
33
0
16 Dec 2021
Reframing Human-AI Collaboration for Generating Free-Text Explanations
Sarah Wiegreffe
Jack Hessel
Swabha Swayamdipta
Mark O. Riedl
Yejin Choi
21
142
0
16 Dec 2021
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang
Haohan Wang
Diyi Yang
139
130
0
15 Dec 2021
In Search of Ambiguity: A Three-Stage Workflow Design to Clarify Annotation Guidelines for Crowd Workers
V. Pradhan
M. Schaekermann
Matthew Lease
21
12
0
04 Dec 2021
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Maarten Sap
Swabha Swayamdipta
Laura Vianna
Xuhui Zhou
Yejin Choi
Noah A. Smith
29
266
0
15 Nov 2021
Detecting Community Sensitive Norm Violations in Online Conversations
Chan Young Park
Julia Mendelsohn
Karthik Radhakrishnan
Kinjal Jain
Tushar Kanakagiri
David Jurgens
Yulia Tsvetkov
30
21
0
09 Oct 2021
Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?
Milagros Miceli
Julian Posada
Tianling Yang
14
60
0
16 Sep 2021
ePiC: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding
Sayan Ghosh
Shashank Srivastava
13
11
0
14 Sep 2021
CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge
Yasumasa Onoe
Michael J.Q. Zhang
Eunsol Choi
Greg Durrett
HILM
27
85
0
03 Sep 2021
Semantic Answer Similarity for Evaluating Question Answering Models
Julian Risch
Timo Moller
Julian Gutsch
M. Pietsch
ELM
30
67
0
13 Aug 2021
Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition
Abbas Ghaddar
Philippe Langlais
Ahmad Rashid
Mehdi Rezagholizadeh
34
42
0
24 Jul 2021
Prompting Contrastive Explanations for Commonsense Reasoning Tasks
Bhargavi Paranjape
Julian Michael
Marjan Ghazvininejad
Luke Zettlemoyer
Hannaneh Hajishirzi
ReLM
LRM
20
66
0
12 Jun 2021
SyGNS: A Systematic Generalization Testbed Based on Natural Language Semantics
Hitomi Yanaka
K. Mineshima
Kentaro Inui
NAI
AI4CE
30
11
0
02 Jun 2021
Reliability Testing for Natural Language Processing Systems
Samson Tan
Shafiq R. Joty
K. Baxter
Araz Taeihagh
G. Bennett
Min-Yen Kan
13
38
0
06 May 2021
Hidden Biases in Unreliable News Detection Datasets
Xiang Zhou
Heba Elfardy
Christos Christodoulopoulos
Thomas Butler
Mohit Bansal
11
15
0
20 Apr 2021
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema
Yanai Elazar
Hongming Zhang
Yoav Goldberg
Dan Roth
ReLM
LRM
37
44
0
16 Apr 2021
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
Yonatan Bitton
Gabriel Stanovsky
Roy Schwartz
Michael Elhadad
CoGe
17
33
0
17 Mar 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
250
672
0
06 Jan 2021
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
29
259
0
31 Dec 2020
You Are What You Tweet: Profiling Users by Past Tweets to Improve Hate Speech Detection
Prateek Chaudhry
Matthew Lease
25
7
0
16 Dec 2020
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
24
119
0
30 Nov 2020