ResearchTrend.AI
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

4 February 2019
R. Thomas McCoy
Ellie Pavlick
Tal Linzen

Papers citing "Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference"

50 / 307 papers shown
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
53
19
0
14 Aug 2023
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
30
14
0
31 Jul 2023
GPT-4 Can't Reason
Konstantine Arkoudas
ELM
LRM
AI4MH
11
33
0
21 Jul 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Ester Hlavnova
Sebastian Ruder
32
5
0
11 Jul 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
26
3
0
21 Jun 2023
Which Spurious Correlations Impact Reasoning in NLI Models? A Visual Interactive Diagnosis through Data-Constrained Counterfactuals
Robin Shing Moon Chan
Afra Amini
Mennatallah El-Assady
LRM
AAML
32
2
0
21 Jun 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
39
73
0
07 Jun 2023
Can current NLI systems handle German word order? Investigating language model performance on a new German challenge set of minimal pairs
Ines Reinig
K. Markert
16
0
0
07 Jun 2023
Improving neural network representations using human similarity judgments
Lukas Muttenthaler
Lorenz Linhardt
Jonas Dippel
Robert A. Vandermeulen
Katherine L. Hermann
Andrew Kyle Lampinen
Simon Kornblith
40
29
0
07 Jun 2023
Measuring the Robustness of NLP Models to Domain Shifts
Nitay Calderon
Naveh Porat
Eyal Ben-David
Alexander Chapanin
Zorik Gekhman
Nadav Oved
Vitaly Shalumov
Roi Reichart
21
7
0
31 May 2023
What does the Failure to Reason with "Respectively" in Zero/Few-Shot Settings Tell Us about Language Models?
Ruixiang Cui
Seolhwa Lee
Daniel Hershcovich
Anders Søgaard
30
2
0
31 May 2023
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Yangyi Chen
Hongcheng Gao
Ganqu Cui
Lifan Yuan
Dehan Kong
...
Longtao Huang
H. Xue
Zhiyuan Liu
Maosong Sun
Heng Ji
AAML
ELM
27
6
0
29 May 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Yangqiu Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Jindong Wang
Jennifer Foster
Yue Zhang
OOD
37
2
0
23 May 2023
Does ChatGPT have Theory of Mind?
B. Holterman
Kees van Deemter
LRM
AI4CE
36
22
0
23 May 2023
Understanding and Mitigating Spurious Correlations in Text Classification with Neighborhood Analysis
Oscar Chew
Hsuan-Tien Lin
Kai-Wei Chang
Kuan-Hao Huang
34
5
0
23 May 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
57
9
0
23 May 2023
Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization
Ting Wu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
41
2
0
20 May 2023
Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?
Neeraj Varshney
Mihir Parmar
Nisarg Patel
Divij Handa
Sayantan Sarkar
Man Luo
Chitta Baral
LRM
34
4
0
20 May 2023
Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation
Xuanli He
Qiongkai Xu
Jun Wang
Benjamin I. P. Rubinstein
Trevor Cohn
AAML
32
18
0
19 May 2023
What's the Meaning of Superhuman Performance in Today's NLU?
Simone Tedeschi
Johan Bos
T. Declerck
Jan Hajic
Daniel Hershcovich
...
Simon Krek
Steven Schockaert
Rico Sennrich
Ekaterina Shutova
Roberto Navigli
ELM
LM&MA
VLM
ReLM
LRM
34
26
0
15 May 2023
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models
Lukáš Mikula
Michal Štefánik
Marek Petrovič
Petr Sojka
35
3
0
11 May 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
52
64
0
10 May 2023
PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models
Leonardo Ranaldi
Elena Sofia Ruzzetti
Fabio Massimo Zanzotto
31
6
0
08 May 2023
Empowering Language Model with Guided Knowledge Fusion for Biomedical Document Re-ranking
D. Gupta
Dina Demner-Fushman
21
1
0
07 May 2023
Towards preserving word order importance through Forced Invalidation
Hadeel Al-Negheimish
Pranava Madhyastha
Alessandra Russo
19
3
0
11 Apr 2023
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
Emilio Ferrara
SILM
27
247
0
07 Apr 2023
Natural Language Reasoning, A Survey
Fei Yu
Hongbo Zhang
Prayag Tiwari
Benyou Wang
ReLM
LRM
49
51
0
26 Mar 2023
SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases
Yanchen Liu
Jing Yang
Yan Chen
Jing Liu
Huaqin Wu
MoE
47
2
0
28 Feb 2023
READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises
Chenglei Si
Zhengyan Zhang
Yingfa Chen
Xiaozhi Wang
Zhiyuan Liu
Maosong Sun
AAML
26
1
0
14 Feb 2023
Guide the Learner: Controlling Product of Experts Debiasing Method Based on Token Attribution Similarities
Ali Modarressi
Hossein Amirkhani
Mohammad Taher Pilehvar
26
2
0
06 Feb 2023
Dissociating language and thought in large language models
Kyle Mahowald
Anna A. Ivanova
I. Blank
Nancy Kanwisher
J. Tenenbaum
Evelina Fedorenko
ELM
ReLM
29
209
0
16 Jan 2023
Training language models to summarize narratives improves brain alignment
Khai Loong Aw
Mariya Toneva
30
24
0
21 Dec 2022
DISCO: Distilling Counterfactuals with Large Language Models
Zeming Chen
Qiyue Gao
Antoine Bosselut
Ashish Sabharwal
Kyle Richardson
31
25
0
20 Dec 2022
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He
Jingyu Zhang
Tianle Wang
Sachin Kumar
Kyunghyun Cho
James R. Glass
Yulia Tsvetkov
40
44
0
20 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
22
364
0
19 Dec 2022
JEMMA: An Extensible Java Dataset for ML4Code Applications
Anjan Karmakar
Miltiadis Allamanis
Romain Robbes
VLM
21
3
0
18 Dec 2022
Multi-Scales Data Augmentation Approach In Natural Language Inference For Artifacts Mitigation And Pre-Trained Model Optimization
Zhenyu Lu
18
1
0
16 Dec 2022
Assessing the Impact of Sequence Length Learning on Classification Tasks for Transformer Encoder Models
Jean-Thomas Baillargeon
Luc Lamontagne
32
1
0
16 Dec 2022
Feature-Level Debiased Natural Language Understanding
Yougang Lyu
Piji Li
Yechang Yang
Maarten de Rijke
Pengjie Ren
Yukun Zhao
Dawei Yin
Z. Ren
32
10
0
11 Dec 2022
Legal Prompt Engineering for Multilingual Legal Judgement Prediction
Dietrich Trautmann
Alina Petrova
Frank Schilder
ELM
AILaw
33
74
0
05 Dec 2022
Event knowledge in large language models: the gap between the impossible and the unlikely
Carina Kauf
Anna A. Ivanova
Giulia Rambelli
Emmanuele Chersoni
Jingyuan Selena She
Zawad Chowdhury
Evelina Fedorenko
Alessandro Lenci
37
67
0
02 Dec 2022
Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers
Wanqian Yang
Polina Kirichenko
Micah Goldblum
A. Wilson
DRL
27
10
0
28 Nov 2022
Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
Yiqiao Jin
Xiting Wang
Y. Hao
Yizhou Sun
Xing Xie
38
11
0
24 Nov 2022
GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective
Linyi Yang
Shuibai Zhang
Libo Qin
Yafu Li
Yidong Wang
Hanmeng Liu
Jindong Wang
Xingxu Xie
Yue Zhang
ELM
41
79
0
15 Nov 2022
Capabilities for Better ML Engineering
Chenyang Yang
Rachel A. Brower-Sinning
Grace A. Lewis
Christian Kastner
Tongshuang Wu
24
3
0
11 Nov 2022
Towards Human-Centred Explainability Benchmarks For Text Classification
Viktor Schlegel
Erick Mendez Guzman
R. Batista-Navarro
18
5
0
10 Nov 2022
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
39
1
0
08 Nov 2022
Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference
S. Rajaee
Yadollah Yaghoobzadeh
Mohammad Taher Pilehvar
36
5
0
07 Nov 2022
LMentry: A Language Model Benchmark of Elementary Language Tasks
Avia Efrat
Or Honovich
Omer Levy
29
19
0
03 Nov 2022
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality
Anuj Diwan
Layne Berry
Eunsol Choi
David Harwath
Kyle Mahowald
CoGe
108
41
0
01 Nov 2022