Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.04618
Cited By
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
7 June 2023
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations"
20 / 70 papers shown
Title
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
75
1,529
0
06 Jul 2023
Measuring the Robustness of NLP Models to Domain Shifts
Nitay Calderon
Naveh Porat
Eyal Ben-David
Alexander Chapanin
Zorik Gekhman
Nadav Oved
Vitaly Shalumov
Roi Reichart
21
7
0
31 May 2023
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu
Xingxuan Zhang
Renzhe Xu
Jiashuo Liu
Yue He
Peng Cui
OOD
35
7
0
24 May 2023
ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger
Jiazhao Li
Yijin Yang
Zhuofeng Wu
V. Vydiswaran
Chaowei Xiao
SILM
67
42
0
27 Apr 2023
Are Prompt-based Models Clueless?
Pride Kavumba
Ryo Takahashi
Yusuke Oda
VLM
142
13
0
19 May 2022
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang
Haohan Wang
Diyi Yang
139
130
0
15 Dec 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
218
1,663
0
15 Oct 2021
Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models
Tianlu Wang
Rohit Sridhar
Diyi Yang
Xuezhi Wang
AAML
120
72
0
14 Oct 2021
ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts
Yuta Koreeda
Christopher D. Manning
AILaw
94
97
0
05 Oct 2021
Types of Out-of-Distribution Texts and How to Detect Them
Udit Arora
William Huang
He He
OODD
225
97
0
14 Sep 2021
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Mai Elsherief
Caleb Ziems
D. Muchlinski
Vaishnavi Anupindi
Jordyn Seybolt
M. D. Choudhury
Diyi Yang
106
239
0
11 Sep 2021
ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding
Xing Wu
Chaochen Gao
Liangjun Zang
Jizhong Han
Zhongyuan Wang
Songlin Hu
SSL
AILaw
39
129
0
09 Sep 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,872
0
18 Apr 2021
Competency Problems: On Finding and Removing Artifacts in Language Data
Matt Gardner
William Merrill
Jesse Dodge
Matthew E. Peters
Alexis Ross
Sameer Singh
Noah A. Smith
171
107
0
17 Apr 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Joey Tianyi Zhou
Christopher Ré
AAML
OffRL
OOD
154
137
0
13 Jan 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis
Christopher Potts
Zhengxuan Wu
Atticus Geiger
Douwe Kiela
230
77
0
30 Dec 2020
CrossNER: Evaluating Cross-Domain Named Entity Recognition
Zihan Liu
Yan Xu
Tiezheng Yu
Wenliang Dai
Ziwei Ji
Samuel Cahyawijaya
Andrea Madotto
Pascale Fung
78
146
0
08 Dec 2020
Posterior Differential Regularization with f-divergence for Improving Model Robustness
Hao Cheng
Xiaodong Liu
L. Pereira
Yaoliang Yu
Jianfeng Gao
248
31
0
23 Oct 2020
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Chen Zhu
Yu Cheng
Zhe Gan
S. Sun
Tom Goldstein
Jingjing Liu
AAML
232
438
0
25 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,996
0
20 Apr 2018
Previous
1
2