2012.14666
RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems
29 December 2020
Baolin Peng
Chunyuan Li
Zhu Zhang
Chenguang Zhu
Jinchao Li
Jianfeng Gao
Papers citing
"RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems"
34 / 34 papers shown
Title
E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models
Zhenyu Zhang
Bingguang Hao
Jinpeng Li
Zekai Zhang
Dongyan Zhao
33
0
0
16 Jun 2024
Generating Hard-Negative Out-of-Scope Data with ChatGPT for Intent Classification
Zhijian Li
Stefan Larson
Kevin Leach
OODD
37
1
0
08 Mar 2024
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future
Minzhi Li
Weiyan Shi
Caleb Ziems
Diyi Yang
41
9
0
28 Feb 2024
Noise-BERT: A Unified Perturbation-Robust Framework with Noise Alignment Pre-training for Noisy Slot Filling Task
Jinxu Zhao
Guanting Dong
Yueyan Qiu
Tingfeng Hui
Xiaoshuai Song
Daichi Guo
Weiran Xu
29
1
0
22 Feb 2024
Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties
Ekaterina Artemova
Verena Blaschke
Barbara Plank
36
3
0
03 Feb 2024
Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations
Ankita Gupta
Chulaka Gunasekara
H. Wan
Jatin Ganhotra
Sachindra Joshi
Marina Danilevsky
21
0
0
15 Nov 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
Marek Kubis
Pawel Skórzewski
Marcin Sowański
Tomasz Ziętkiewicz
13
6
0
25 Oct 2023
DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task
Guanting Dong
Tingfeng Hui
Zhuoma Gongque
Jinxu Zhao
Daichi Guo
Gang Zhao
Keqing He
Weiran Xu
DiffM
11
7
0
16 Oct 2023
Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task
Guanting Dong
Jinxu Zhao
Tingfeng Hui
Daichi Guo
Wenlong Wan
...
Yueyan Qiu
Zhuoma Gongque
Keqing He
Zechen Wang
Weiran Xu
AAML
35
20
0
10 Oct 2023
Towards Robust and Generalizable Training: An Empirical Study of Noisy Slot Filling for Input Perturbations
Jiachi Liu
Liwen Wang
Guanting Dong
Xiaoshuai Song
Zechen Wang
...
Shanglin Lei
Jinzheng Zhao
Keqing He
Bo Xiao
Weiran Xu
35
6
0
05 Oct 2023
SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents
Shuzheng Si
Wentao Ma
Haoyu Gao
Yuchuan Wu
Ting-En Lin
Yinpei Dai
Hangyu Li
Rui Yan
Fei Huang
Yongbin Li
AuLLM
42
28
0
22 May 2023
SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting
Xiaoying Zhang
Baolin Peng
Kun Li
Jingyan Zhou
Helen M. Meng
76
39
0
15 May 2023
Robust Question Answering against Distribution Shifts with Test-Time Adaptation: An Empirical Study
Hai Ye
Yuyang Ding
Juntao Li
Hwee Tou Ng
OOD
TTA
29
9
0
09 Feb 2023
Sources of Noise in Dialogue and How to Deal with Them
Derek Chen
Zhou Yu
24
2
0
06 Dec 2022
CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation
Yinpei Dai
Wanwei He
Bowen Li
Yuchuan Wu
Zhen Cao
Zhongqi An
Jian Sun
Yongbin Li
ELM
ALM
41
12
0
21 Nov 2022
Are Current Task-oriented Dialogue Systems Able to Satisfy Impolite Users?
Zhiqiang Hu
Roy Ka-Wei Lee
Nancy F. Chen
32
4
0
24 Oct 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers
Stefan Larson
Gordon Lim
Yutong Ai
David Kuang
Kevin Leach
OODD
OOD
37
18
0
14 Oct 2022
Robustification of Multilingual Language Models to Real-world Noise in Crosslingual Zero-shot Settings with Robust Contrastive Pretraining
Asa Cooper Stickland
Sailik Sengupta
Jason Krone
Saab Mansour
He He
52
7
0
10 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
127
94
0
06 Oct 2022
AARGH! End-to-end Retrieval-Generation for Task-Oriented Dialog
Tomáš Nekvinda
Ondrej Dusek
RALM
21
9
0
08 Sep 2022
PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling
Guanting Dong
Daichi Guo
Liwen Wang
Xuefeng Li
Zechen Wang
...
Hao Lei
Xinyue Cui
Yi Huang
Junlan Feng
Weiran Xu
21
12
0
24 Aug 2022
"Do you follow me?": A Survey of Recent Approaches in Dialogue State Tracking
Léo Jacqmin
L. Rojas-Barahona
Benoit Favre
43
27
0
29 Jul 2022
A Survey of Intent Classification and Slot-Filling Datasets for Task-Oriented Dialog
Stefan Larson
Kevin Leach
41
20
0
26 Jul 2022
Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering
Shiquan Yang
Xinting Huang
Jey Han Lau
S. Erfani
17
5
0
20 May 2022
Towards Explanation for Unsupervised Graph-Level Representation Learning
Qinghua Zheng
Jihong Wang
Minnan Luo
Yaoliang Yu
Jundong Li
L. Yao
Xiao Chang
24
1
0
20 May 2022
FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue
Alon Albalak
Yi-Lin Tuan
Pegah Jandaghi
Connor Pryor
Luke Yoffe
Deepak Ramachandran
Lise Getoor
Jay Pujara
William Yang Wang
21
14
0
12 May 2022
Toward Self-learning End-to-End Task-Oriented Dialog Systems
Xiaoying Zhang
Baolin Peng
Jianfeng Gao
Helen M. Meng
27
7
0
18 Jan 2022
Know Thy Strengths: Comprehensive Dialogue State Tracking Diagnostics
Hyundong Justin Cho
Chinnadhurai Sankar
Christopher Lin
Kaushik Ram Sadagopan
Shahin Shayandeh
Asli Celikyilmaz
Jonathan May
Ahmad Beirami
60
10
0
15 Dec 2021
Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems
Manaal Faruqui
Dilek Z. Hakkani-Tür
32
21
0
10 Dec 2021
SYNERGY: Building Task Bots at Scale Using Symbolic Knowledge and Machine Teaching
Baolin Peng
Chunyuan Li
Zhu Zhang
Jinchao Li
Chenguang Zhu
Jianfeng Gao
73
3
0
21 Oct 2021
"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations
Seokhwan Kim
Yang Liu
Di Jin
Alexandros Papangelis
Karthik Gopalakrishnan
Behnam Hedayatnia
Dilek Z. Hakkani-Tür
11
38
0
28 Sep 2021
ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding
Lingyun Feng
Jianwei Yu
Deng Cai
Songxiang Liu
Haitao Zheng
Yan Wang
ELM
79
14
0
30 Aug 2021
Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering
Aditya Gupta
Jiacheng Xu
Shyam Upadhyay
Diyi Yang
Manaal Faruqui
37
33
0
08 Jun 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,984
0
20 Apr 2018