RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems

29 December 2020

Papers citing "RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems"


E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models Zhenyu Zhang Bingguang Hao Jinpeng Li Zekai Zhang Dongyan Zhao 33 0 0 16 Jun 2024
Generating Hard-Negative Out-of-Scope Data with ChatGPT for Intent Classification Zhijian Li Stefan Larson Kevin Leach OODD 37 1 0 08 Mar 2024
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future Minzhi Li Weiyan Shi Caleb Ziems Diyi Yang 41 9 0 28 Feb 2024
Noise-BERT: A Unified Perturbation-Robust Framework with Noise Alignment Pre-training for Noisy Slot Filling Task Jinxu Zhao Guanting Dong Yueyan Qiu Tingfeng Hui Xiaoshuai Song Daichi Guo Weiran Xu 29 1 0 22 Feb 2024
Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties Ekaterina Artemova Verena Blaschke Barbara Plank 36 3 0 03 Feb 2024
Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations Ankita Gupta Chulaka Gunasekara H. Wan Jatin Ganhotra Sachindra Joshi Marina Danilevsky 21 0 0 15 Nov 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors Marek Kubis Pawel Skórzewski Marcin Sowański Tomasz Ziętkiewicz 13 6 0 25 Oct 2023
DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task Guanting Dong Tingfeng Hui Zhuoma Gongque Jinxu Zhao Daichi Guo Gang Zhao Keqing He Weiran Xu DiffM 11 7 0 16 Oct 2023
Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task Guanting Dong Jinxu Zhao Tingfeng Hui Daichi Guo Wenlong Wan ... Yueyan Qiu Zhuoma Gongque Keqing He Zechen Wang Weiran Xu AAML 35 20 0 10 Oct 2023
Towards Robust and Generalizable Training: An Empirical Study of Noisy Slot Filling for Input Perturbations Jiachi Liu Liwen Wang Guanting Dong Xiaoshuai Song Zechen Wang ... Shanglin Lei Jinzheng Zhao Keqing He Bo Xiao Weiran Xu 35 6 0 05 Oct 2023
SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents Shuzheng Si Wen-Cheng Ma Haoyu Gao Yuchuan Wu Ting-En Lin Yinpei Dai Hangyu Li Rui Yan Fei Huang Yongbin Li AuLLM 42 28 0 22 May 2023
SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting Xiaoying Zhang Baolin Peng Kun Li Jingyan Zhou Helen M. Meng 76 39 0 15 May 2023
Robust Question Answering against Distribution Shifts with Test-Time Adaptation: An Empirical Study Hai Ye Yuyang Ding Juntao Li Hwee Tou Ng OOD TTA 29 9 0 09 Feb 2023
Sources of Noise in Dialogue and How to Deal with Them Derek Chen Zhou Yu 24 2 0 06 Dec 2022
CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation Yinpei Dai Wanwei He Bowen Li Yuchuan Wu Zhen Cao Zhongqi An Jian Sun Yongbin Li ELM ALM 41 12 0 21 Nov 2022
Are Current Task-oriented Dialogue Systems Able to Satisfy Impolite Users? Zhiqiang Hu Roy Ka-Wei Lee Nancy F. Chen 32 4 0 24 Oct 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers Stefan Larson Gordon Lim Yutong Ai David Kuang Kevin Leach OODD OOD 37 18 0 14 Oct 2022
Robustification of Multilingual Language Models to Real-world Noise in Crosslingual Zero-shot Settings with Robust Contrastive Pretraining Asa Cooper Stickland Sailik Sengupta Jason Krone Saab Mansour He He 52 7 0 10 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review Dieuwke Hupkes Mario Giulianelli Verna Dankers Mikel Artetxe Yanai Elazar ... Leila Khalatbari Maria Ryskina Rita Frieske Ryan Cotterell Zhijing Jin 127 94 0 06 Oct 2022
AARGH! End-to-end Retrieval-Generation for Task-Oriented Dialog Tomáš Nekvinda Ondrej Dusek RALM 21 9 0 08 Sep 2022
PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling Guanting Dong Daichi Guo Liwen Wang Xuefeng Li Zechen Wang ... Hao Lei Xinyue Cui Yi Huang Junlan Feng Weiran Xu 21 12 0 24 Aug 2022
"Do you follow me?": A Survey of Recent Approaches in Dialogue State Tracking Léo Jacqmin L. Rojas-Barahona Benoit Favre 43 27 0 29 Jul 2022
A Survey of Intent Classification and Slot-Filling Datasets for Task-Oriented Dialog Stefan Larson Kevin Leach 41 20 0 26 Jul 2022
Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering Shiquan Yang Xinting Huang Jey Han Lau S. Erfani 17 5 0 20 May 2022
Towards Explanation for Unsupervised Graph-Level Representation Learning Qinghua Zheng Jihong Wang Minnan Luo Yaoliang Yu Jundong Li L. Yao Xiao Chang 24 1 0 20 May 2022
FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue Alon Albalak Yi-Lin Tuan Pegah Jandaghi Connor Pryor Luke Yoffe Deepak Ramachandran Lise Getoor Jay Pujara William Yang Wang 21 14 0 12 May 2022
Toward Self-learning End-to-End Task-Oriented Dialog Systems Xiaoying Zhang Baolin Peng Jianfeng Gao Helen M. Meng 27 7 0 18 Jan 2022
Know Thy Strengths: Comprehensive Dialogue State Tracking Diagnostics Hyundong Justin Cho Chinnadhurai Sankar Christopher Lin Kaushik Ram Sadagopan Shahin Shayandeh Asli Celikyilmaz Jonathan May Ahmad Beirami 60 10 0 15 Dec 2021
Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems Manaal Faruqui Dilek Z. Hakkani-Tür 32 21 0 10 Dec 2021
SYNERGY: Building Task Bots at Scale Using Symbolic Knowledge and Machine Teaching Baolin Peng Chunyuan Li Zhu Zhang Jinchao Li Chenguang Zhu Jianfeng Gao 73 3 0 21 Oct 2021
"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations Seokhwan Kim Yang Liu Di Jin Alexandros Papangelis Karthik Gopalakrishnan Behnam Hedayatnia Dilek Z. Hakkani-Tür 11 38 0 28 Sep 2021
ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding Lingyun Feng Jianwei Yu Deng Cai Songxiang Liu Haitao Zheng Yan Wang ELM 79 14 0 30 Aug 2021
Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering Aditya Gupta Jiacheng Xu Shyam Upadhyay Diyi Yang Manaal Faruqui 37 33 0 08 Jun 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 299 6,984 0 20 Apr 2018