Beyond Accuracy: Behavioral Testing of NLP models with CheckList


8 May 2020
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
    ELM

Papers citing "Beyond Accuracy: Behavioral Testing of NLP models with CheckList"

50 / 664 papers shown
A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice
Juri Opitz
22
16
0
25 Apr 2024
Interactive Analysis of LLMs using Meaningful Counterfactuals
Furui Cheng
Vilém Zouhar
Robin Shing Moon Chan
Daniel Fürst
Hendrik Strobelt
Mennatallah El-Assady
21
10
0
23 Apr 2024
MisgenderMender: A Community-Informed Approach to Interventions for Misgendering
Tamanna Hossain
Sunipa Dev
Sameer Singh
32
5
0
23 Apr 2024
IMO: Greedy Layer-Wise Sparse Representation Learning for Out-of-Distribution Text Classification with Pre-trained Models
Tao Feng
Lizhen Qu
Zhuang Li
Haolan Zhan
Yuncheng Hua
Gholamreza Haffari
VLM
37
1
0
21 Apr 2024
Cross-Problem Learning for Solving Vehicle Routing Problems
Zhuoyi Lin
Yaoxin Wu
Bangjian Zhou
Zhiguang Cao
Wen Song
Yingqian Zhang
Senthilnath Jayavelu
46
10
0
17 Apr 2024
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Yingchaojie Feng
Zhizhang Chen
Zhining Kang
Sijia Wang
Minfeng Zhu
Wei Zhang
Wei Chen
42
3
0
12 Apr 2024
Goal-guided Generative Prompt Injection Attack on Large Language Models
Chong Zhang
Mingyu Jin
Qinkai Yu
Chengzhi Liu
Haochen Xue
Xiaobo Jin
AAML
SILM
42
10
0
06 Apr 2024
The Impact of Unstated Norms in Bias Analysis of Language Models
Farnaz Kohankhaki
David B. Emerson
Laleh Seyyed-Kalantari
Faiza Khan Khattak
60
1
0
04 Apr 2024
Estimating the Causal Effects of Natural Logic Features in Transformer-Based NLI Models
Julia Rozanova
Marco Valentino
André Freitas
CML
34
1
0
03 Apr 2024
Machine Learning Robustness: A Primer
Houssem Ben Braiek
Foutse Khomh
AAML
OOD
36
5
0
01 Apr 2024
Benchmark Transparency: Measuring the Impact of Data on Evaluation
Venelin Kovatchev
Matthew Lease
32
3
0
31 Mar 2024
Towards a Framework for Evaluating Explanations in Automated Fact Verification
Neema Kotonya
Francesca Toni
32
5
0
29 Mar 2024
Targeted Visualization of the Backbone of Encoder LLMs
Isaac Roberts
Alexander Schulz
L. Hermes
Barbara Hammer
37
0
0
26 Mar 2024
"It is there, and you need it, so why do you not use it?" Achieving better adoption of AI systems by domain experts, in the case study of natural science research
Auste Simkute
Ewa Luger
Michael Evans
Rhianne Jones
29
1
0
25 Mar 2024
ChatGPT Incorrectness Detection in Software Reviews
M. Tanzil
Junaed Younus Khan
Gias Uddin
19
4
0
25 Mar 2024
Specification Overfitting in Artificial Intelligence
Benjamin Roth
Pedro Henrique Luz de Araujo
Yuxi Xia
Saskia Kaltenbrunner
Christoph Korab
58
0
0
13 Mar 2024
PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion
Zekai Zhang
Yiduo Guo
Yaobo Liang
Dongyan Zhao
Nan Duan
43
1
0
06 Mar 2024
Improving Open-Ended Text Generation via Adaptive Decoding
Wenhong Zhu
Hong-ping Hao
Zhiwei He
Yiming Ai
Rui Wang
31
6
0
28 Feb 2024
Farsight: Fostering Responsible AI Awareness During AI Application Prototyping
Zijie J. Wang
Chinmay Kulkarni
Lauren Wilcox
Michael Terry
Michael A. Madaio
40
43
0
23 Feb 2024
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
Yiping Jin
Leo Wanner
A. Shvets
21
2
0
23 Feb 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Kaijie Zhu
Jindong Wang
Qinlin Zhao
Ruochen Xu
Xing Xie
50
30
0
21 Feb 2024
RITFIS: Robust input testing framework for LLMs-based intelligent software
Ming-Ming Xiao
Yan Xiao
Hai Dong
Shunhui Ji
Pengcheng Zhang
AAML
47
5
0
21 Feb 2024
Investigating the Impact of Model Instability on Explanations and Uncertainty
Sara Vera Marjanović
Isabelle Augenstein
Christina Lioma
AAML
45
0
0
20 Feb 2024
Are LLM-based Evaluators Confusing NLG Quality Criteria?
Xinyu Hu
Mingqi Gao
Sen Hu
Yang Zhang
Yicheng Chen
Teng Xu
Xiaojun Wan
AAML
ELM
44
22
0
19 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
43
13
0
19 Feb 2024
Contrastive Instruction Tuning
Tianyi Yan
Fei Wang
James Y. Huang
Wenxuan Zhou
Fan Yin
Aram Galstyan
Wenpeng Yin
Muhao Chen
ALM
23
5
0
17 Feb 2024
Arabic Synonym BERT-based Adversarial Examples for Text Classification
Norah M. Alshahrani
Saied Alshahrani
Esma Wali
Jeanna Neefe Matthews
AAML
22
5
0
05 Feb 2024
eXplainable Bayesian Multi-Perspective Generative Retrieval
EuiYul Song
Philhoon Oh
Sangryul Kim
James Thorne
BDL
35
0
0
04 Feb 2024
How Useful is Continued Pre-Training for Generative Unsupervised Domain Adaptation?
Rheeya Uppaal
Yixuan Li
Junjie Hu
37
4
0
31 Jan 2024
Conditional and Modal Reasoning in Large Language Models
Wesley H. Holliday
M. Mandelkern
Cedegao E. Zhang
LRM
29
5
0
30 Jan 2024
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
Erik Arakelyan
Zhaoqi Liu
Isabelle Augenstein
AAML
45
9
0
25 Jan 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo-wen Li
Min-Bin Lin
MLLM
32
14
0
22 Jan 2024
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation
Yoo Yeon Sung
Ishani Mondal
Jordan L. Boyd-Graber
30
0
0
20 Jan 2024
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions
Pengfei Hong
Navonil Majumder
Deepanway Ghosal
Somak Aditya
Rada Mihalcea
Soujanya Poria
LRM
45
4
0
17 Jan 2024
IDoFew: Intermediate Training Using Dual-Clustering in Language Models for Few Labels Text Classification
Abdullah Alsuhaibani
Hamad Zogan
Imran Razzak
Shoaib Jameel
Guandong Xu
30
4
0
08 Jan 2024
Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training
Dongfang Li
Baotian Hu
Qingcai Chen
Shan He
34
4
0
29 Dec 2023
Navigating the Structured What-If Spaces: Counterfactual Generation via Structured Diffusion
Nishtha Madaan
Srikanta J. Bedathur
DiffM
38
0
0
21 Dec 2023
Taxonomy-based CheckList for Large Language Model Evaluation
Damin Zhang
22
0
0
15 Dec 2023
Discovering Highly Influential Shortcut Reasoning: An Automated Template-Free Approach
Daichi Haraguchi
Kiyoaki Shirai
Naoya Inoue
Natthawut Kertkeidkachorn
LRM
8
0
0
15 Dec 2023
Dissecting vocabulary biases datasets through statistical testing and automated data augmentation for artifact mitigation in Natural Language Inference
Dat Thanh Nguyen
14
0
0
14 Dec 2023
Humans vs Large Language Models: Judgmental Forecasting in an Era of Advanced AI
Mahdi Abolghasemi
Odkhishig Ganbold
Kristian Rotaru
38
8
0
12 Dec 2023
Augmenty: A Python Library for Structured Text Augmentation
Kenneth Enevoldsen
23
0
0
09 Dec 2023
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra
Manolis Zampetakis
Paul Kassianik
Blaine Nelson
Hyrum Anderson
Yaron Singer
Amin Karbasi
30
204
0
04 Dec 2023
Challenges of Large Language Models for Mental Health Counseling
N. C. Chung
George C. Dyer
L. Brocki
LM&MA
AI4MH
73
14
0
23 Nov 2023
(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs
Wanqin Ma
Chenyang Yang
Christian Kästner
24
20
0
18 Nov 2023
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
Ashim Gupta
Rishanth Rajendhran
Nathan Stringham
Vivek Srikumar
Ana Marasović
AAML
31
3
0
16 Nov 2023
Show Your Work with Confidence: Confidence Bands for Tuning Curves
Nicholas Lourie
Kyunghyun Cho
He He
20
2
0
16 Nov 2023
Explore Spurious Correlations at the Concept Level in Language Models for Text Classification
Yuhang Zhou
Paiheng Xu
Xiaoyu Liu
Bang An
Wei Ai
Furong Huang
LRM
71
20
0
15 Nov 2023
DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models
Yibo Wang
Xiangjue Dong
James Caverlee
Philip S. Yu
26
2
0
14 Nov 2023
Functionality learning through specification instructions
Pedro Henrique Luz de Araujo
Benjamin Roth
ELM
38
0
0
14 Nov 2023