Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses

17 March 2026

Khizar Hussain

Bradley A. Malin

Zhijun Yin

Susannah Leigh Rose

Murat Kantarcioglu

AILaw

AI4MH

ELM


Main: 8 pages

2 figures

Bibliography: 2 pages

15 tables

Appendix: 4 pages

Abstract

As LLM-powered chatbots are increasingly deployed in mental health services, detecting hallucinations and omissions has become critical for user safety. However, state-of-the-art LLM-as-a-judge methods often fail in high-risk healthcare contexts, where subtle errors can have serious consequences. We show that leading LLM judges achieve only 52% accuracy on mental health counseling data, with some hallucination detection approaches exhibiting near-zero recall. We identify the root cause as LLMs' inability to capture the nuanced linguistic and therapeutic patterns that domain experts recognize. To address this, we propose a framework that integrates human expertise with LLMs to extract interpretable, domain-informed features across five analytical dimensions: logical consistency, entity verification, factual accuracy, linguistic uncertainty, and professional appropriateness. Experiments on a public mental health dataset and a new human-annotated dataset show that traditional machine learning models trained on these features achieve 0.717 F1 on our custom dataset and 0.849 F1 on a public benchmark for hallucination detection, with 0.59–0.64 F1 for omission detection across both datasets. Our results demonstrate that combining domain expertise with automated methods yields more reliable and transparent evaluation than black-box LLM judging in high-stakes mental health applications.
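The abstract describes per-response features along five analytical dimensions feeding a "traditional machine learning" classifier, evaluated by F1. The sketch below illustrates that shape only, under stated assumptions: it represents each dimension as a single hypothetical numeric score, uses plain logistic regression (trained by batch gradient descent) as an assumed stand-in for the unspecified models, and computes precision, recall, and F1. The names `ResponseFeatures`, `train_logreg`, `predict`, and `precision_recall_f1` and the toy data are illustrative, not the authors' code or data.

```python
import math
from dataclasses import dataclass


@dataclass
class ResponseFeatures:
    # Hypothetical scores in [0, 1], one per dimension named in the abstract.
    logical_consistency: float
    entity_verification: float
    factual_accuracy: float
    linguistic_uncertainty: float
    professional_appropriateness: float

    def to_vector(self) -> list[float]:
        return [self.logical_consistency, self.entity_verification,
                self.factual_accuracy, self.linguistic_uncertainty,
                self.professional_appropriateness]


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss (assumed stand-in classifier)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (flagged, e.g. hallucination) if the predicted probability >= threshold."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return int(1.0 / (1.0 + math.exp(-z)) >= threshold)


def precision_recall_f1(y_true, y_pred):
    """F1 is the harmonic mean of precision and recall; near-zero recall drags it to ~0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up toy examples: label 1 = flagged response, label 0 = acceptable.
    toy = [
        (ResponseFeatures(0.9, 0.9, 0.8, 0.1, 0.9), 0),
        (ResponseFeatures(0.8, 0.9, 0.9, 0.2, 0.8), 0),
        (ResponseFeatures(0.2, 0.3, 0.1, 0.8, 0.4), 1),
        (ResponseFeatures(0.1, 0.2, 0.3, 0.9, 0.3), 1),
    ]
    X = [f.to_vector() for f, _ in toy]
    y = [label for _, label in toy]
    w, b = train_logreg(X, y)
    preds = [predict(w, b, x) for x in X]
    print("precision/recall/F1:", precision_recall_f1(y, preds))
```

Because each feature is a named, human-interpretable score, the learned weights can be inspected per dimension, which is the transparency argument the abstract makes against black-box LLM judging.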
