Eroding the Truth-Default: A Causal Analysis of Human Susceptibility to Foundation Model Hallucinations and Disinformation in the Wild

Alexander Loth
Martin Kappes
Marc-Oliver Pahl
Main: 7 pages
Figures: 7
Tables: 8
Bibliography: 1 page
Appendix: 4 pages
Abstract

As foundation models (FMs) approach human-level fluency, distinguishing synthetic from organic content has become a key challenge for Trustworthy Web Intelligence. This paper presents JudgeGPT and RogueGPT, a dual-axis framework that decouples "authenticity" from "attribution" to investigate the mechanisms of human susceptibility. Analyzing 918 evaluations across five FMs (including GPT-4 and Llama-2), we employ Structural Causal Models (SCMs) as a principal framework for formulating testable causal hypotheses about detection accuracy. Contrary to partisan narratives, we find that political orientation shows a negligible association with detection performance (r = -0.10). Instead, "fake news familiarity" emerges as a candidate mediator (r = 0.35), suggesting that exposure may function as adversarial training for human discriminators. We identify a "fluency trap" where GPT-4 outputs (HumanMachineScore: 0.20) bypass source-monitoring mechanisms, rendering them indistinguishable from human text. These findings suggest that "pre-bunking" interventions should target cognitive source monitoring rather than demographic segmentation to ensure trustworthy information ecosystems.