CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation

Main: 10 Pages
7 Figures
Bibliography: 2 Pages
Abstract

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for high-quality data, synthetic data has emerged as a mainstream solution, demonstrating impressive performance in areas such as images, audio, and video. Generating mixed-type data, especially high-quality tabular data, nevertheless still faces significant challenges. These primarily include inherently heterogeneous data types, complex inter-variable relationships, and intricate column-wise distributions. In this paper, we introduce CausalDiffTab, a diffusion-based generative model specifically designed to handle mixed-type tabular data containing both numerical and categorical features, while being more flexible in capturing complex interactions among variables. We further propose a hybrid adaptive causal regularization method based on the principle of Hierarchical Prior Fusion. This approach adaptively controls the weight of causal regularization, enhancing the model's performance without compromising its generative capabilities. Comprehensive experiments conducted on seven datasets demonstrate that CausalDiffTab outperforms baseline methods across all metrics. Our code is publicly available at this https URL.

@article{zhang2025_2506.14206,
  title={CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation},
  author={Jia-Chen Zhang and Zheng Zhou and Yu-Jie Xiong and Chun-Ming Xia and Fei Dai},
  journal={arXiv preprint arXiv:2506.14206},
  year={2025}
}