CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

While accurate and user-friendly Computer-Aided Design (CAD) is crucial for industrial design and manufacturing, existing methods still struggle to achieve it because of over-simplified representations or architectures that cannot support multimodal design requirements. In this paper, we tackle this problem from both the method and the dataset perspectives. First, we propose a cascade MAR with topology predictor (CMT), the first multimodal framework for CAD generation based on Boundary Representation (B-Rep). Specifically, the cascade MAR effectively captures the "edge-counters-surface" priors that are essential in B-Reps, while the topology predictor directly estimates the topology of B-Reps from the compact tokens in MAR. Second, to facilitate large-scale training, we develop a large-scale multimodal CAD dataset, mmABC, which includes over 1.3 million B-Rep models with multimodal annotations: point clouds, text descriptions, and multi-view images. Extensive experiments show the superiority of CMT in both conditional and unconditional CAD generation tasks. For example, in unconditional generation on ABC, we improve Coverage and Valid ratio by +10.68% and +10.3%, respectively, compared to state-of-the-art methods. CMT also improves Chamfer by +4.01 on image-conditioned CAD generation on mmABC. The dataset, code, and pretrained network will be released.
@article{wu2025_2504.20830,
  title={CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation},
  author={Jianyu Wu and Yizhou Wang and Xiangyu Yue and Xinzhu Ma and Jingyang Guo and Dongzhan Zhou and Wanli Ouyang and Shixiang Tang},
  journal={arXiv preprint arXiv:2504.20830},
  year={2025}
}