Reimagining Target-Aware Molecular Generation through Retrieval-Enhanced Aligned Diffusion

Breakthroughs in high-accuracy protein structure prediction, such as AlphaFold, have established receptor-based molecule design as a critical driver of rapid early-phase drug discovery. However, most approaches still struggle to balance pocket-specific geometric fit with strict valence and synthetic constraints. To resolve this trade-off, Retrieval-Enhanced Aligned Diffusion (READ) is introduced, the first model to merge molecular Retrieval-Augmented Generation with an SE(3)-equivariant diffusion model. Specifically, a contrastively pre-trained encoder aligns atom-level representations during training; at inference, it retrieves graph embeddings of pocket-matched scaffolds to guide each reverse-diffusion step. This single mechanism injects real-world chemical priors exactly where they are needed, producing valid, diverse, and shape-complementary ligands. Experimental results demonstrate that READ achieves highly competitive performance on CBGBench, surpassing state-of-the-art generative models and even native ligands. These results suggest that retrieval and diffusion can be co-optimized for faster, more reliable structure-based drug design.
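The retrieve-then-guide idea described above can be illustrated with a minimal sketch. This is not the READ implementation: the function names (`retrieve_top_k`, `mean_embedding`, `guided_step`), the three-dimensional toy embeddings, the cosine-similarity ranking, and the linear "nudge" toward the retrieved context are all assumptions made for illustration. The abstract does not specify how retrieved embeddings enter the denoiser.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_top_k(pocket_emb, scaffold_bank, k=2):
    """Return the names of the k scaffolds most similar to the pocket embedding."""
    ranked = sorted(
        scaffold_bank,
        key=lambda name: cosine(pocket_emb, scaffold_bank[name]),
        reverse=True,
    )
    return ranked[:k]

def mean_embedding(names, scaffold_bank):
    """Average the retrieved embeddings into a single conditioning vector."""
    vectors = [scaffold_bank[name] for name in names]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def guided_step(x_t, denoise_fn, context, guidance_weight=0.1):
    """Toy reverse step (assumption): denoise, then pull the state toward the context."""
    x_prev = denoise_fn(x_t)
    return [x + guidance_weight * (c - x) for x, c in zip(x_prev, context)]

# Hypothetical embeddings; a real system would obtain these from a trained encoder.
pocket = [1.0, 0.0, 0.0]
bank = {
    "scaffold_a": [0.9, 0.1, 0.0],
    "scaffold_b": [0.0, 1.0, 0.0],
    "scaffold_c": [0.8, 0.0, 0.2],
}

top = retrieve_top_k(pocket, bank, k=2)
context = mean_embedding(top, bank)

# The same retrieved context conditions every reverse step.
state = [0.0, 0.0, 0.0]
for _ in range(3):
    state = guided_step(state, denoise_fn=lambda x: x, context=context)
```

Here the identity `denoise_fn` stands in for a learned SE(3)-equivariant denoiser, so the loop shows only where the retrieved context would be consumed, not how a real model uses it.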
@article{xu2025_2506.14488,
  title={Reimagining Target-Aware Molecular Generation through Retrieval-Enhanced Aligned Diffusion},
  author={Dong Xu and Zhangfan Yang and Ka-chun Wong and Zexuan Zhu and Jiangqiang Li and Junkai Ji},
  journal={arXiv preprint arXiv:2506.14488},
  year={2025}
}