Active Learning-Guided Seq2Seq Variational Autoencoder for Multi-target Inhibitor Generation

Simultaneously optimizing molecules against multiple therapeutic targets remains a profound challenge in drug discovery, particularly due to sparse rewards and conflicting design constraints. We propose a structured active learning (AL) paradigm integrating a sequence-to-sequence (Seq2Seq) variational autoencoder (VAE) into iterative loops designed to balance chemical diversity, molecular quality, and multi-target affinity. Our method alternates between expanding chemically feasible regions of latent space and progressively constraining molecules based on increasingly stringent multi-target docking thresholds. In a proof-of-concept study targeting three related coronavirus main proteases (SARS-CoV-2, SARS-CoV, MERS-CoV), our approach efficiently generated a structurally diverse set of pan-inhibitor candidates. We demonstrate that careful timing and strategic placement of chemical filters within this active learning pipeline markedly enhance exploration of beneficial chemical space, transforming the sparse-reward, multi-objective drug design problem into an accessible computational task. Our framework thus provides a generalizable roadmap for efficiently navigating complex polypharmacological landscapes.
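The alternation described in the abstract, between expanding the chemically feasible set and tightening multi-target docking thresholds, can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation. The nested inner/outer structure, the function names (`sample_candidates`, `passes_chemical_filter`, `docking_scores`, `run_active_learning`), the threshold schedule, and the 2-D points standing in for molecules are all hypothetical. A real pipeline would decode candidates from a Seq2Seq VAE, apply cheminformatics filters, and call a docking program for each target.

```python
import random
from typing import List, Sequence, Tuple

Molecule = Tuple[float, ...]  # toy stand-in for a molecule (a point, not a SMILES string)


def sample_candidates(rng: random.Random, pool: Sequence[Molecule], n: int,
                      noise: float = 0.3) -> List[Molecule]:
    """Hypothetical generator: perturbs known molecules with Gaussian noise.
    Stands in for sampling the VAE latent space and decoding."""
    return [tuple(x + rng.gauss(0.0, noise) for x in rng.choice(pool))
            for _ in range(n)]


def passes_chemical_filter(mol: Molecule) -> bool:
    """Hypothetical chemical filter (placeholder for drug-likeness checks)."""
    return all(abs(x) <= 3.0 for x in mol)


def docking_scores(mol: Molecule, targets: Sequence[Molecule]) -> List[float]:
    """Toy docking oracle: Euclidean distance to each target (lower is better)."""
    return [sum((a - b) ** 2 for a, b in zip(mol, t)) ** 0.5 for t in targets]


def run_active_learning(seeds: Sequence[Molecule], targets: Sequence[Molecule],
                        thresholds: Sequence[float], inner_cycles: int = 3,
                        n_samples: int = 200, seed: int = 0):
    rng = random.Random(seed)
    pool = list(seeds)
    history = []
    for thr in thresholds:              # outer loop: stricter affinity cut-off each round
        for _ in range(inner_cycles):   # inner loop: grow the chemically feasible set
            candidates = sample_candidates(rng, pool, n_samples)
            pool.extend(m for m in candidates if passes_chemical_filter(m))
        # Keep a molecule only if its worst score across ALL targets meets the cut-off.
        kept = [m for m in pool if max(docking_scores(m, targets)) <= thr]
        if kept:                        # if nothing survives, keep the previous pool
            pool = kept
        history.append((thr, len(kept)))
    return pool, history


# Three toy "targets", loosely mirroring the three proteases in the study.
targets = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
pool, history = run_active_learning(seeds=[(0.0, 0.0)], targets=targets,
                                    thresholds=[2.0, 1.5, 1.0])
for thr, n_kept in history:
    print(f"threshold {thr}: {n_kept} candidates kept")
```

In this sketch the surviving pool plays the role of the retraining set: later rounds sample around molecules that already satisfy the previous threshold. Scoring by the worst target is one simple way to require activity against every target at once; other aggregation rules are possible.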
@article{vilalta-mor2025_2506.15309,
  title={Active Learning-Guided Seq2Seq Variational Autoencoder for Multi-target Inhibitor Generation},
  author={Júlia Vilalta-Mor and Alexis Molina and Laura Ortega Varga and Isaac Filella-Merce and Victor Guallar},
  journal={arXiv preprint arXiv:2506.15309},
  year={2025}
}