
Tokenizing Electron Cloud in Protein-Ligand Interaction Learning

Abstract

The affinity and specificity of protein-molecule binding directly impact functional outcomes, and understanding them helps uncover the mechanisms underlying biological regulation and signal transduction. Most deep-learning-based prediction approaches focus on atom- or fragment-level structures. However, quantum chemical properties such as electronic structure, which are key to unveiling interaction patterns, remain largely underexplored. To bridge this gap, we propose ECBind, a method that tokenizes electron cloud signals into quantized embeddings, enabling their integration into downstream tasks such as binding affinity prediction. By incorporating electron densities, ECBind helps uncover binding modes that atom-level models cannot fully represent. Specifically, to remove the redundancy inherent in electron cloud signals, a structure-aware transformer and hierarchical codebooks encode 3D binding sites enriched with electronic structure information into tokens. These tokenized codes are then used for specific downstream tasks with labeled data. To extend its applicability to a wider range of scenarios, we use knowledge distillation to develop an electron-cloud-agnostic prediction model. Experimentally, ECBind achieves state-of-the-art performance across multiple tasks, with improvements of 6.42% and 15.58% in per-structure Pearson and Spearman correlation coefficients, respectively.
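
The abstract does not say how the hierarchical codebooks are organized. One common way to realize hierarchical codebooks is residual vector quantization, where each level quantizes the residual left over by the previous level. The minimal Python sketch below illustrates that idea under this assumption only; it is not ECBind's implementation. The toy 2D codebooks and the helper names (`residual_quantize`, `tokenize_site`) are hypothetical. In practice the codebooks would be learned, and the input vectors would come from an encoder such as the structure-aware transformer.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Sequence[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(codebook: Codebook, v: Sequence[float]) -> int:
    """Index of the codebook entry closest to v."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], v))


def residual_quantize(
    v: Sequence[float], codebooks: Sequence[Codebook]
) -> Tuple[List[int], Vector]:
    """Quantize v level by level; each level encodes what earlier levels missed.

    Returns one code index per level plus the reconstructed vector.
    Hypothetical helper: an illustration of residual VQ, not ECBind's code.
    """
    residual = [float(x) for x in v]
    reconstruction = [0.0] * len(residual)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(codebook, residual)
        codes.append(idx)
        chosen = codebook[idx]
        # Add the chosen entry to the reconstruction and remove it from the residual.
        reconstruction = [r + c for r, c in zip(reconstruction, chosen)]
        residual = [r - c for r, c in zip(residual, chosen)]
    return codes, reconstruction


def tokenize_site(
    points: Sequence[Sequence[float]], codebooks: Sequence[Codebook]
) -> List[Tuple[int, ...]]:
    """Map each embedded point of a (hypothetical) binding site to a token tuple."""
    return [tuple(residual_quantize(p, codebooks)[0]) for p in points]


if __name__ == "__main__":
    # Toy two-level codebooks: a coarse level and a finer correction level.
    coarse = [[0.0, 0.0], [4.0, 4.0]]
    fine = [[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]]
    codes, recon = residual_quantize([5.0, 3.0], [coarse, fine])
    print(codes, recon)  # [1, 1] [5.0, 3.0]
    print(tokenize_site([[5.0, 3.0], [0.2, -0.1]], [coarse, fine]))  # [(1, 1), (0, 0)]
```

Coarse levels capture the dominant component and later levels refine it, so a short tuple of indices can stand in for a continuous vector. This is one way a codebook scheme can discard redundancy, although the abstract does not state which variant ECBind uses.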

@article{lin2025_2505.19014,
  title={Tokenizing Electron Cloud in Protein-Ligand Interaction Learning},
  author={Haitao Lin and Odin Zhang and Jia Xu and Yunfan Liu and Zheng Cheng and Lirong Wu and Yufei Huang and Zhifeng Gao and Stan Z. Li},
  journal={arXiv preprint arXiv:2505.19014},
  year={2025}
}