Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation

28 February 2025
Keqiang Yan
Xiner Li
Hongyi Ling
Kenna Ashen
Carl Edwards
Raymundo Arróyave
Marinka Zitnik
Heng Ji
Xiaofeng Qian
Xiaoning Qian
Shuiwang Ji
Abstract

We consider the problem of crystal materials generation using language models (LMs). A key step is to convert 3D crystal structures into 1D sequences that LMs can process. Prior studies used the Crystallographic Information Framework (CIF) file stream, which fails to ensure SE(3) and periodic invariance and may not yield a unique sequence representation for a given crystal structure. Here, we propose a novel method, Mat2Seq, to tackle this challenge. Mat2Seq converts 3D crystal structures into 1D sequences and ensures that different mathematical descriptions of the same crystal map to a single unique sequence, thereby provably achieving SE(3) and periodic invariance. Experimental results show that, when used with language models, Mat2Seq achieves promising crystal structure generation performance compared with prior methods.
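
The abstract states the invariance requirement but not how Mat2Seq meets it. As a rough illustration only, the toy Python sketch below contrasts a naive, CIF-like listing with a simple canonical form. It is not the Mat2Seq algorithm, and every name in it (`naive_sequence`, `canonical_sequence`, `lattice_params`, `wrap`) is hypothetical. It assumes a crystal is given as three Cartesian lattice vectors plus (species, fractional coordinates) pairs. Rotation invariance comes from using lattice lengths and angles, periodic and translation invariance from wrapping coordinates modulo 1 and trying every atom as the origin, and permutation invariance from sorting.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy representation: (species, fractional coordinates).
Atom = Tuple[str, Tuple[float, float, float]]
Lattice = Sequence[Sequence[float]]  # three Cartesian lattice vectors (rows)


def lattice_params(lattice: Lattice) -> Tuple[float, ...]:
    """Return (a, b, c, alpha, beta, gamma); unchanged by rotating the frame."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def angle(u, v):
        cos = sum(p * q for p, q in zip(u, v)) / (norm(u) * norm(v))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp float noise

    a, b, c = lattice
    return (norm(a), norm(b), norm(c), angle(b, c), angle(a, c), angle(a, b))


def wrap(x: float, decimals: int) -> float:
    """Map a fractional coordinate into [0, 1) and round it."""
    # The second modulo sends values that round up to 1.0 back to 0.0.
    return round(x % 1.0, decimals) % 1.0


def naive_sequence(lattice: Lattice, atoms: List[Atom], decimals: int = 4) -> List[str]:
    """CIF-like listing: tokens depend on frame, origin, and atom order."""
    tokens = [f"{x:.{decimals}f}" for vec in lattice for x in vec]
    for species, frac in atoms:
        tokens.append(species)
        tokens.extend(f"{x:.{decimals}f}" for x in frac)
    return tokens


def canonical_sequence(lattice: Lattice, atoms: List[Atom], decimals: int = 4) -> List[str]:
    """Toy canonical form: same tokens for rotated, translated, or reordered inputs."""
    params = [round(p, decimals) for p in lattice_params(lattice)]
    best = None
    for _, origin in atoms:  # try every atom as the origin
        shifted = sorted(
            (species, tuple(wrap(x - o, decimals) for x, o in zip(frac, origin)))
            for species, frac in atoms
        )
        if best is None or shifted < best:  # keep the lexicographically smallest
            best = shifted
    tokens = [f"{p:.{decimals}f}" for p in params]
    for species, frac in best:
        tokens.append(species)
        tokens.extend(f"{x:.{decimals}f}" for x in frac)
    return tokens


if __name__ == "__main__":
    a, theta = 4.12, 0.7
    cubic = [[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, a]]
    # The same cell, rotated by theta radians about the z axis.
    rotated = [[a * math.cos(theta), a * math.sin(theta), 0.0],
               [-a * math.sin(theta), a * math.cos(theta), 0.0],
               [0.0, 0.0, a]]
    cscl = [("Cs", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))]
    # Same CsCl-type structure: shifted origin, an out-of-cell coordinate, reordered atoms.
    cscl_alt = [("Cl", (1.25, 0.25, 0.25)), ("Cs", (0.75, 0.75, 0.75))]
    print(naive_sequence(cubic, cscl) == naive_sequence(rotated, cscl_alt))          # False
    print(canonical_sequence(cubic, cscl) == canonical_sequence(rotated, cscl_alt))  # True
```

The toy ignores other ways of describing the same crystal, such as supercells or alternative lattice bases, and fixed-decimal rounding can split values that sit near a rounding boundary. It therefore does not provide the guarantees the abstract claims for Mat2Seq.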

@article{yan2025_2503.00152,
  title={Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation},
  author={Keqiang Yan and Xiner Li and Hongyi Ling and Kenna Ashen and Carl Edwards and Raymundo Arróyave and Marinka Zitnik and Heng Ji and Xiaofeng Qian and Xiaoning Qian and Shuiwang Ji},
  journal={arXiv preprint arXiv:2503.00152},
  year={2025}
}