Gene42: Long-Range Genomic Foundation Model With Dense Attention

20 March 2025
Kirill Vishniakov
Boulbaba Ben Amor
Engin Tekin
Nancy A. ElNaker
Karthik Viswanathan
Aleksandr Medvedev
Aahan Singh
Maryam Nadeem
Mohammad Amaan Sayeed
Praveenkumar Kanithi
Tiago Magalhaes
Natalia Vassilieva
Dwarikanath Mahapatra
Marco Pimentel
and Shadab Khan
Abstract

We introduce Gene42, a novel family of Genomic Foundation Models (GFMs) designed to manage context lengths of up to 192,000 base pairs (bp) at single-nucleotide resolution. Gene42 models use a decoder-only (LLaMA-style) architecture with a dense self-attention mechanism. Initially trained on fixed-length sequences of 4,096 bp, our models underwent continuous pretraining to extend the context length to 192,000 bp. This iterative extension allowed for the comprehensive processing of large-scale genomic data and the capture of intricate patterns and dependencies within the human genome. Gene42 is the first dense attention model capable of handling such long context lengths in genomics, challenging state-space models that often rely on convolutional operators among other mechanisms. Our pretrained models exhibit notably low perplexity values and high reconstruction accuracy, highlighting their strong ability to model genomic data. Extensive experiments on various genomic benchmarks have demonstrated state-of-the-art performance across multiple tasks, including biotype classification, regulatory region identification, chromatin profiling prediction, variant pathogenicity prediction, and species classification. The models are publicly available at this http URL.
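
To make the quantities in the abstract concrete, the following is a minimal sketch and not the authors' code. It assumes a hypothetical five-symbol vocabulary (`VOCAB`) for single-nucleotide tokenization, computes perplexity as the exponential of the mean negative log-likelihood per token, and counts the query-key pairs a dense causal attention layer scores at 4,096 bp versus 192,000 bp. All function names are illustrative; the paper's actual tokenizer and evaluation code may differ.

```python
import math

# Hypothetical single-nucleotide vocabulary (an assumption, not taken from the paper).
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def tokenize(seq: str) -> list[int]:
    """Map each base to one token id, so one token corresponds to one base pair."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]


def perplexity(token_log_probs: list[float]) -> float:
    """Return exp of the mean negative natural-log likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def causal_attention_pairs(context_len: int) -> int:
    """Count (query, key) pairs scored by dense causal attention in one layer."""
    return context_len * (context_len + 1) // 2


tokens = tokenize("ACGTNacgt")

# A model that assigns probability 0.25 to every observed base.
uniform_log_probs = [math.log(0.25)] * 8
ppl = perplexity(uniform_log_probs)

# Growth in attention pairs when the context goes from 4,096 to 192,000 tokens.
ratio = causal_attention_pairs(192_000) / causal_attention_pairs(4_096)
print(tokens, ppl, round(ratio))
```

Under these assumptions, a model that spreads probability uniformly over the four bases has a perplexity of 4, so lower values indicate that the model has learned sequence structure. The pair count grows roughly quadratically with context length, which is why extending dense attention from 4,096 bp to 192,000 bp is costly.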

@article{vishniakov2025_2503.16565,
  title={Gene42: Long-Range Genomic Foundation Model With Dense Attention},
  author={Kirill Vishniakov and Boulbaba Ben Amor and Engin Tekin and Nancy A. ElNaker and Karthik Viswanathan and Aleksandr Medvedev and Aahan Singh and Maryam Nadeem and Mohammad Amaan Sayeed and Praveenkumar Kanithi and Tiago Magalhaes and Natalia Vassilieva and Dwarikanath Mahapatra and Marco Pimentel and Shadab Khan},
  journal={arXiv preprint arXiv:2503.16565},
  year={2025}
}