HessFormer: Hessians at Foundation Scale

16 May 2025
Diego Granziol
Abstract

Whilst there have been major advancements in first-order optimisation of deep learning models, where state-of-the-art open-source mixture-of-experts models reach hundreds of billions of parameters, methods that rely on Hessian-vector products are still limited to running on a single GPU and thus cannot handle even models in the billion-parameter range. We release a software package, HessFormer, which integrates cleanly with the well-known Transformers package and allows distributed Hessian-vector computation across a single node with multiple GPUs. Underpinning our implementation is a distributed stochastic Lanczos quadrature algorithm, which we release for public consumption. Using this package we investigate the Hessian spectral density of the recent DeepSeek 70bn parameter model.
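HessFormer's own API is not shown on this page. As an illustration of the underlying idea only, here is a minimal single-process sketch of stochastic Lanczos quadrature: the spectral density is estimated purely from matrix-free Hessian-vector products, by running the Lanczos three-term recurrence per random probe and averaging the resulting Ritz-value/weight pairs. All function names are hypothetical, and the actual package distributes the Hessian-vector products across GPUs rather than using NumPy.

```python
import numpy as np

def lanczos(hvp, dim, k, rng):
    """Run k steps of Lanczos on a symmetric operator accessed only
    through Hessian-vector products `hvp`; return the tridiagonal
    coefficients (diagonal alphas, off-diagonal betas)."""
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    v_prev = np.zeros(dim)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(k):
        w = hvp(v) - beta * v_prev          # one operator application per step
        alpha = v @ w
        w -= alpha * v                      # three-term recurrence
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        if beta < 1e-10:                    # invariant subspace found; stop early
            break
        v_prev, v = v, w / beta
    return np.array(alphas), np.array(betas[:-1])

def slq_spectral_density(hvp, dim, k=30, n_probes=10, seed=0):
    """Stochastic Lanczos quadrature: for each random probe, diagonalise
    the small tridiagonal matrix and read off quadrature nodes (Ritz
    values) and weights (squared first components of the Ritz vectors).
    The averaged (nodes, weights) pairs approximate the spectral density."""
    rng = np.random.default_rng(seed)
    nodes, weights = [], []
    for _ in range(n_probes):
        a, b = lanczos(hvp, dim, k, rng)
        T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
        evals, evecs = np.linalg.eigh(T)
        nodes.append(evals)
        weights.append(evecs[0] ** 2 / n_probes)
    return np.concatenate(nodes), np.concatenate(weights)
```

For a real model the `hvp` callable would be a double-backward pass through the network (and in HessFormer's setting, one sharded across devices); here any symmetric linear map, e.g. `lambda v: A @ v`, stands in for it.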

@article{granziol2025_2505.11564,
  title={HessFormer: Hessians at Foundation Scale},
  author={Diego Granziol},
  journal={arXiv preprint arXiv:2505.11564},
  year={2025}
}