  3. 2112.00590
11
20

Building astroBERT, a language model for Astronomy & Astrophysics

1 December 2021
Félix Grèzes, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, E. Henneken, C. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, N. Martinovic, Shinyi Chen, Christy Tanner, P. Protopapas
Abstract

The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but they do not yet allow researchers to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural language processing techniques to our dataset of recent astronomy publications to train astroBERT, a deeply contextual language model based on research at Google. Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability, and in particular we are developing our own named entity recognition tool. We present here our preliminary results and lessons learned.
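The named entity recognition use case described in the abstract maps naturally onto a BERT-style token-classification setup. The sketch below illustrates how such a model could be queried to disambiguate "Planck" in context; the checkpoint name, the label set, and the use of the Hugging Face transformers pipeline are assumptions made for illustration, not the authors' actual ADS tooling.

```python
# Minimal sketch of BERT-style named entity recognition, as one might use it
# to disambiguate "Planck" in context. Assumptions: a fine-tuned
# token-classification checkpoint exists under the hypothetical name
# "adsabs/astroBERT-ner", and the Hugging Face `transformers` library is
# installed. This illustrates the general technique, not the ADS pipeline.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

MODEL_NAME = "adsabs/astroBERT-ner"  # hypothetical NER checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME)

# aggregation_strategy="simple" merges sub-word tokens back into whole entities.
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

# Here "Planck" should be tagged as a mission, not a person or a constant.
query = "Results from the Planck mission constrain the value of the Hubble constant."
for entity in ner(query):
    print(entity["word"], entity["entity_group"], round(entity["score"], 3))
```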
