arXiv:1807.07814

Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

20 July 2018
Albert Reuther
J. Kepner
Chansup Byun
S. Samsi
William Arcand
David Bestor
Bill Bergeron
V. Gadepally
Michael Houle
Matthew Hubbell
Michael Jones
Anna Klein
Lauren Milechin
J. Mullen
Andrew Prout
Antonio Rosa
Charles Yee
Peter Michaleas
Abstract

Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges, in particular rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with thousands of dependencies. Careful tuning of launches and prepositioning of applications overcome these challenges and allow thousands of tasks to be launched in seconds on a 40,000-core supercomputer. Specifically, this work demonstrates launching 32,000 TensorFlow processes in 4 seconds and launching 262,000 Octave processes in 40 seconds. These capabilities allow researchers to rapidly explore novel machine learning architectures and data analysis algorithms.
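
The two launch figures above can be put on a common scale by converting them to average launch rates. The sketch below is plain arithmetic on the numbers quoted in the abstract. The helper name `launch_rate` is illustrative and not from the paper, and the processes-per-core ratio assumes the nominal 40,000-core figure, so it is only a rough estimate.

```python
# Figures are the ones quoted in the abstract; launch_rate is a
# hypothetical helper introduced only for this illustration.
def launch_rate(processes: int, seconds: float) -> float:
    """Average number of processes started per second."""
    return processes / seconds

tensorflow_rate = launch_rate(32_000, 4)   # TensorFlow launch
octave_rate = launch_rate(262_000, 40)     # Octave launch

# Rough oversubscription estimate, assuming exactly 40,000 cores
# (the abstract gives only this nominal core count).
octave_per_core = 262_000 / 40_000

print(f"TensorFlow: {tensorflow_rate:.0f} processes/s")
print(f"Octave: {octave_rate:.0f} processes/s")
print(f"Octave processes per core (nominal): {octave_per_core:.2f}")
```

Under these assumptions both launches average several thousand process starts per second, and the Octave run places several processes on each core.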
