MLPerf Inference Benchmark

6 November 2019
Vijay Janapa Reddi, C. Cheng, David Kanter, Pete H. Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, M. Charlebois, William Chou, Ramesh Chukka, Cody Coleman, S. Davis, Pan Deng, Greg Diamos, Jared Duke, D. Fick, J. Gardner, Itay Hubara, S. Idgunji, Thomas B. Jablin, J. Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, C. Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, E. Wu, Ling Xu, Koichiro Yamada, Bing Yu, George Y. Yuan, Aaron Zhong, P. Zhang, Yuchen Zhou
Abstract

Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.
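As a concrete illustration of the kind of measurement the benchmark standardizes, the sketch below times a single-stream sequence of inference queries and reports a tail-latency percentile. It is a minimal, hypothetical example: the model stub, SAMPLE_COUNT, and the 90th-percentile target are assumptions for illustration, not the MLPerf LoadGen API or its official scenario parameters.

```python
# Hypothetical single-stream latency measurement, loosely inspired by the kind
# of measurement MLPerf Inference standardizes. This is NOT the MLPerf LoadGen
# API; model(), SAMPLE_COUNT, and the percentile target are illustrative only.
import time
import statistics

SAMPLE_COUNT = 1024          # number of queries to issue (illustrative)
LATENCY_PERCENTILE = 90      # report the 90th-percentile latency (illustrative)

def model(sample):
    """Stand-in for an inference call; replace with a real model invocation."""
    time.sleep(0.001)        # simulate 1 ms of inference work
    return sample

def single_stream_benchmark(samples):
    """Issue queries one at a time and record per-query latency in seconds."""
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        model(sample)
        latencies.append(time.perf_counter() - start)
    return latencies

if __name__ == "__main__":
    latencies = sorted(single_stream_benchmark(range(SAMPLE_COUNT)))
    idx = min(int(len(latencies) * LATENCY_PERCENTILE / 100), len(latencies) - 1)
    print(f"p{LATENCY_PERCENTILE} latency: {latencies[idx] * 1e3:.2f} ms")
    print(f"mean latency: {statistics.mean(latencies) * 1e3:.2f} ms")
```

Reporting a tail percentile rather than only the mean reflects the latency-bounded character of interactive inference workloads that such benchmarks aim to capture.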

View on arXiv: https://arxiv.org/abs/1911.02549