1911.02549
Cited By
MLPerf Inference Benchmark
6 November 2019
Vijay Janapa Reddi
C. Cheng
David Kanter
Pete H Mattson
Guenther Schmuelling
Carole-Jean Wu
Brian Anderson
Maximilien Breughe
M. Charlebois
William Chou
Ramesh Chukka
Cody Coleman
S. Davis
Pan Deng
Greg Diamos
Jared Duke
D. Fick
J. Gardner
Itay Hubara
S. Idgunji
Thomas B. Jablin
J. Jiao
Tom St. John
Pankaj Kanwar
David Lee
Jeffery Liao
Anton Lokhmotov
Francisco Massa
Peng Meng
Paulius Micikevicius
C. Osborne
Gennady Pekhimenko
Arun Tejusve Raghunath Rajan
Dilip Sequeira
Ashish Sirasao
Fei Sun
Hanlin Tang
Michael Thomson
Frank Wei
E. Wu
Ling Xu
Koichiro Yamada
Bing Yu
George Y. Yuan
Aaron Zhong
P. Zhang
Yuchen Zhou
Papers citing "MLPerf Inference Benchmark"
50 / 61 papers shown
Title
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Yinsicheng Jiang
Yao Fu
Yeqi Huang
Ping Nie
Zhan Lu
...
Dayou Du
Tairan Xu
Kai Zou
Edoardo Ponti
Luo Mai
MoE
17
0
0
16 May 2025
LithOS: An Operating System for Efficient Machine Learning on GPUs
Patrick H. Coppock
Brian Zhang
Eliot H. Solomon
Vasilis Kypriotis
Leon Yang
Bikash Sharma
Dan Schatzberg
Todd C. Mowry
Dimitrios Skarlatos
27
0
0
21 Apr 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
67
0
0
24 Feb 2025
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
200
3
0
20 Nov 2024
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI
Arya Tschand
Arun Tejusve Raghunath Rajan
S. Idgunji
Anirban Ghosh
J. Holleman
...
Rowan Taubitz
Sean Zhan
Scott Wasson
David Kanter
Vijay Janapa Reddi
62
3
0
15 Oct 2024
Towards Cloud Efficiency with Large-scale Workload Characterization
Anjaly Parayil
Jue Zhang
Xiaoting Qin
Íñigo Goiri
Lexiang Huang
Timothy Zhu
Chetan Bansal
31
3
0
12 May 2024
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey
Noah Lewis
J. L. Bez
Suren Byna
57
0
0
16 Apr 2024
Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision
Ahmed F. AbouElhamayed
Susanne Balle
Deshanand Singh
Mohamed S. Abdelfattah
3DH
27
0
0
02 Mar 2024
Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems
Warren R. Williams
Ross Glandon
Luke L. Morris
Jingliang Cheng
VLM
19
0
0
27 Jul 2023
S^3: Increasing GPU Utilization during Generative Inference for Higher Throughput
Yunho Jin
Chun-Feng Wu
David Brooks
Gu-Yeon Wei
29
62
0
09 Jun 2023
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks
Seah Kim
Hasan Genç
Vadim Nikiforov
Krste Asanović
B. Nikolić
Y. Shao
19
18
0
10 May 2023
Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
28
27
0
19 Apr 2023
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems
Jason Yik
Korneel Van den Berghe
Douwe den Blanken
Younes Bouhadjar
Maxime Fabre
...
Fatima Tuz Zohora
Charlotte Frenkel
Vijay Janapa Reddi
25
17
0
10 Apr 2023
The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment
Jared Fernandez
Jacob Kahn
Clara Na
Yonatan Bisk
Emma Strubell
FedML
33
10
0
13 Feb 2023
Mixed Precision Post Training Quantization of Neural Networks with Sensitivity Guided Search
Clemens J. S. Schaefer
Elfie Guo
Caitlin Stanton
Xiaofan Zhang
T. Jablin
Navid Lambert-Shirzad
Jian Li
Chia-Wei Chou
Siddharth Joshi
Yu Wang
MQ
25
3
0
02 Feb 2023
MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs
Huaizheng Zhang
Yuanming Li
Wencong Xiao
Yizheng Huang
Xing Di
Jianxiong Yin
Simon See
Yong Luo
C. Lau
Yang You
VLM
16
3
0
01 Jan 2023
Quality at the Tail of Machine Learning Inference
Zhengxin Yang
Wanling Gao
Chunjie Luo
Lei Wang
Fei Tang
Xu Wen
Jianfeng Zhan
38
1
0
25 Dec 2022
Kernel-as-a-Service: A Serverless Interface to GPUs
Nathan Pemberton
Anton Zabreyko
Zhoujie Ding
R. Katz
Joseph E. Gonzalez
29
8
0
15 Dec 2022
XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse
Hyoukjun Kwon
Krishnakumar Nair
Jamin Seo
Jason Yik
D. Mohapatra
...
Ashish Sirasao
T. Krishna
Harshit Khaitan
Vikas Chandra
Vijay Janapa Reddi
38
33
0
16 Nov 2022
KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
22
11
0
12 Oct 2022
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements
Leandro von Werra
Lewis Tunstall
A. Thakur
A. Luccioni
Tristan Thrush
...
Julien Chaumond
Margaret Mitchell
Alexander M. Rush
Thomas Wolf
Douwe Kiela
ELM
23
24
0
30 Sep 2022
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs
Alexandros Kouris
Stylianos I. Venieris
Stefanos Laskaridis
Nicholas D. Lane
42
8
0
27 Sep 2022
Understanding Time Variations of DNN Inference in Autonomous Driving
Liangkai Liu
Yanzhi Wang
Weisong Shi
AI4TS
AI4CE
23
6
0
12 Sep 2022
RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances
Baolin Li
Rohan Basu Roy
Tirthak Patel
V. Gadepally
K. Gettings
Devesh Tiwari
29
25
0
23 Jul 2022
Adaptive Block Floating-Point for Analog Deep Learning Hardware
Ayon Basumallik
D. Bunandar
Nicholas Dronen
Nicholas Harris
Ludmila Levkova
Calvin McCarter
Lakshmi Nair
David Walter
David Widemann
14
6
0
12 May 2022
Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems
Shail Dave
Alberto Marchisio
Muhammad Abdullah Hanif
Amira Guesmi
Aviral Shrivastava
Ihsen Alouani
Muhammad Shafique
34
13
0
18 Apr 2022
Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models
Phyllis Ang
Bhuwan Dhingra
Lisa Wu Wills
25
6
0
15 Apr 2022
The MIT Supercloud Workload Classification Challenge
Benny J. Tang
Qiqi Chen
Matthew L. Weiss
Nathan C. Frey
Joseph McDonald
...
Lindsey McEvoy
Baolin Li
Devesh Tiwari
V. Gadepally
S. Samsi
11
2
0
12 Apr 2022
Spy in the GPU-box: Covert and Side Channel Attacks on Multi-GPU Systems
S. B. Dutta
Hoda Naghibijouybari
Arjun Gupta
Nael B. Abu-Ghazaleh
Andres Marquez
Kevin J. Barker
GNN
8
24
0
30 Mar 2022
Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment
Jemin Lee
Misun Yu
Yongin Kwon
Teaho Kim
MQ
17
17
0
10 Feb 2022
Exploring the Impact of Virtualization on the Usability of the Deep Learning Applications
Davood Ghatreh Samani
M. Salehi
19
7
0
17 Dec 2021
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems
S. Farrell
M. Emani
J. Balma
L. Drescher
Aleksandr Drozd
...
Akihiro Tabuchi
V. Vishwanath
M. Wahib
Masafumi Yamazaki
Junqi Yin
VLM
32
35
0
21 Oct 2021
Pyxis: An Open-Source Performance Dataset of Sparse Accelerators
Linghao Song
Yuze Chi
Jason Cong
21
0
0
08 Oct 2021
Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads
Guin Gilman
R. Walls
GNN
BDL
36
17
0
01 Oct 2021
MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation
Alexandros Karargyris
Renato Umeton
Micah J. Sheller
Alejandro Aristizabal
Johnu George
...
Poonam Yadav
Michael Rosenthal
M. Loda
Jason M. Johnson
Peter Mattson
FedML
46
72
0
29 Sep 2021
AI Accelerator Survey and Trends
Albert Reuther
Peter Michaleas
Michael Jones
V. Gadepally
S. Samsi
J. Kepner
42
79
0
18 Sep 2021
On the Accuracy of Analog Neural Network Inference Accelerators
T. Xiao
Ben Feinberg
C. Bennett
V. Prabhakar
Prashant Saxena
V. Agrawal
S. Agarwal
M. Marinella
30
32
0
03 Sep 2021
DFSynthesizer: Dataflow-based Synthesis of Spiking Neural Networks to Neuromorphic Hardware
Shihao Song
Harry Chong
Adarsha Balaji
Anup Das
J. Shackleford
Nagarajan Kandasamy
18
28
0
04 Aug 2021
Anchor-based Plain Net for Mobile Image Super-Resolution
Zongcai Du
Jie Liu
Jie Tang
Gangshan Wu
SupR
MQ
30
52
0
20 May 2021
Dynamic Reliability Management in Neuromorphic Computing
Shihao Song
Jui Hanamshet
Adarsha Balaji
Anup Das
J. Krichmar
N. Dutt
Nagarajan Kandasamy
F. Catthoor
23
23
0
05 May 2021
Faa$T: A Transparent Auto-Scaling Cache for Serverless Applications
Francisco Romero
G. Chaudhry
Íñigo Goiri
Pragna Gopa
Paul Batum
N. Yadwadkar
Rodrigo Fonseca
Christos Kozyrakis
Ricardo Bianchini
60
111
0
28 Apr 2021
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats
Eric Qin
Geonhwa Jeong
William Won
Sheng-Chun Kao
Hyoukjun Kwon
Sudarshan Srinivasan
Dipankar Das
G. Moon
S. Rajamanickam
T. Krishna
27
18
0
18 Mar 2021
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier
Pierre Delaunay
Mirko Bronzi
Assya Trofimov
Brennan Nichyporuk
...
Dmitriy Serdyuk
Tal Arbel
C. Pal
Gaël Varoquaux
Pascal Vincent
26
148
0
01 Mar 2021
Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
Bilge Acun
Matthew Murphy
Xiaodong Wang
Jade Nie
Carole-Jean Wu
K. Hazelwood
25
109
0
11 Nov 2020
Understanding Capacity-Driven Scale-Out Neural Recommendation Inference
Michael Lui
Yavuz Yetim
Özgür Özkan
Zhuoran Zhao
Shin-Yeh Tsai
Carole-Jean Wu
Mark Hempstead
GNN
BDL
LRM
22
51
0
04 Nov 2020
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs
G. Fursin
14
7
0
02 Nov 2020
TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems
R. David
Jared Duke
Advait Jain
Vijay Janapa Reddi
Nat Jeffries
...
Meghna Natraj
Shlomi Regev
Rocky Rhodes
Tiezhen Wang
Pete Warden
119
466
0
17 Oct 2020
Impact of Thermal Throttling on Long-Term Visual Inference in a CPU-based Edge Device
Théo Benoit-Cattin
Delia Velasco-Montero
Jorge Fernández-Berni
11
25
0
13 Oct 2020
The Hardware Lottery
Sara Hooker
27
203
0
14 Sep 2020
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
Shauharda Khadka
Estelle Aflalo
Mattias Marder
Avrech Ben-David
Santiago Miret
Shie Mannor
Tamir Hazan
Hanlin Tang
Somdeb Majumdar
GNN
27
11
0
14 Jul 2020