Papers citing "MLPerf Inference Benchmark"

50 / 61 papers shown

Title
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems Yinsicheng Jiang Yao Fu Yeqi Huang Ping Nie Zhan Lu ... Dayou Du Tairan Xu Kai Zou Edoardo Ponti Luo Mai MoE 17 0 0 16 May 2025
LithOS: An Operating System for Efficient Machine Learning on GPUs Patrick H. Coppock Brian Zhang Eliot H. Solomon Vasilis Kypriotis Leon Yang Bikash Sharma Dan Schatzberg Todd C. Mowry Dimitrios Skarlatos 27 0 0 21 Apr 2025
Model Lakes Koyena Pal David Bau Renée J. Miller 67 0 0 24 Feb 2025
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training Jared Fernandez Luca Wehrstedt Leonid Shamis Mostafa Elhoushi Kalyan Saladi Yonatan Bisk Emma Strubell Jacob Kahn 200 3 0 20 Nov 2024
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI Arya Tschand Arun Tejusve Raghunath Rajan S. Idgunji Anirban Ghosh J. Holleman ... Rowan Taubitz Sean Zhan Scott Wasson David Kanter Vijay Janapa Reddi 62 3 0 15 Oct 2024
Towards Cloud Efficiency with Large-scale Workload Characterization Anjaly Parayil Jue Zhang Xiaoting Qin Íñigo Goiri Lexiang Huang Timothy Zhu Chetan Bansal 31 3 0 12 May 2024
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey Noah Lewis J. L. Bez Suren Byna 57 0 0 16 Apr 2024
Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision Ahmed F. AbouElhamayed Susanne Balle Deshanand Singh Mohamed S. Abdelfattah 3DH 27 0 0 02 Mar 2024
Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems Warren R. Williams Ross Glandon Luke L. Morris Jingliang Cheng VLM 19 0 0 27 Jul 2023
$S^{3}$: Increasing GPU Utilization during Generative Inference for Higher Throughput Yunho Jin Chun-Feng Wu David Brooks Gu-Yeon Wei 29 62 0 09 Jun 2023
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks Seah Kim Hasan Genç Vadim Nikiforov Krste Asanović B. Nikolić Y. Shao 19 18 0 10 May 2023
Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service Baolin Li S. Samsi V. Gadepally Devesh Tiwari 28 27 0 19 Apr 2023
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems Jason Yik Korneel Van den Berghe Douwe den Blanken Younes Bouhadjar Maxime Fabre ... Fatima Tuz Zohora Charlotte Frenkel Vijay Janapa Reddi 25 17 0 10 Apr 2023
The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment Jared Fernandez Jacob Kahn Clara Na Yonatan Bisk Emma Strubell FedML 33 10 0 13 Feb 2023
Mixed Precision Post Training Quantization of Neural Networks with Sensitivity Guided Search Clemens J. S. Schaefer Elfie Guo Caitlin Stanton Xiaofan Zhang T. Jablin Navid Lambert-Shirzad Jian Li Chia-Wei Chou Siddharth Joshi Yu Wang MQ 25 3 0 02 Feb 2023
MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs Huaizheng Zhang Yuanming Li Wencong Xiao Yizheng Huang Xing Di Jianxiong Yin Simon See Yong Luo C. Lau Yang You VLM 16 3 0 01 Jan 2023
Quality at the Tail of Machine Learning Inference Zhengxin Yang Wanling Gao Chunjie Luo Lei Wang Fei Tang Xu Wen Jianfeng Zhan 38 1 0 25 Dec 2022
Kernel-as-a-Service: A Serverless Interface to GPUs Nathan Pemberton Anton Zabreyko Zhoujie Ding R. Katz Joseph E. Gonzalez 29 8 0 15 Dec 2022
XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse Hyoukjun Kwon Krishnakumar Nair Jamin Seo Jason Yik D. Mohapatra ... Ashish Sirasao T. Krishna Harshit Khaitan Vikas Chandra Vijay Janapa Reddi 38 33 0 16 Nov 2022
KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources Baolin Li S. Samsi V. Gadepally Devesh Tiwari 22 11 0 12 Oct 2022
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements Leandro von Werra Lewis Tunstall A. Thakur A. Luccioni Tristan Thrush ... Julien Chaumond Margaret Mitchell Alexander M. Rush Thomas Wolf Douwe Kiela ELM 23 24 0 30 Sep 2022
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs Alexandros Kouris Stylianos I. Venieris Stefanos Laskaridis Nicholas D. Lane 42 8 0 27 Sep 2022
Understanding Time Variations of DNN Inference in Autonomous Driving Liangkai Liu Yanzhi Wang Weisong Shi AI4TS AI4CE 23 6 0 12 Sep 2022
RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances Baolin Li Rohan Basu Roy Tirthak Patel V. Gadepally K. Gettings Devesh Tiwari 29 25 0 23 Jul 2022
Adaptive Block Floating-Point for Analog Deep Learning Hardware Ayon Basumallik D. Bunandar Nicholas Dronen Nicholas Harris Ludmila Levkova Calvin McCarter Lakshmi Nair David Walter David Widemann 14 6 0 12 May 2022
Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems Shail Dave Alberto Marchisio Muhammad Abdullah Hanif Amira Guesmi Aviral Shrivastava Ihsen Alouani Muhammad Shafique 34 13 0 18 Apr 2022
Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models Phyllis Ang Bhuwan Dhingra Lisa Wu Wills 25 6 0 15 Apr 2022
The MIT Supercloud Workload Classification Challenge Benny J. Tang Qiqi Chen Matthew L. Weiss Nathan C. Frey Joseph McDonald ... Lindsey McEvoy Baolin Li Devesh Tiwari V. Gadepally S. Samsi 11 2 0 12 Apr 2022
Spy in the GPU-box: Covert and Side Channel Attacks on Multi-GPU Systems S. B. Dutta Hoda Naghibijouybari Arjun Gupta Nael B. Abu-Ghazaleh Andres Marquez Kevin J. Barker GNN 8 24 0 30 Mar 2022
Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment Jemin Lee Misun Yu Yongin Kwon Teaho Kim MQ 17 17 0 10 Feb 2022
Exploring the Impact of Virtualization on the Usability of the Deep Learning Applications Davood Ghatreh Samani M. Salehi 19 7 0 17 Dec 2021
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems S. Farrell M. Emani J. Balma L. Drescher Aleksandr Drozd ... Akihiro Tabuchi V. Vishwanath M. Wahib Masafumi Yamazaki Junqi Yin VLM 32 35 0 21 Oct 2021
Pyxis: An Open-Source Performance Dataset of Sparse Accelerators Linghao Song Yuze Chi Jason Cong 21 0 0 08 Oct 2021
Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads Guin Gilman R. Walls GNN BDL 36 17 0 01 Oct 2021
MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation Alexandros Karargyris Renato Umeton Micah J. Sheller Alejandro Aristizabal Johnu George ... Poonam Yadav Michael Rosenthal M. Loda Jason M. Johnson Peter Mattson FedML 46 72 0 29 Sep 2021
AI Accelerator Survey and Trends Albert Reuther Peter Michaleas Michael Jones V. Gadepally S. Samsi J. Kepner 42 79 0 18 Sep 2021
On the Accuracy of Analog Neural Network Inference Accelerators T. Xiao Ben Feinberg C. Bennett V. Prabhakar Prashant Saxena V. Agrawal S. Agarwal M. Marinella 30 32 0 03 Sep 2021
DFSynthesizer: Dataflow-based Synthesis of Spiking Neural Networks to Neuromorphic Hardware Shihao Song Harry Chong Adarsha Balaji Anup Das J. Shackleford Nagarajan Kandasamy 18 28 0 04 Aug 2021
Anchor-based Plain Net for Mobile Image Super-Resolution Zongcai Du Jie Liu Jie Tang Gangshan Wu SupR MQ 30 52 0 20 May 2021
Dynamic Reliability Management in Neuromorphic Computing Shihao Song Jui Hanamshet Adarsha Balaji Anup Das J. Krichmar N. Dutt Nagarajan Kandasamy F. Catthoor 23 23 0 05 May 2021
Faa$T: A Transparent Auto-Scaling Cache for Serverless Applications Francisco Romero G. Chaudhry Íñigo Goiri Pragna Gopa Paul Batum N. Yadwadkar Rodrigo Fonseca Christos Kozyrakis Ricardo Bianchini 60 111 0 28 Apr 2021
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats Eric Qin Geonhwa Jeong William Won Sheng-Chun Kao Hyoukjun Kwon Sudarshan Srinivasan Dipankar Das G. Moon S. Rajamanickam T. Krishna 27 18 0 18 Mar 2021
Accounting for Variance in Machine Learning Benchmarks Xavier Bouthillier Pierre Delaunay Mirko Bronzi Assya Trofimov Brennan Nichyporuk ... Dmitriy Serdyuk Tal Arbel C. Pal Gaël Varoquaux Pascal Vincent 26 148 0 01 Mar 2021
Understanding Training Efficiency of Deep Learning Recommendation Models at Scale Bilge Acun Matthew Murphy Xiaodong Wang Jade Nie Carole-Jean Wu K. Hazelwood 25 109 0 11 Nov 2020
Understanding Capacity-Driven Scale-Out Neural Recommendation Inference Michael Lui Yavuz Yetim Özgür Özkan Zhuoran Zhao Shin-Yeh Tsai Carole-Jean Wu Mark Hempstead GNN BDL LRM 22 51 0 04 Nov 2020
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs G. Fursin 14 7 0 02 Nov 2020
TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems R. David Jared Duke Advait Jain Vijay Janapa Reddi Nat Jeffries ... Meghna Natraj Shlomi Regev Rocky Rhodes Tiezhen Wang Pete Warden 119 466 0 17 Oct 2020
Impact of Thermal Throttling on Long-Term Visual Inference in a CPU-based Edge Device Théo Benoit-Cattin Delia Velasco-Montero Jorge Fernández-Berni 11 25 0 13 Oct 2020
The Hardware Lottery Sara Hooker 27 203 0 14 Sep 2020
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning Shauharda Khadka Estelle Aflalo Mattias Marder Avrech Ben-David Santiago Miret Shie Mannor Tamir Hazan Hanlin Tang Somdeb Majumdar GNN 27 11 0 14 Jul 2020