FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference


13 January 2021
D. Khudia
Jianyu Huang
Protonu Basu
Summer Deng
Haixin Liu
Jongsoo Park
M. Smelyanskiy
    FedML
    MQ

Papers citing "FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference"

27 / 27 papers shown
Applying Deep Learning to Ads Conversion Prediction in Last Mile Delivery Marketplace
Di Li
Xiaochang Miao
Huiyu Song
Chao Chu
Hao Xu
Mandar Rahurkar
37
0
0
14 Feb 2025
Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms
Zhongyi Lin
Ning Sun
Pallab Bhattacharya
Xizhou Feng
Louis Feng
John Douglas Owens
42
1
0
19 Apr 2024
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation
Liang Luo
Buyun Zhang
Michael Tsang
Yinbin Ma
Ching-Hsiang Chu
...
Guna Lakshminarayanan
Ellie Wen
Jongsoo Park
Dheevatsa Mudigere
Maxim Naumov
49
4
0
01 Mar 2024
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Jiaqi Zhai
Lucy Liao
Xing Liu
Yueming Wang
Rui Li
...
Zhaojie Gong
Fangda Gu
Michael He
Yin-Hua Lu
Yu Shi
OffRL
32
48
0
27 Feb 2024
Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments
Calvin Tanama
Kunyu Peng
Zdravko Marinov
Rainer Stiefelhagen
Alina Roitberg
45
1
0
10 Nov 2023
Neural Network Compression using Binarization and Few Full-Precision Weights
F. M. Nardini
Cosimo Rulli
Salvatore Trani
Rossano Venturini
MQ
29
1
0
15 Jun 2023
Revisiting Neural Retrieval on Accelerators
Jiaqi Zhai
Zhaojie Gong
Yueming Wang
Xiao Sun
Zheng Yan
Fu Li
Xing Liu
20
10
0
06 Jun 2023
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Tim Dettmers
Ruslan Svirschevski
Vage Egiazarian
Denis Kuznedelev
Elias Frantar
Saleh Ashkboos
Alexander Borzunov
Torsten Hoefler
Dan Alistarh
MQ
37
232
0
05 Jun 2023
Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
Daochen Zha
Louis Feng
Liangchen Luo
Bhargav Bhushanam
Zirui Liu
...
J. McMahon
Yuzhen Huang
Bryan Clarke
A. Kejariwal
Xia Hu
58
7
0
03 May 2023
Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman
Tim Dettmers
Luke Zettlemoyer
Ari S. Morcos
Ali Farhadi
Ludwig Schmidt
MQ
MLLM
VLM
24
39
0
25 Apr 2023
MTrainS: Improving DLRM training efficiency using heterogeneous memories
H. Kassa
Paul Johnson
Jason B. Akers
Mrinmoy Ghosh
Andrew Tulloch
Dheevatsa Mudigere
Jongsoo Park
Xing Liu
R. Dreslinski
E. K. Ardestani
33
1
0
19 Apr 2023
HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs
Chengming Zhang
Shaden Smith
Baixi Sun
Jiannan Tian
Jon Soifer
Xiaodong Yu
Shuaiwen Leon Song
Yuxiong He
Dingwen Tao
44
0
0
14 Apr 2023
FPUS23: An Ultrasound Fetus Phantom Dataset with Deep Neural Network Evaluations for Fetus Orientations, Fetal Planes, and Anatomical Features
B. Prabakaran
Paul Hamelmann
Erik Ostrowski
Mohamed Bennai
16
11
0
14 Mar 2023
MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation
Samuel Hsia
Udit Gupta
Bilge Acun
Newsha Ardalani
Pan Zhong
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
49
17
0
21 Feb 2023
A Practical Stereo Depth System for Smart Glasses
Jialiang Wang
D. Scharstein
Akash Bapat
Kevin Blackburn-Matzen
Matthew Yu
...
Jan-Michael Frahm
Zijian He
Peter Vajda
Michael F. Cohen
M. Uyttendaele
MDE
38
5
0
19 Nov 2022
FullPack: Full Vector Utilization for Sub-Byte Quantized Inference on General Purpose CPUs
Hossein Katebi
Navidreza Asadi
M. Goudarzi
MQ
30
0
0
13 Nov 2022
DreamShard: Generalizable Embedding Table Placement for Recommender Systems
Daochen Zha
Louis Feng
Qiaoyu Tan
Zirui Liu
Kwei-Herng Lai
Bhargav Bhushanam
Yuandong Tian
A. Kejariwal
Xia Hu
LMTD
OffRL
33
28
0
05 Oct 2022
Verifiable and Energy Efficient Medical Image Analysis with Quantised Self-attentive Deep Neural Networks
Rakshith Sathish
S. Khare
Debdoot Sheet
42
4
0
30 Sep 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers
M. Lewis
Younes Belkada
Luke Zettlemoyer
MQ
43
637
0
15 Aug 2022
AutoShard: Automated Embedding Table Sharding for Recommender Systems
Daochen Zha
Louis Feng
Bhargav Bhushanam
Dhruv Choudhary
Jade Nie
Yuandong Tian
Jay Chae
Yi-An Ma
A. Kejariwal
Xia Hu
40
30
0
12 Aug 2022
Torch.fx: Practical Program Capture and Transformation for Deep Learning in Python
James K. Reed
Zach DeVito
Horace He
Ansley Ussery
Jason Ansel
CLIP
28
47
0
15 Dec 2021
DeepSteal: Advanced Model Extractions Leveraging Efficient Weight Stealing in Memories
Adnan Siraj Rakin
Md Hafizul Islam Chowdhuryy
Fan Yao
Deliang Fan
AAML
MIACV
42
110
0
08 Nov 2021
Supporting Massive DLRM Inference Through Software Defined Memory
E. K. Ardestani
Changkyu Kim
Seung Jae Lee
Luoshang Pan
Valmiki Rampersad
...
Krishnakumar Nair
Maxim Naumov
Christopher Peterson
M. Smelyanskiy
Vijay Rao
BDL
41
20
0
21 Oct 2021
The NiuTrans System for the WMT21 Efficiency Task
Chenglong Wang
Chi Hu
Yongyu Mu
Zhongxiang Yan
Siming Wu
...
Hang Cao
Bei Li
Ye Lin
Tong Xiao
Jingbo Zhu
29
2
0
16 Sep 2021
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale
Zhaoxia Deng
Jongsoo Park
P. T. P. Tang
Haixin Liu
...
S. Nadathur
Changkyu Kim
Maxim Naumov
S. Naghshineh
M. Smelyanskiy
29
11
0
26 May 2021
Model Pruning Based on Quantified Similarity of Feature Maps
Zidu Wang
Xue-jun Liu
Long Huang
Yuxiang Chen
Yufei Zhang
Zhikang Lin
Rui Wang
21
16
0
13 May 2021
Efficient Soft-Error Detection for Low-precision Deep Learning Recommendation Models
Sihuan Li
Jianyu Huang
P. T. P. Tang
D. Khudia
Jongsoo Park
H. Dixit
Zizhong Chen
39
13
0
27 Feb 2021