Title
Speck: A Smart event-based Vision Sensor with a low latency 327K Neuron Convolutional Neuronal Network Processing Pipeline Ole Richter Y. Xing M. D. Marchi Carsten Nielsen M. Katsimpris ... SynSense Bio-Inspired Circuits Sadique Sheik T. Demirci Groningen Cognitive Systems 94 61 0 13 Apr 2023
Training Large Language Models Efficiently with Sparsity and Dataflow V. Srinivasan Darshan Gandhi Urmish Thakker R. Prabhakar MoE 69 6 0 11 Apr 2023
SamurAI: A Versatile IoT Node With Event-Driven Wake-Up and Embedded ML Acceleration I. Miro-Panadès Benoît Tain J. Christmann David Coriat R. Lemaire ... Jean-Marc Philippe Y. Thonnart A. Valentian Frédéric Heitzmann F. Clermidy 46 15 0 11 Apr 2023
Mixed-Precision Random Projection for RandNLA on Tensor Cores Hiroyuki Ootomo Rio Yokota 44 3 0 10 Apr 2023
Arithmetic Intensity Balancing Convolution for Hardware-aware Efficient Block Design Shinkook Choi Junkyeong Choi 26 1 0 08 Apr 2023
Tensor Slicing and Optimization for Multicore NPUs R. Sousa M. Pereira Yongin Kwon Taeho Kim Namsoon Jung Chang Soo Kim Michael Frank Guido Araujo 86 6 0 06 Apr 2023
A differentiable programming framework for spin models T. S. Farias V. V. Schultz José C. M. Mombach Jonas Maziero 55 1 0 04 Apr 2023
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings N. Jouppi George Kurian Sheng Li Peter C. Ma R. Nagarajan ... Brian Towles C. Young Xiaoping Zhou Zongwei Zhou David A. Patterson BDL VLM 169 371 0 04 Apr 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR Rami Botros Anmol Gulati Tara N. Sainath K. Choromanski Ruoming Pang Trevor Strohman Weiran Wang Jiahui Yu MQ 80 3 0 31 Mar 2023
D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs Aditya Dhakal Sameer G. Kulkarni K. Ramakrishnan 30 4 0 31 Mar 2023
PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration Richard Petri Grace Li Zhang Yiran Chen Ulf Schlichtmann Bing Li 29 6 0 24 Mar 2023
Pre-NeRF 360: Enriching Unbounded Appearances for Neural Radiance Fields Ahmad AlMughrabi Umair Haroon Ricardo Marques Petia Radeva 66 6 0 21 Mar 2023
Economical Quaternion Extraction from a Human Skeletal Pose Estimate using 2-D Cameras S. Radhakrishna A. Balasubramanyam 3DH 60 1 0 15 Mar 2023
DeepAxe: A Framework for Exploration of Approximation and Reliability Trade-offs in DNN Accelerators Mahdi Taheri M. Riazati Mohammad Hasan Ahmadilivani M. Jenihhin Masoud Daneshtalab J. Raik Mikael Sjödin B. Lisper 76 20 0 14 Mar 2023
X-Former: In-Memory Acceleration of Transformers S. Sridharan Jacob R. Stevens Kaushik Roy A. Raghunathan GNN 53 38 0 13 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training Wei Li Linchao Zhu Longyin Wen Yi Yang VLM 107 89 0 06 Mar 2023
End-to-End Speech Recognition: A Survey Rohit Prabhavalkar Takaaki Hori Tara N. Sainath Ralf Schluter Shinji Watanabe VLM 82 172 0 03 Mar 2023
HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture Yi-Chien Lin Viktor Prasanna GNN 67 7 0 01 Mar 2023
Auxiliary MCMC and particle Gibbs samplers for parallelisable inference in latent dynamical systems Adrien Corenflos Simo Särkkä 77 0 0 01 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey Sehoon Kim Coleman Hooper Thanakul Wattanawong Minwoo Kang Ruohan Yan ... Qijing Huang Kurt Keutzer Michael W. Mahoney Y. Shao A. Gholami MQ 163 106 0 27 Feb 2023
MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation Samuel Hsia Udit Gupta Bilge Acun Newsha Ardalani Pan Zhong Gu-Yeon Wei David Brooks Carole-Jean Wu 108 17 0 21 Feb 2023
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs Geonhwa Jeong S. Damani Abhimanyu Bambhaniya Eric Qin C. Hughes S. Subramoney Hyesoon Kim T. Krishna MoE 84 26 0 17 Feb 2023
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression Minghao Li Ran Ben-Basat S. Vargaftik Chon-In Lao Ke Xu Michael Mitzenmacher Minlan Yu Harvard University 94 19 0 16 Feb 2023
Toward matrix multiplication for deep learning inference on the Xilinx Versal Jie Lei J. Flich Enrique S. Quintana-Ortí 29 4 0 15 Feb 2023
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle Vanessa Mehlin Sigurd Schacht Carsten Lanquillon HAI MedIm 127 20 0 05 Feb 2023
A Survey on Efficient Training of Transformers Bohan Zhuang Jing Liu Zizheng Pan Haoyu He Yuetian Weng Chunhua Shen 128 49 0 02 Feb 2023
Bit-balance: Model-Hardware Co-design for Accelerating NNs by Exploiting Bit-level Sparsity Wenhao Sun Zhiwei Zou Deng Liu Wendi Sun Song Chen Yi Kang MQ 21 7 0 01 Feb 2023
A Green(er) World for A.I Dan Zhao Nathan C. Frey Joseph McDonald Matthew Hubbell David Bestor Michael Jones Andrew Prout V. Gadepally S. Samsi 69 6 0 27 Jan 2023
PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices Yuji Chai Devashree Tripathy Chu Zhou Dibakar Gope Igor Fedorov Ramon Matas David Brooks Gu-Yeon Wei P. Whatmough GNN 78 5 0 26 Jan 2023
SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators Mingi Yoo Jaeyong Song Jounghoo Lee Namhyung Kim Youngsok Kim Jinho Lee GNN 89 22 0 25 Jan 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression Jaeyong Song Jinkyu Yim Jaewon Jung Hongsun Jang H. Kim Youngsok Kim Jinho Lee GNN 74 28 0 24 Jan 2023
Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators Min-hee Yoo Jaeyong Song Hyeyoon Lee Jounghoo Lee Namhyung Kim Youngsok Kim Jinho Lee GNN 79 5 0 24 Jan 2023
Enabling Hard Constraints in Differentiable Neural Network and Accelerator Co-Exploration Deokki Hong Kanghyun Choi Hyeyoon Lee Joonsang Yu Noseong Park Youngsok Kim Jinho Lee 46 3 0 23 Jan 2023
Analog, In-memory Compute Architectures for Artificial Intelligence Patrick Bowen G. Regev Nir Regev Bruno U. Pedroni Edward Hanson Yiran Chen 25 3 0 13 Jan 2023
Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics George Michelogiannakis Yehia Arafa B. Cook Liang Yuan Dai Abdel-Hameed A. Badawy Madeleine Glick Yuyang Wang Keren Bergman J. Shalf 57 9 0 09 Jan 2023
FlexShard: Flexible Sharding for Industry-Scale Sequence Recommendation Models Geet Sethi Pallab Bhattacharya Dhruv Choudhary Carole-Jean Wu Christos Kozyrakis 79 5 0 08 Jan 2023
A Theory of I/O-Efficient Sparse Neural Network Inference Niels Gleinig Tal Ben-Nun Torsten Hoefler 59 0 0 03 Jan 2023
Accelerating CNN inference on long vector architectures via co-design Sonia Rani Gupta Nikela Papadopoulou Miquel Pericàs 3DV 78 4 0 22 Dec 2022
Annotated History of Modern AI and Deep Learning Juergen Schmidhuber MLAU AI4TS AI4CE 65 25 0 21 Dec 2022
Sophisticated deep learning with on-chip optical diffractive tensor processing Yuyao Huang Tingzhao Fu Honghao Huang Sigang Yang Hong-wei Chen BDL 20 14 0 20 Dec 2022
AnyTOD: A Programmable Task-Oriented Dialog System Jeffrey Zhao Yuan Cao Raghav Gupta Harrison Lee Abhinav Rastogi Mingqiu Wang H. Soltau Izhak Shafran Yonghui Wu VLM 92 11 0 20 Dec 2022
Containerisation for High Performance Computing Systems: Survey and Prospects Naweiluo Zhou Huan Zhou Dennis Hoppe 65 27 0 16 Dec 2022
Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks Mingyu Liang Wenyin Fu Louis Feng Zhongyi Lin P. Panakanti Shengbao Zheng Srinivas Sridharan Christina Delimitrou 52 12 0 16 Dec 2022
Analytical Engines With Context-Rich Processing: Towards Efficient Next-Generation Analytics Viktor Sanca Anastasia Ailamaki 116 4 0 14 Dec 2022
DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling L. Mei Koen Goetschalckx Arne Symons Marian Verhelst 183 31 0 10 Dec 2022
Integration of a systolic array based hardware accelerator into a DNN operator auto-tuning framework Federico Nicolás Peccia Oliver Bringmann 52 5 0 06 Dec 2022
DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation Liu Ke Xuan Zhang Benjamin C. Lee G. E. Suh Hsien-Hsin S. Lee 71 8 0 02 Dec 2022
On-device Training: A First Overview on Existing Systems Shuai Zhu Thiemo Voigt Jeonggil Ko Fatemeh Rahimian 138 17 0 01 Dec 2022
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts Trevor Gale Deepak Narayanan C. Young Matei A. Zaharia MoE 81 109 0 29 Nov 2022
Edge Video Analytics: A Survey on Applications, Systems and Enabling Techniques Renjie Xu S. Razavi Rong Zheng 112 20 0 28 Nov 2022