2402.00025
Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition
5 January 2024
Adnan Hoque
Less Wright
Chih-Chieh Yang
Mudhakar Srivatsa
R. Ganti
Papers citing "Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition"
1 / 1 papers shown
Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving
Yaoyao Ding
Bohan Hou
Xinyu Zhang
Allan Lin
Tianqi Chen
Cody Yu Hao
Yida Wang
Gennady Pekhimenko
17 Apr 2025