RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding
arXiv:2505.22135 · 28 May 2025
Yuichiro Hoshino, Hideyuki Tachibana, Muneyoshi Inahara, Hiroto Takegawa
Papers citing "RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding" (8 of 8 papers shown)
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models (28 Jan 2025)
Makoto Shing, Kou Misaki, Han Bao, Sho Yokoi, Takuya Akiba

Puzzle: Distillation-Based NAS for Inference-Optimized LLMs (28 Nov 2024)
Akhiad Bercovich, Tomer Ronen, Talor Abramovich, Nir Ailon, Nave Assaf, ..., Ido Shahaf, Oren Tropp, Omer Ullman Argov, Ran Zilberstein, Ran El-Yaniv

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (11 Jun 2024)
Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen

Zamba: A Compact 7B SSM Hybrid Model (26 May 2024)
Paolo Glorioso, Quentin G. Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding (18 Apr 2024)
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen

Fast Inference from Transformers via Speculative Decoding (30 Nov 2022)
Yaniv Leviathan, Matan Kalman, Yossi Matias

Are Sixteen Heads Really Better than One? (25 May 2019)
Paul Michel, Omer Levy, Graham Neubig

Attention Is All You Need (12 Jun 2017)
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin