ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.12728
  4. Cited By
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
v1v2 (latest)

FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks

19 May 2025
Zihua Wang
Ruibo Li
Haozhe Du
Joey Tianyi Zhou
Yu Zhang
Xu Yang
    MLLM
ArXiv (abs)PDFHTML

Papers citing "FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks"

26 / 26 papers shown
Title
Qwen2.5-VL Technical Report
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
327
685
0
20 Feb 2025
Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More
Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More
Zichen Wen
Yifeng Gao
Shaobo Wang
J.N. Zhang
Qintong Zhang
Weijia Li
Conghui He
Linfeng Zhang
VLM
99
13
0
17 Feb 2025
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann
Sotiris Anagnostidis
Albert Pumarola
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Edgar Schönfeld
Ali K. Thabet
Jonas Kohler
ALMBDL
144
16
0
31 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLMVLM
142
42
0
07 Jan 2025
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
Xiangxiang Gao
Weisheng Xie
Yiwei Xiang
Feng Ji
153
6
0
17 Dec 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
111
13
0
09 Oct 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM
TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li
Yuqian Yuan
Jian Liu
Dongqi Tang
Song Wang
Jie Qin
Jianke Zhu
Lei Zhang
MLLM
87
67
0
02 Jul 2024
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
138
84
0
24 Jun 2024
DeCo: Decoupling Token Compression from Semantic Abstraction in
  Multimodal Large Language Models
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
64
34
0
31 May 2024
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Mostafa Elhoushi
Akshat Shrivastava
Diana Liskovich
Basil Hosmer
Bram Wasti
...
Saurabh Agarwal
Ahmed Roman
Ahmed Aly
Beidi Chen
Carole-Jean Wu
LRM
83
108
0
25 Apr 2024
On Speculative Decoding for Multimodal Large Language Models
On Speculative Decoding for Multimodal Large Language Models
Mukul Gagrani
Raghavv Goel
Wonseok Jeon
Junyoung Park
Mingu Lee
Christopher Lott
LRM
48
11
0
13 Apr 2024
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal
  Models
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang
Mu Cai
Bingxin Xu
Yong Jae Lee
Yan Yan
VLM
110
126
0
22 Mar 2024
Break the Sequential Dependency of LLM Inference Using Lookahead
  Decoding
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
171
163
0
03 Feb 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
117
165
0
26 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple
  Decoding Heads
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
De-huai Chen
Tri Dao
131
299
0
19 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive
  Survey of Speculative Decoding
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Chak Tou Leong
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
122
123
0
15 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic
  Visual-Linguistic Tasks
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLMMLLM
252
1,188
0
21 Dec 2023
Cascade Speculative Drafting for Even Faster LLM Inference
Cascade Speculative Drafting for Even Faster LLM Inference
Ziyi Chen
Xiaocong Yang
Jiacheng Lin
Chenkai Sun
Kevin Chen-Chuan Chang
Jie Huang
LRM
62
51
0
18 Dec 2023
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Yongchao Zhou
Kaifeng Lyu
A. S. Rawat
A. Menon
Afshin Rostamizadeh
Sanjiv Kumar
Jean-François Kagy
Rishabh Agarwal
94
94
0
12 Oct 2023
Improved Baselines with Visual Instruction Tuning
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLMMLLM
123
2,807
0
05 Oct 2023
Visual Instruction Tuning
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDaVLMMLLM
560
4,861
0
17 Apr 2023
GPT-4 Technical Report
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAGMLLM
1.5K
14,631
0
15 Mar 2023
Accelerating Large Language Model Decoding with Speculative Sampling
Accelerating Large Language Model Decoding with Speculative Sampling
Charlie Chen
Sebastian Borgeaud
G. Irving
Jean-Baptiste Lespiau
Laurent Sifre
J. Jumper
BDLLRM
89
430
0
02 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
429
4,641
0
30 Jan 2023
Fast Inference from Transformers via Speculative Decoding
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
147
724
0
30 Nov 2022
The Kinetics Human Action Video Dataset
The Kinetics Human Action Video Dataset
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
...
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
250
3,815
0
19 May 2017
1