ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2412.12639
  4. Cited By
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
v1v2v3 (latest)

Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree

17 December 2024
Xiangxiang Gao
Weisheng Xie
Yiwei Xiang
Feng Ji
ArXiv (abs)PDFHTML

Papers citing "Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree"

26 / 26 papers shown
Title
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
Zihua Wang
Ruibo Li
Haozhe Du
Joey Tianyi Zhou
Yu Zhang
Xu Yang
MLLM
129
0
0
19 May 2025
RASD: Retrieval-Augmented Speculative Decoding
Guofeng Quan
Wenfeng Feng
Chuzhan Hao
Guochao Jiang
Yuewei Zhang
Hao Wang
RALM
161
1
0
05 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
234
18
0
03 Mar 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
Shanghang Zhang
170
0
0
27 Feb 2025
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
Artem Basharin
Andrei Chertkov
Ivan Oseledets
144
1
0
23 Oct 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
150
165
0
26 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple
  Decoding Heads
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
De-huai Chen
Tri Dao
187
314
0
19 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive
  Survey of Speculative Decoding
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Chak Tou Leong
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
158
130
0
15 Jan 2024
Cascade Speculative Drafting for Even Faster LLM Inference
Cascade Speculative Drafting for Even Faster LLM Inference
Ziyi Chen
Xiaocong Yang
Jiacheng Lin
Chenkai Sun
Kevin Chen-Chuan Chang
Jie Huang
LRM
122
52
0
18 Dec 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALMOSLMELM
531
4,453
0
09 Jun 2023
Speculative Decoding with Big Little Decoder
Speculative Decoding with Big Little Decoder
Sehoon Kim
K. Mangalam
Suhong Moon
Jitendra Malik
Michael W. Mahoney
A. Gholami
Kurt Keutzer
MoE
147
112
0
15 Feb 2023
Accelerating Large Language Model Decoding with Speculative Sampling
Accelerating Large Language Model Decoding with Speculative Sampling
Charlie Chen
Sebastian Borgeaud
G. Irving
Jean-Baptiste Lespiau
Laurent Sifre
J. Jumper
BDLLRM
91
436
0
02 Feb 2023
A Survey on Non-Autoregressive Generation for Neural Machine Translation
  and Beyond
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
Yisheng Xiao
Lijun Wu
Junliang Guo
Juntao Li
Hao Fei
Tao Qin
Tie-Yan Liu
3DVMedImAI4CE
96
89
0
20 Apr 2022
Self-Distillation Mixup Training for Non-autoregressive Neural Machine
  Translation
Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation
Jiaxin Guo
Minghan Wang
Daimeng Wei
Hengchao Shang
Yuxia Wang
...
Chang Su
Hao Fei
Lizhi Lei
Shimin Tao
Hao Yang
55
12
0
22 Dec 2021
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
408
4,606
0
27 Oct 2021
Evaluating Large Language Models Trained on Code
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELMALM
247
5,695
0
07 Jul 2021
How Does Distilled Data Complexity Impact the Quality and Confidence of
  Non-Autoregressive Machine Translation?
How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?
Weijia Xu
Shuming Ma
Dongdong Zhang
Marine Carpuat
73
19
0
27 May 2021
GLM: General Language Model Pretraining with Autoregressive Blank
  Infilling
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du
Yujie Qian
Xiao Liu
Ming Ding
J. Qiu
Zhilin Yang
Jie Tang
BDLAI4CE
156
1,565
0
18 Mar 2021
Understanding and Improving Lexical Choice in Non-Autoregressive
  Translation
Understanding and Improving Lexical Choice in Non-Autoregressive Translation
Liang Ding
Longyue Wang
Xuebo Liu
Derek F. Wong
Dacheng Tao
Zhaopeng Tu
174
77
0
29 Dec 2020
Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine
  Translation
Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation
Junliang Guo
Xu Tan
Linli Xu
Tao Qin
Enhong Chen
Tie-Yan Liu
96
85
0
20 Nov 2019
Fast Structured Decoding for Sequence Models
Fast Structured Decoding for Sequence Models
Zhiqing Sun
Zhuohan Li
Haoqing Wang
Zi Lin
Di He
Zhihong Deng
94
122
0
25 Oct 2019
Non-Autoregressive Machine Translation with Auxiliary Regularization
Non-Autoregressive Machine Translation with Auxiliary Regularization
Yiren Wang
Fei Tian
Di He
Tao Qin
ChengXiang Zhai
Tie-Yan Liu
74
160
0
22 Feb 2019
Semi-Autoregressive Neural Machine Translation
Semi-Autoregressive Neural Machine Translation
Chunqi Wang
Ji Zhang
Haiqing Chen
80
89
0
26 Aug 2018
Fast Decoding in Sequence Models using Discrete Latent Variables
Fast Decoding in Sequence Models using Discrete Latent Variables
Łukasz Kaiser
Aurko Roy
Ashish Vaswani
Niki Parmar
Samy Bengio
Jakob Uszkoreit
Noam M. Shazeer
83
232
0
09 Mar 2018
WaveNet: A Generative Model for Raw Audio
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
DiffM
415
7,429
0
12 Sep 2016
Distilling the Knowledge in a Neural Network
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
369
19,791
0
09 Mar 2015
1