Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
Jie Ou, Yueming Chen, Wenhong Tian
arXiv:2404.08698 (v2, latest) · 10 April 2024

Papers citing "Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding"

18 papers shown

MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan, Siyang Song, Ankur Aggarwal, Nish Sinnadurai, Sean Lie, Vithursan Thangarasa
VLM · 15 May 2025

Teola: Towards End-to-End Optimization of LLM-based Applications
Xin Tan, Yimin Jiang, Yitao Yang, Hong-Yu Xu
29 Jun 2024

Qwen Technical Report
Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, ..., Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
OSLM · 28 Sep 2023

Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector, Chris Ré
08 Aug 2023

Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee
12 Jul 2023

SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference
Luciano Del Corro, Allison Del Giorno, Sahaj Agarwal, Ting Yu, Ahmed Hassan Awadallah, Subhabrata Mukherjee
05 Jul 2023

Speculative Decoding with Big Little Decoder
Sehoon Kim, K. Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, A. Gholami, Kurt Keutzer
MoE · 15 Feb 2023

Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias
LRM · 30 Nov 2022

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Z. Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng-rong Li, Yuxiong He
17 Nov 2022

PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel
PILM, LRM · 05 Apr 2022

Token Dropping for Efficient BERT Pretraining
Le Hou, Richard Yuanzhe Pang, Dinesh Manocha, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou
24 Mar 2022

Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé, ..., Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba
ELM, ALM · 07 Jul 2021

GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, J. Qiu, Zhilin Yang, Jie Tang
BDL, AI4CE · 18 Mar 2021

Training data-efficient image transformers & distillation through attention
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou
ViT · 23 Dec 2020

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
28 Mar 2019

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
Shashi Narayan, Shay B. Cohen, Mirella Lapata
AILaw · 27 Aug 2018

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han, Huizi Mao, W. Dally
3DGS · 01 Oct 2015

Teaching Machines to Read and Comprehend
Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, L. Espeholt, W. Kay, Mustafa Suleyman, Phil Blunsom
10 Jun 2015