FIRP: Faster LLM inference via future intermediate representation prediction
arXiv:2410.20488, 27 October 2024
Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao
Papers citing "FIRP: Faster LLM inference via future intermediate representation prediction" (6 papers)
Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector, Christopher Ré (08 Aug 2023)
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias (30 Nov 2022)
Training Verifiers to Solve Math Word Problems
K. Cobbe, V. Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, ..., Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman (27 Oct 2021)
Datasets: A Community Library for Natural Language Processing
Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, A. Thakur, Patrick von Platen, ..., Thibault Goehringer, Victor Mustar, François Lagunas, Alexander M. Rush, Thomas Wolf (07 Sep 2021)
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer (06 Nov 2019)
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
Shashi Narayan, Shay B. Cohen, Mirella Lapata (27 Aug 2018)