Efficient Inference for Large Language Model-based Generative Recommendation
v1, v2, v3 (latest)

7 October 2024
Xinyu Lin
Chaoqun Yang
Wenjie Wang
Yongqi Li
Cunxiao Du
Fuli Feng
See-Kiong Ng
Tat-Seng Chua
arXiv: 2410.05165

Papers citing "Efficient Inference for Large Language Model-based Generative Recommendation"

IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation
Zijie Lin
Yang Zhang
Xiaoyan Zhao
Fengbin Zhu
Fuli Feng
Tat-Seng Chua
16 Jun 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
Shanghang Zhang
27 Feb 2025
Order-agnostic Identifier for Large Language Model-based Generative Recommendation
Xinyu Lin
Haihan Shi
Wenjie Wang
Fuli Feng
Qifan Wang
See-Kiong Ng
Tat-Seng Chua
15 Feb 2025
Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding
Yunjia Xi
Hangyu Wang
Bo Chen
Jianghao Lin
Menghui Zhu
Wen Liu
Ruiming Tang
Zhewei Wei
Weinan Zhang
Yong Yu
11 Aug 2024
A Survey of Generative Search and Recommendation in the Era of Large Language Models
Chak Tou Leong
Xinyu Lin
Wenjie Wang
Fuli Feng
Liang Pang
Wenjie Li
Liqiang Nie
Xiangnan He
Tat-Seng Chua
25 Apr 2024
Can Small Language Models be Good Reasoners for Sequential Recommendation?
Yuling Wang
Changxin Tian
Binbin Hu
Yanhua Yu
Ziqi Liu
Qing Cui
Jun Zhou
Liang Pang
Xiao Wang
07 Mar 2024
Stealthy Attack on Large Language Model based Recommendation
Jinghao Zhang
Yuting Liu
Qiang Liu
Shu Wu
Guibing Guo
Liang Wang
18 Feb 2024
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
Cunxiao Du
Jing Jiang
Yuanchen Xu
Jiawei Wu
Sicheng Yu
...
Shenggui Li
Kai Xu
Liqiang Nie
Zhaopeng Tu
Yang You
03 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
03 Feb 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
26 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
Deming Chen
Tri Dao
19 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Chak Tou Leong
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
15 Jan 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
23 Dec 2023
Cascade Speculative Drafting for Even Faster LLM Inference
Ziyi Chen
Xiaocong Yang
Jiacheng Lin
Chenkai Sun
Kevin Chen-Chuan Chang
Jie Huang
18 Dec 2023
LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation
Kai Mei
Yongfeng Zhang
26 Oct 2023
SpecTr: Fast Speculative Decoding via Optimal Transport
Ziteng Sun
A. Suresh
Jae Hun Ro
Ahmad Beirami
Himanshu Jain
Felix X. Yu
23 Oct 2023
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Yongchao Zhou
Kaifeng Lyu
A. S. Rawat
A. Menon
Afshin Rostamizadeh
Sanjiv Kumar
Jean-François Kagy
Rishabh Agarwal
12 Oct 2023
Online Speculative Decoding
Xiaoxuan Liu
Lanxiang Hu
Peter Bailis
Alvin Cheung
Zhijie Deng
Ion Stoica
Hao Zhang
11 Oct 2023
f-Divergence Minimization for Sequence-Level Knowledge Distillation
Yuqiao Wen
Zichao Li
Wenyu Du
Lili Mou
27 Jul 2023
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
Rishabh Agarwal
Nino Vieillard
Yongchao Zhou
Piotr Stańczyk
Sabela Ramos
Matthieu Geist
Olivier Bachem
23 Jun 2023
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Zeyu Wang
...
Chunan Shi
Zhuoming Chen
Daiyaan Arfeen
Reyna Abhyankar
Zhihao Jia
16 May 2023
How to Index Item IDs for Recommendation Foundation Models
Wenyue Hua
Shuyuan Xu
Yingqiang Ge
Yongfeng Zhang
11 May 2023
Recommender Systems with Generative Retrieval
Shashank Rajput
Nikhil Mehta
Anima Singh
Raghunandan H. Keshavan
T. Vu
...
Vinh Q. Tran
Jonah Samost
Maciej Kula
Ed H. Chi
M. Sathiamoorthy
08 May 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
27 Feb 2023
Speculative Decoding with Big Little Decoder
Sehoon Kim
K. Mangalam
Suhong Moon
Jitendra Malik
Michael W. Mahoney
A. Gholami
Kurt Keutzer
15 Feb 2023
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan
Matan Kalman
Yossi Matias
30 Nov 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
18 Nov 2022
Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
Heming Xia
Tao Ge
Peiyi Wang
Si-Qing Chen
Furu Wei
Zhifang Sui
30 Mar 2022
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
03 Mar 2022
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
17 Jun 2021
Mining Latent Structures for Multimedia Recommendation
Jinghao Zhang
Yanqiao Zhu
Qiang Liu
Shu Wu
Shuhui Wang
Liang Wang
19 Apr 2021
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
02 Oct 2019
Sequence-Level Knowledge Distillation
Yoon Kim
Alexander M. Rush
25 Jun 2016
How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?
Ferenc Huszár
16 Nov 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
09 Mar 2015