Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization

12 May 2024
Xinyuan Zhang
Jiang Liu
Zehui Xiong
Yudong Huang
Gaochang Xie
Ran Zhang

Papers citing "Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization"

3 / 3 papers shown
PartialLoading: User Scheduling and Bandwidth Allocation for Parameter-sharing Edge Inference
Guanqiao Qu
Qian Chen
Xianhao Chen
Kaibin Huang
Yuguang Fang
46
1
0
29 Mar 2025
On-Device Language Models: A Comprehensive Review
Jiajun Xu
Zhiyuan Li
Wei Chen
Qun Wang
Xin Gao
Qi Cai
Ziyuan Ling
50
27
0
26 Aug 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
371
0
13 Mar 2023