Model-Distributed Inference for Large Language Models at the Edge

13 May 2025
Davide Macario
H. Seferoglu
Erdem Koyuncu

Papers citing "Model-Distributed Inference for Large Language Models at the Edge"

15 / 15 papers shown
Priority-Aware Model-Distributed Inference at Edge Networks
Teng Li
Hulya Seferoglu
91
1
0
16 Dec 2024
Early-Exit meets Model-Distributed Inference at Edge Networks
Marco Colocrese
Erdem Koyuncu
H. Seferoglu
35
3
0
08 Aug 2024
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALM, LRM
139
393
0
04 Jan 2024
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Yilong Zhao
Chien-Yu Lin
Kan Zhu
Zihao Ye
Lequn Chen
Wenlei Bao
Luis Ceze
Arvind Krishnamurthy
Tianqi Chen
Baris Kasikci
MQ
70
148
0
29 Oct 2023
Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Jaiswal
Zhe Gan
Xianzhi Du
Bowen Zhang
Zhangyang Wang
Yinfei Yang
MQ
95
50
0
02 Oct 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
326
12,044
0
18 Jul 2023
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
134
436
0
20 Jun 2023
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie
James Lee-Thorp
Michiel de Jong
Yury Zemlyanskiy
Federico Lebrón
Sumit Sanghai
84
688
0
22 May 2023
Efficiently Scaling Transformer Inference
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
97
324
0
09 Nov 2022
Pipeline Parallelism for Inference on Heterogeneous Edge Computing
Yang Hu
Connor Imes
Xuanang Zhao
Souvik Kundu
Peter A. Beerel
S. Crago
J. Walters
MoE
139
20
0
28 Oct 2021
Tesseract: Parallelize the Tensor Parallelism Efficiently
Boxiang Wang
Qifan Xu
Zhengda Bian
Yang You
VLM, GNN
21
32
0
30 May 2021
Knowledge Distillation: A Survey
Jianping Gou
B. Yu
Stephen J. Maybank
Dacheng Tao
VLM
107
2,976
0
09 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
835
42,332
0
28 May 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
331
1,914
0
17 Sep 2019
Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform
Chi-Chung Chen
Chia-Lin Yang
Hsiang-Yun Cheng
74
101
0
08 Sep 2018