Model-Distributed Inference for Large Language Models at the Edge
Davide Macario, H. Seferoglu, Erdem Koyuncu
13 May 2025 (arXiv:2505.18164)
Papers citing "Model-Distributed Inference for Large Language Models at the Edge" (15 of 15 shown):
Priority-Aware Model-Distributed Inference at Edge Networks. Teng Li, Hulya Seferoglu. 16 Dec 2024.
Early-Exit meets Model-Distributed Inference at Edge Networks. Marco Colocrese, Erdem Koyuncu, H. Seferoglu. 08 Aug 2024.
TinyLlama: An Open-Source Small Language Model. Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu. 04 Jan 2024.
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving. Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Wenlei Bao, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. 29 Oct 2023.
Compressing LLMs: The Truth is Rarely Pure and Never Simple. Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang. 02 Oct 2023.
Llama 2: Open Foundation and Fine-Tuned Chat Models. Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. 18 Jul 2023.
A Simple and Effective Pruning Approach for Large Language Models. Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter. 20 Jun 2023.
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai. 22 May 2023.
Efficiently Scaling Transformer Inference. Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, J. Dean. 09 Nov 2022.
Pipeline Parallelism for Inference on Heterogeneous Edge Computing. Yang Hu, Connor Imes, Xuanang Zhao, Souvik Kundu, Peter A. Beerel, S. Crago, J. Walters. 28 Oct 2021.
Tesseract: Parallelize the Tensor Parallelism Efficiently. Boxiang Wang, Qifan Xu, Zhengda Bian, Yang You. 30 May 2021.
Knowledge Distillation: A Survey. Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao. 09 Jun 2020.
Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. 28 May 2020.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro. 17 Sep 2019.
Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform. Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng. 08 Sep 2018.