2312.07104
Cited By
SGLang: Efficient Execution of Structured Language Model Programs
12 December 2023
Lianmin Zheng
Liangsheng Yin
Zhiqiang Xie
Chuyue Sun
Jeff Huang
Cody Hao Yu
Shiyi Cao
Christos Kozyrakis
Ion Stoica
Joseph E. Gonzalez
Clark W. Barrett
Ying Sheng
LRM
Papers citing
"SGLang: Efficient Execution of Structured Language Model Programs"
31 / 31 papers shown
Title
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Hang Wu
Jianian Zhu
Yongqian Li
Haojie Wang
Biao Hou
Jidong Zhai
40
0
0
12 May 2025
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
63
0
0
06 May 2025
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Y. Chen
J. Zhang
Baotong Lu
Qianxi Zhang
Chengruidong Zhang
...
Chen Chen
Mingxing Zhang
Yuqing Yang
Fan Yang
Mao Yang
38
0
0
05 May 2025
GPU Performance Portability needs Autotuning
Burkhard Ringlein
Thomas Parnell
Radu Stoica
125
0
0
30 Apr 2025
GenTorrent: Scaling Large Language Model Serving with An Overlay Network
Fei Fang
Yifan Hua
Shengze Wang
Ruilin Zhou
Y. Liu
Chen Qian
Xuzhi Zhang
60
0
0
27 Apr 2025
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
João Loula
Benjamin LeBrun
Li Du
Ben Lipkin
Clemente Pasti
...
Ryan Cotterell
Vikash K. Mansinghka
Alexander K. Lew
Tim Vieira
Timothy J. O'Donnell
34
1
0
17 Apr 2025
Token Level Routing Inference System for Edge Devices
Jianshu She
Wenhao Zheng
Zhengzhong Liu
Hongyi Wang
Eric P. Xing
Huaxiu Yao
Qirong Ho
36
0
0
10 Apr 2025
Hawkeye: Efficient Reasoning with Model Collaboration
Jianshu She
Z. Li
Zhemin Huang
Qi Li
Peiran Xu
Haonan Li
Qirong Ho
LRM
60
2
0
01 Apr 2025
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
Huan Yang
Renji Zhang
Mingzhe Huang
Weijun Wang
Yin Tang
Yuanchun Li
Yunxin Liu
Deyu Zhang
44
0
0
17 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Hongzhi Zhang
Jun Wang
168
0
0
15 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Y. Liu
Jin Jiang
M. Zhang
Jian Shao
Yueting Zhuang
LRM
ReLM
53
3
0
09 Mar 2025
Exploiting Edited Large Language Models as General Scientific Optimizers
Qitan Lv
T. Liu
Haoyu Wang
41
0
0
08 Mar 2025
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li
Pengfei Zheng
Shuang Chen
Zewei Xu
Yuanhao Lai
Yunfei Du
Z. Wang
MoE
137
0
0
06 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
120
5
0
03 Mar 2025
CoT-ICL Lab: A Petri Dish for Studying Chain-of-Thought Learning from In-Context Demonstrations
Vignesh Kothapalli
Hamed Firooz
Maziar Sanjabi
68
0
0
21 Feb 2025
Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity
Junhao Hu
Wenrui Huang
Weidong Wang
Zhenwen Li
Tiancheng Hu
Zhixia Liu
Xusheng Chen
Tao Xie
Yizhou Shan
LRM
51
0
0
16 Feb 2025
Auditing Prompt Caching in Language Model APIs
Chenchen Gu
Xiang Lisa Li
Rohith Kuditipudi
Percy Liang
Tatsunori Hashimoto
76
0
0
11 Feb 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
VLM
ALM
OffRL
AI4TS
LRM
108
141
0
22 Jan 2025
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun
Penghan Wang
Fan Lai
148
1
0
15 Jan 2025
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
Wei Yu Wu
Zhuoshi Pan
Chao Wang
L. Chen
Y. Bai
Kun Fu
Zehua Wang
Hui Xiong
LLMAG
34
5
0
05 Nov 2024
On the Design and Analysis of LLM-Based Algorithms
Yanxi Chen
Yaliang Li
Bolin Ding
Jingren Zhou
48
5
0
20 Jul 2024
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu
Ajay Nayak
Jayashree Mohan
Ramachandran Ramjee
Ashish Panwar
VLM
57
25
0
07 May 2024
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Chao Jin
Zili Zhang
Xuanlin Jiang
Fangyue Liu
Xin Liu
Xuanzhe Liu
Xin Jin
40
39
0
18 Apr 2024
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Jinwei Yao
Kaiqi Chen
Kexun Zhang
Jiaxuan You
Binhang Yuan
Zeke Wang
Tao Lin
35
2
0
30 Mar 2024
Optimizing LLM Queries in Relational Data Analytics Workloads
Shu Liu
Asim Biswal
Audrey Cheng
Xiangxi Mo
Shiyi Cao
...
Ion Stoica
Joseph E. Gonzalez
Matei Zaharia
66
18
0
09 Mar 2024
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Jordan Juravsky
Bradley Brown
Ryan Ehrlich
Daniel Y. Fu
Christopher Ré
Azalia Mirhoseini
58
36
0
07 Feb 2024
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
232
1,742
0
07 Apr 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
298
3,007
0
22 Mar 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
369
0
13 Mar 2023
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
314
3,248
0
21 Mar 2022
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,821
0
17 Sep 2019