
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
arXiv:2407.00066
17 June 2024
Rickard Brüel-Gabrielsson
Jiacheng Zhu
Onkar Bhardwaj
Leshem Choshen
Kristjan Greenewald
Mikhail Yurochkin
Justin Solomon

Papers citing "Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead"

2 of 2 citing papers shown.
RaSA: Rank-Sharing Low-Rank Adaptation
Zhiwei He, Zhaopeng Tu, Xing Wang, Xingyu Chen, Zekun Wang, Jiahao Xu, Tian Liang, Wenxiang Jiao, Z. Zhang, Rui Wang
ALM · 16 Mar 2025
Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch
BDL · 11 Mar 2025