InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference

8 September 2024
Xiurui Pan
Endian Li
Qiao Li
Shengwen Liang
Yizhou Shan
Ke Zhou
Yingwei Luo
Xiaolin Wang
Jie Zhang

Papers citing "InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference"

9 / 9 papers shown
Cognitive Memory in Large Language Models
Lianlei Shan
Shixian Luo
Zezhou Zhu
Yu Yuan
Yong Wu
LLMAG
KELM
508
3
0
03 Apr 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
505
0
0
08 Jan 2025
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
Yuanchun Li
Hao Wen
Weijun Wang
Xiangyu Li
Yizhen Yuan
...
Zhijun Li
Peng Li
Yang Liu
Yaqiong Zhang
Yunxin Liu
LLMAG
81
189
0
10 Jan 2024
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Zirui Liu
Chia-Yuan Chang
Huiyuan Chen
Helen Zhou
108
117
0
02 Jan 2024
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
147
313
0
24 Jun 2023
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method
Yiming Wang
Zhuosheng Zhang
Rui Wang
112
87
0
22 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,748
0
15 Mar 2023
An Attentive Survey of Attention Models
S. Chaudhari
Varun Mithal
Gungor Polatkan
R. Ramanath
149
662
0
05 Apr 2019
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
231
2,686
0
09 May 2017