
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
Surin Ahn
Zhenhua Han
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
Papers citing "MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention"
38 / 38 papers shown
Title |
---|
∞Bench: Extending Long Context Evaluation Beyond 100K Tokens Xinrong Zhang Yingfa Chen Shengding Hu Zihang Xu Junhao Chen ...Xu Han Zhen Leng Thai Shuo Wang Zhiyuan Liu Maosong Sun |
Qwen Technical Report Jinze Bai Shuai Bai Yunfei Chu Zeyu Cui Kai Dang ...Zhenru Zhang Chang Zhou Jingren Zhou Xiaohuan Zhou Tianhang Zhu |
XGen-7B Technical Report Erik Nijkamp Tian Xie Hiroaki Hayashi Bo Pang Congying Xia ...Chien-Sheng Wu Silvio Savarese Yingbo Zhou Shafiq Joty Caiming Xiong |