arXiv:2404.09336
Self-Selected Attention Span for Accelerating Large Language Model Inference
14 April 2024
Tian Jin, W. Yazar, Zifei Xu, Sayeh Sharify, Xin Eric Wang
LRM

Papers citing "Self-Selected Attention Span for Accelerating Large Language Model Inference" (6 of 6 papers shown)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
601 · 9,009 · 0 · 28 Jan 2022

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Hanrui Wang, Zhekai Zhang, Song Han
93 · 384 · 0 · 17 Dec 2020

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
VLM
491 · 2,051 · 0 · 28 Jul 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
RALM, VLM
102 · 3,996 · 0 · 10 Apr 2020

Decoupled Neural Interfaces using Synthetic Gradients
Max Jaderberg, Wojciech M. Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, Koray Kavukcuoglu
68 · 356 · 0 · 18 Aug 2016

Learning both Weights and Connections for Efficient Neural Networks
Song Han, Jeff Pool, J. Tran, W. Dally
CVBM
264 · 6,628 · 0 · 08 Jun 2015