2001.04451
Cited By
Reformer: The Efficient Transformer
13 January 2020
Nikita Kitaev
Lukasz Kaiser
Anselm Levskaya
VLM
Papers citing
"Reformer: The Efficient Transformer"
50 / 498 papers shown
Title
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
36
15
0
28 Sep 2023
Learning the Efficient Frontier
Philippe Chatigny
Ivan Sergienko
Ryan Ferguson
Jordan Weir
Maxime Bergeron
19
1
0
27 Sep 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen
Shengju Qian
Haotian Tang
Xin Lai
Zhijian Liu
Song Han
Jiaya Jia
61
153
0
21 Sep 2023
Transformers versus LSTMs for electronic trading
Paul Bilokon
Yitao Qiu
AI4TS
AIFin
21
13
0
20 Sep 2023
Embed-Search-Align: DNA Sequence Alignment using Transformer Models
Pavan Holur
Kenneth C. Enevoldsen
Shreyas Rajesh
L. Mboning
Thalia Georgiou
Louis-S. Bouchard
Matteo Pellegrini
V. Roychowdhury
27
0
0
20 Sep 2023
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons
Sicheng Yang
Zehao Wang
Zhiyong Wu
Minglei Li
Zhensong Zhang
...
Lei Hao
Songcen Xu
Xiaofei Wu
Changpeng Yang
Zonghong Dai
DiffM
54
14
0
13 Sep 2023
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
Matthew Dutson
Yin Li
M. Gupta
ViT
45
8
0
25 Aug 2023
How to Protect Copyright Data in Optimization of Large Language Models?
T. Chu
Zhao Song
Chiwun Yang
45
29
0
23 Aug 2023
Instruction Position Matters in Sequence Generation with Large Language Models
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
LRM
64
8
0
23 Aug 2023
TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series
Chenxi Sun
Hongyan Li
Yaliang Li
Linda Qiao
AI4TS
42
115
0
16 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
28
3
0
07 Aug 2023
Question Answering with Deep Neural Networks for Semi-Structured Heterogeneous Genealogical Knowledge Graphs
Omri Suissa
M. Zhitomirsky-Geffet
Avshalom Elmalech
GNN
BDL
34
8
0
30 Jul 2023
Attention over pre-trained Sentence Embeddings for Long Document Classification
Amine Abdaoui
Sourav Dutta
27
1
0
18 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
54
19
0
13 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
41
152
0
05 Jul 2023
Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review
M. Wong
Shangxin Guo
Ching Nam Hang
Siu-Wai Ho
C. Tan
44
78
0
04 Jul 2023
Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network
Zizhuo Li
Jiayi Ma
36
2
0
04 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
29
5
0
03 Jul 2023
Long-range Language Modeling with Self-retrieval
Ohad Rubin
Jonathan Berant
RALM
KELM
36
18
0
23 Jun 2023
FDNet: Focal Decomposed Network for Efficient, Robust and Practical Time Series Forecasting
Li Shen
Yuning Wei
Yangzhu Wang
Huaxin Qiu
OOD
AI4TS
13
7
0
19 Jun 2023
PaReprop: Fast Parallelized Reversible Backpropagation
Tyler Lixuan Zhu
K. Mangalam
17
1
0
15 Jun 2023
Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis
Zhiyu Jin
Xuli Shen
Bin Li
Xiangyang Xue
34
36
0
14 Jun 2023
Cross-LKTCN: Modern Convolution Utilizing Cross-Variable Dependency for Multivariate Time Series Forecasting
Donghao Luo
Xue Wang
BDL
AI4TS
21
2
0
04 Jun 2023
Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
Baohao Liao
Shaomu Tan
Christof Monz
KELM
23
29
0
01 Jun 2023
Recasting Self-Attention with Holographic Reduced Representations
Mohammad Mahmudul Alam
Edward Raff
Stella Biderman
Tim Oates
James Holt
16
8
0
31 May 2023
Koopa: Learning Non-stationary Time Series Dynamics with Koopman Predictors
Yong Liu
Chenyu Li
Jianmin Wang
Mingsheng Long
AI4TS
37
105
0
30 May 2023
Networked Time Series Imputation via Position-aware Graph Enhanced Variational Autoencoders
Dingsu Wang
Yuchen Yan
Ruizhong Qiu
Yada Zhu
Kaiyu Guan
A. Margenot
Hanghang Tong
AI4TS
45
28
0
29 May 2023
Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers
Zahra Atashgahi
Mykola Pechenizkiy
Raymond N. J. Veldhuis
Decebal Constantin Mocanu
AI4TS
AI4CE
34
1
0
28 May 2023
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Zichang Liu
Aditya Desai
Fangshuo Liao
Weitao Wang
Victor Xie
Zhaozhuo Xu
Anastasios Kyrillidis
Anshumali Shrivastava
33
204
0
26 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
42
53
0
25 May 2023
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Ziwei He
Meng-Da Yang
Minwei Feng
Jingcheng Yin
Xinbing Wang
Jingwen Leng
Zhouhan Lin
ViT
40
13
0
24 May 2023
Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model
Leo Liu
Tim Dettmers
Xi Lin
Ves Stoyanov
Xian Li
MoE
26
9
0
23 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
92
562
0
22 May 2023
FIT: Far-reaching Interleaved Transformers
Ting-Li Chen
Lala Li
32
12
0
22 May 2023
CageViT: Convolutional Activation Guided Efficient Vision Transformer
Hao Zheng
Jinbao Wang
Xiantong Zhen
Hao Chen
Jingkuan Song
Feng Zheng
ViT
32
0
0
17 May 2023
SoundStorm: Efficient Parallel Audio Generation
Zalan Borsos
Matthew Sharifi
Damien Vincent
Eugene Kharitonov
Neil Zeghidour
Marco Tagliasacchi
28
98
0
16 May 2023
DLUE: Benchmarking Document Language Understanding
Ruoxi Xu
Hongyu Lin
Xinyan Guan
Xianpei Han
Yingfei Sun
Le Sun
ELM
44
0
0
16 May 2023
Treasure What You Have: Exploiting Similarity in Deep Neural Networks for Efficient Video Processing
Hadjer Benmeziane
Halima Bouzidi
Hamza Ouarnoughi
Ozcan Ozturk
Smail Niar
38
0
0
10 May 2023
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Zhanpeng Zeng
Cole Hawkins
Min-Fong Hong
Aston Zhang
Nikolaos Pappas
Vikas Singh
Shuai Zheng
21
6
0
07 May 2023
The Role of Global and Local Context in Named Entity Recognition
Arthur Amalvy
Vincent Labatut
Richard Dufour
38
4
0
04 May 2023
Cuttlefish: Low-Rank Model Training without All the Tuning
Hongyi Wang
Saurabh Agarwal
Pongsakorn U-chupala
Yoshiki Tanaka
Eric P. Xing
Dimitris Papailiopoulos
OffRL
63
22
0
04 May 2023
Learning to Extrapolate: A Transductive Approach
Aviv Netanyahu
Abhishek Gupta
Max Simchowitz
Kaipeng Zhang
Pulkit Agrawal
49
15
0
27 Apr 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli
Lizhong Chen
27
1
0
17 Apr 2023
Training Large Language Models Efficiently with Sparsity and Dataflow
V. Srinivasan
Darshan Gandhi
Urmish Thakker
R. Prabhakar
MoE
38
6
0
11 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
32
41
0
07 Apr 2023
ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention
Alec Diaz-Arias
Dmitriy Shin
ViT
18
10
0
04 Apr 2023
Dialogue-Contextualized Re-ranking for Medical History-Taking
Jian Zhu
Ilya Valmianski
Anitha Kannan
21
1
0
04 Apr 2023
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
Abdelrahman M. Shaker
Muhammad Maaz
H. Rasheed
Salman Khan
Ming Yang
Fahad Shahbaz Khan
ViT
53
84
0
27 Mar 2023
You Only Segment Once: Towards Real-Time Panoptic Segmentation
Jie Hu
Linyan Huang
Tianhe Ren
Shengchuan Zhang
Rongrong Ji
Liujuan Cao
SSeg
46
55
0
26 Mar 2023
An Evaluation of Memory Optimization Methods for Training Neural Networks
Xiaoxuan Liu
Siddharth Jha
Alvin Cheung
29
0
0
26 Mar 2023