Pay Less Attention with Lightweight and Dynamic Convolutions (arXiv:1901.10430)
29 January 2019 · Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli
Papers citing "Pay Less Attention with Lightweight and Dynamic Convolutions" (showing 50 of 241):
- Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions · Ludwig Kurzinger, Nicolas Lindae, Palle Klewitz, Gerhard Rigoll · 67 / 5 / 0 · 15 Oct 2020
- Deformable DETR: Deformable Transformers for End-to-End Object Detection · Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai · [ViT] · 360 / 5,143 / 0 · 08 Oct 2020
- Shallow-to-Deep Training for Neural Machine Translation · Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu · 69 / 49 / 0 · 08 Oct 2020
- Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization · Jiaao Chen, Diyi Yang · 80 / 148 / 0 · 04 Oct 2020
- SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness · Nathan Ng, Kyunghyun Cho, Marzyeh Ghassemi · 102 / 146 / 0 · 21 Sep 2020
- Very Deep Transformers for Neural Machine Translation · Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao · 94 / 104 / 0 · 18 Aug 2020
- HiPPO: Recurrent Memory with Optimal Polynomial Projections · Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré · 168 / 551 / 0 · 17 Aug 2020
- ConvBERT: Improving BERT with Span-based Dynamic Convolution · Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan · 137 / 163 / 0 · 06 Aug 2020
- DeLighT: Deep and Light-weight Transformer · Sachin Mehta, Marjan Ghazvininejad, Srini Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi · [VLM] · 90 / 32 / 0 · 03 Aug 2020
- Fine-Tune Longformer for Jointly Predicting Rumor Stance and Veracity · Anant Khandelwal · 77 / 22 / 0 · 15 Jul 2020
- Modeling Voting for System Combination in Machine Translation · Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu · 50 / 60 / 0 · 14 Jul 2020
- Rewiring the Transformer with Depth-Wise LSTMs · Hongfei Xu, Yang Song, Qiuhui Liu, Josef van Genabith, Deyi Xiong · 74 / 6 / 0 · 13 Jul 2020
- Do Transformers Need Deep Long-Range Memory? · Jack W. Rae, Ali Razavi · [RALM] · 78 / 41 / 0 · 07 Jul 2020
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations · Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli · [SSL] · 325 / 5,878 / 0 · 20 Jun 2020
- SqueezeBERT: What can computer vision teach NLP about efficient neural networks? · F. Iandola, Albert Eaton Shaw, Ravi Krishna, Kurt Keutzer · [VLM] · 90 / 128 / 0 · 19 Jun 2020
- Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation · Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith · 137 / 140 / 0 · 18 Jun 2020
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing · Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le · 121 / 236 / 0 · 05 Jun 2020
- Enhanced back-translation for low resource neural machine translation using self-training · Idris Abdulmumin, B. Galadanci, Abubakar Isa · [SyDa] · 38 / 2 / 0 · 04 Jun 2020
- Cross-model Back-translated Distillation for Unsupervised Machine Translation · Xuan-Phi Nguyen, Shafiq Joty, Thanh-Tung Nguyen, Wu Kui, Ai Ti Aw · 45 / 14 / 0 · 03 Jun 2020
- HAT: Hardware-Aware Transformers for Efficient Natural Language Processing · Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han · 107 / 263 / 0 · 28 May 2020
- Variational Neural Machine Translation with Normalizing Flows · Hendra Setiawan, Matthias Sperber, Udhay Nallasamy, Matthias Paulik · [DRL] · 58 / 12 / 0 · 28 May 2020
- Normalized Attention Without Probability Cage · Oliver Richter, Roger Wattenhofer · 96 / 21 / 0 · 19 May 2020
- Conformer: Convolution-augmented Transformer for Speech Recognition · Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, ..., Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang · 231 / 3,179 / 0 · 16 May 2020
- Synthesizer: Rethinking Self-Attention in Transformer Models · Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng · 76 / 342 / 0 · 02 May 2020
- Exploring Self-attention for Image Recognition · Hengshuang Zhao, Jiaya Jia, V. Koltun · [SSL] · 100 / 790 / 0 · 28 Apr 2020
- Lite Transformer with Long-Short Range Attention · Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han · 62 / 323 / 0 · 24 Apr 2020
- DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks · Yikang Zhang, Jian Zhang, Qiang-qiang Wang, Zhaobai Zhong · 66 / 90 / 0 · 22 Apr 2020
- Understanding the Difficulty of Training Transformers · Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han · [AI4CE] · 91 / 259 / 0 · 17 Apr 2020
- Highway Transformer: Self-Gating Enhanced Self-Attentive Networks · Yekun Chai, Jin Shuo, Xinwen Hou · 48 / 17 / 0 · 17 Apr 2020
- Transform and Tell: Entity-Aware News Image Captioning · Alasdair Tran, A. Mathews, Lexing Xie · [VLM] · 60 / 97 / 0 · 17 Apr 2020
- Training with Quantization Noise for Extreme Model Compression · Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin · [MQ] · 113 / 246 / 0 · 15 Apr 2020
- Neural Machine Translation: Challenges, Progress and Future · Jiajun Zhang, Chengqing Zong · 56 / 55 / 0 · 13 Apr 2020
- Longformer: The Long-Document Transformer · Iz Beltagy, Matthew E. Peters, Arman Cohan · [RALM, VLM] · 238 / 4,111 / 0 · 10 Apr 2020
- Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning · Zhaojiang Lin, Andrea Madotto, Pascale Fung · 107 / 163 / 0 · 08 Apr 2020
- Aligned Cross Entropy for Non-Autoregressive Machine Translation · Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy · 92 / 116 / 0 · 03 Apr 2020
- Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling · Dmitrii Aksenov, J. Moreno-Schneider, Peter Bourgonje, Robert Schwarzenberg, Leonhard Hennig, Georg Rehm · 115 / 26 / 0 · 29 Mar 2020
- PowerNorm: Rethinking Batch Normalization in Transformers · Sheng Shen, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer · [BDL] · 118 / 16 / 0 · 17 Mar 2020
- Transformer++ · Prakhar Thapak, P. Hore · 16 / 0 / 0 · 02 Mar 2020
- A Primer in BERTology: What we know about how BERT works · Anna Rogers, Olga Kovaleva, Anna Rumshisky · [OffRL] · 155 / 1,511 / 0 · 27 Feb 2020
- On Feature Normalization and Data Augmentation · Boyi Li, Felix Wu, Ser-Nam Lim, Serge J. Belongie, Kilian Q. Weinberger · 56 / 138 / 0 · 25 Feb 2020
- Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation · Alessandro Raganato, Yves Scherrer, Jörg Tiedemann · 102 / 92 / 0 · 24 Feb 2020
- Tree-structured Attention with Hierarchical Accumulation · Xuan-Phi Nguyen, Shafiq Joty, Guosheng Lin, R. Socher · 58 / 76 / 0 · 19 Feb 2020
- Low-Rank Bottleneck in Multi-head Attention Models · Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar · 78 / 97 / 0 · 17 Feb 2020
- Incorporating BERT into Neural Machine Translation · Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wen-gang Zhou, Houqiang Li, Tie-Yan Liu · [FedML, AIMat] · 50 / 360 / 0 · 17 Feb 2020
- Time-aware Large Kernel Convolutions · Vasileios Lioutas, Yuhong Guo · [AI4TS] · 97 / 29 / 0 · 08 Feb 2020
- Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning · Peter Henderson, Jie Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau · 108 / 459 / 0 · 31 Jan 2020
- Semi-Autoregressive Training Improves Mask-Predict Decoding · Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer · 109 / 72 / 0 · 23 Jan 2020
- Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention · Thomas D. Dowdell, Hongyu Zhang · 36 / 4 / 0 · 27 Dec 2019
- Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection · Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun · 79 / 113 / 0 · 25 Dec 2019
- Tag-less Back-Translation · Idris Abdulmumin, B. Galadanci, Aliyu Dadan Garba · 86 / 11 / 0 · 22 Dec 2019