arXiv:1901.10430 (v2, latest)
Pay Less Attention with Lightweight and Dynamic Convolutions
29 January 2019
Felix Wu
Angela Fan
Alexei Baevski
Yann N. Dauphin
Michael Auli
Papers citing "Pay Less Attention with Lightweight and Dynamic Convolutions"
50 / 241 papers shown
Container: Context Aggregation Network
Peng Gao
Jiasen Lu
Hongsheng Li
Roozbeh Mottaghi
Aniruddha Kembhavi
ViT
106
72
0
02 Jun 2021
Self-Training Sampling with Monolingual Data Uncertainty for Neural Machine Translation
Wenxiang Jiao
Xing Wang
Zhaopeng Tu
Shuming Shi
Michael R. Lyu
Irwin King
UQLM
65
35
0
02 Jun 2021
DLA-Net: Learning Dual Local Attention Features for Semantic Segmentation of Large-Scale Building Facade Point Clouds
Yanfei Su
Weiquan Liu
Zhimin Yuan
Ming Cheng
Zhihong Zhang
Xuelun Shen
Cheng-Yu Wang
3DPC
108
40
0
01 Jun 2021
Memory-Efficient Differentiable Transformer Architecture Search
Yuekai Zhao
Li Dong
Yelong Shen
Zhihua Zhang
Furu Wei
Weizhu Chen
ViT
66
17
0
31 May 2021
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation
Zhiyong Wu
Lingpeng Kong
W. Bi
Xiang Li
B. Kao
LRM
71
81
0
30 May 2021
NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search
Jin Xu
Xu Tan
Renqian Luo
Kaitao Song
Jian Li
Tao Qin
Tie-Yan Liu
MQ
62
79
0
30 May 2021
An Attention Free Transformer
Shuangfei Zhai
Walter A. Talbott
Nitish Srivastava
Chen Huang
Hanlin Goh
Ruixiang Zhang
J. Susskind
ViT
94
132
0
28 May 2021
Controllable Abstractive Dialogue Summarization with Sketch Supervision
Chien-Sheng Wu
Linqing Liu
Wenhao Liu
Pontus Stenetorp
Caiming Xiong
83
52
0
28 May 2021
Learning Language Specific Sub-network for Multilingual Machine Translation
Zehui Lin
Liwei Wu
Mingxuan Wang
Lei Li
82
84
0
19 May 2021
Pay Attention to MLPs
Hanxiao Liu
Zihang Dai
David R. So
Quoc V. Le
AI4CE
168
672
0
17 May 2021
Dynamic Pooling Improves Nanopore Base Calling Accuracy
V. Boža
Peter Perešíni
Broňa Brejová
T. Vinař
113
4
0
16 May 2021
The Volctrans Neural Speech Translation System for IWSLT 2021
Chengqi Zhao
Zhicheng Liu
Jian-Fei Tong
Tao Wang
Mingxuan Wang
Rong Ye
Qianqian Dong
Jun Cao
Lei Li
59
8
0
16 May 2021
Poolingformer: Long Document Modeling with Pooling Attention
Hang Zhang
Yeyun Gong
Yelong Shen
Weisheng Li
Jiancheng Lv
Nan Duan
Weizhu Chen
106
99
0
10 May 2021
Are Pre-trained Convolutions Better than Pre-trained Transformers?
Yi Tay
Mostafa Dehghani
J. Gupta
Dara Bahri
V. Aribandi
Zhen Qin
Donald Metzler
AI4CE
77
49
0
07 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
527
2,721
0
04 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
94
25
0
20 Apr 2021
Knowledge Neurons in Pretrained Transformers
Damai Dai
Li Dong
Y. Hao
Zhifang Sui
Baobao Chang
Furu Wei
KELM
MU
177
466
0
18 Apr 2021
How to Train BERT with an Academic Budget
Peter Izsak
Moshe Berchansky
Omer Levy
142
119
0
15 Apr 2021
UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost
Zhen Wu
Lijun Wu
Qi Meng
Yingce Xia
Shufang Xie
Tao Qin
Xinyu Dai
Tie-Yan Liu
88
22
0
11 Apr 2021
Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog
Arun Babu
Akshat Shrivastava
Armen Aghajanyan
Ahmed Aly
Angela Fan
Marjan Ghazvininejad
68
21
0
11 Apr 2021
Learning Graph Structures with Transformer for Multivariate Time Series Anomaly Detection in IoT
Zekai Chen
Dingshuo Chen
Xiao Zhang
Zixuan Yuan
Xiuzhen Cheng
AI4TS
100
359
0
08 Apr 2021
Do We Need Anisotropic Graph Neural Networks?
Shyam A. Tailor
Felix L. Opolka
Pietro Lio
Nicholas D. Lane
99
35
0
03 Apr 2021
Dual Contrastive Loss and Attention for GANs
Ning Yu
Guilin Liu
Aysegül Dündar
Andrew Tao
Bryan Catanzaro
Larry S. Davis
Mario Fritz
GAN
121
61
0
31 Mar 2021
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
164
1,927
0
29 Mar 2021
Unified Graph Structured Models for Video Understanding
Anurag Arnab
Chen Sun
Cordelia Schmid
125
46
0
29 Mar 2021
Mask Attention Networks: Rethinking and Strengthen Transformer
Zhihao Fan
Yeyun Gong
Dayiheng Liu
Zhongyu Wei
Siyuan Wang
Jian Jiao
Nan Duan
Ruofei Zhang
Xuanjing Huang
68
75
0
25 Mar 2021
Finetuning Pretrained Transformers into RNNs
Jungo Kasai
Hao Peng
Yizhe Zhang
Dani Yogatama
Gabriel Ilharco
Nikolaos Pappas
Yi Mao
Weizhu Chen
Noah A. Smith
114
67
0
24 Mar 2021
Random Feature Attention
Hao Peng
Nikolaos Pappas
Dani Yogatama
Roy Schwartz
Noah A. Smith
Lingpeng Kong
136
362
0
03 Mar 2021
Do Transformer Modifications Transfer Across Implementations and Applications?
Sharan Narang
Hyung Won Chung
Yi Tay
W. Fedus
Thibault Févry
...
Wei Li
Nan Ding
Jake Marcus
Adam Roberts
Colin Raffel
100
128
0
23 Feb 2021
MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records
Zhen Xu
David R. So
Andrew M. Dai
Mamba
111
53
0
03 Feb 2021
The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT
Madhura Pande
Aakriti Budhraja
Preksha Nema
Pratyush Kumar
Mitesh M. Khapra
68
19
0
22 Jan 2021
Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers
Machel Reid
Edison Marrese-Taylor
Y. Matsuo
MoE
112
48
0
01 Jan 2021
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Sixiao Zheng
Jiachen Lu
Hengshuang Zhao
Xiatian Zhu
Zekun Luo
...
Yanwei Fu
Jianfeng Feng
Tao Xiang
Philip Torr
Li Zhang
ViT
206
2,928
0
31 Dec 2020
Neural Machine Translation: A Review of Methods, Resources, and Tools
Zhixing Tan
Shuo Wang
Zonghan Yang
Gang Chen
Xuancheng Huang
Maosong Sun
Yang Liu
3DV
AI4TS
99
110
0
31 Dec 2020
Reservoir Transformers
Sheng Shen
Alexei Baevski
Ari S. Morcos
Kurt Keutzer
Michael Auli
Douwe Kiela
97
18
0
30 Dec 2020
Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
Tri Dao
N. Sohoni
Albert Gu
Matthew Eichhorn
Amit Blonder
Megan Leszczynski
Atri Rudra
Christopher Ré
90
49
0
29 Dec 2020
Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning
Xuebo Liu
Longyue Wang
Derek F. Wong
Liang Ding
Lidia S. Chao
Zhaopeng Tu
AI4CE
66
35
0
29 Dec 2020
Learning Light-Weight Translation Models from Deep Transformer
Bei Li
Ziyang Wang
Hui Liu
Quan Du
Tong Xiao
Chunliang Zhang
Jingbo Zhu
VLM
171
40
0
27 Dec 2020
HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation
Y. Nirkin
Lior Wolf
Tal Hassner
SSeg
83
180
0
21 Dec 2020
Comparison of Attention-based Deep Learning Models for EEG Classification
Giulia Cisotto
Alessio Zanga
Joanna Chlebus
I. Zoppis
Sara Manzoni
Urszula Markowska-Kaczmar
55
20
0
02 Dec 2020
Deeper or Wider Networks of Point Clouds with Self-attention?
Haoxi Ran
Li Lu
3DPC
53
1
0
29 Nov 2020
Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding
Shohreh Deldari
Daniel V. Smith
Hao Xue
Flora D. Salim
AI4TS
125
112
0
28 Nov 2020
Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling
Shruti Bhosale
Kyra Yee
Sergey Edunov
Michael Auli
85
7
0
13 Nov 2020
BERT-JAM: Boosting BERT-Enhanced Neural Machine Translation with Joint Attention
Zhebin Zhang
Sai Wu
Dawei Jiang
Gang Chen
48
0
0
09 Nov 2020
Layer-Wise Multi-View Learning for Neural Machine Translation
Qiang Wang
Changliang Li
Yue Zhang
Tong Xiao
Jingbo Zhu
26
3
0
03 Nov 2020
The Volctrans Machine Translation System for WMT20
Liwei Wu
Xiao Pan
Zehui Lin
Yaoming Zhu
Mingxuan Wang
Lei Li
VLM
52
17
0
28 Oct 2020
Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Menglong Xu
Shengqiang Li
Xiao-Lei Zhang
84
32
0
23 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
76
103
0
22 Oct 2020
Beyond English-Centric Multilingual Machine Translation
Angela Fan
Shruti Bhosale
Holger Schwenk
Zhiyi Ma
Ahmed El-Kishky
...
Vitaliy Liptchinsky
Sergey Edunov
Edouard Grave
Michael Auli
Armand Joulin
LRM
102
865
0
21 Oct 2020
Multi-Unit Transformers for Neural Machine Translation
Jianhao Yan
Fandong Meng
Jie Zhou
59
17
0
21 Oct 2020