Pay Less Attention with Lightweight and Dynamic Convolutions

29 January 2019
Felix Wu
Angela Fan
Alexei Baevski
Yann N. Dauphin
Michael Auli
arXiv: 1901.10430

Papers citing "Pay Less Attention with Lightweight and Dynamic Convolutions"

41 / 241 papers shown
Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
142
359
0
20 Dec 2019
Neural Machine Translation: A Review and Survey
Felix Stahlberg
3DV AI4TS MedIm
142
332
0
04 Dec 2019
SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization
Bogdan Gliwa
Iwona Mochol
M. Biesek
A. Wawer
170
640
0
27 Nov 2019
Self-Attention Enhanced Selective Gate with Entity-Aware Embedding for Distantly Supervised Relation Extraction
Tao Shen
Guodong Long
Tao Shen
Dinesh Manocha
Lina Yao
Huan Huo
Jing Jiang
90
80
0
27 Nov 2019
Iterative Batch Back-Translation for Neural Machine Translation: A Conceptual Model
Idris Abdulmumin
B. Galadanci
Abubakar Isa
42
0
0
26 Nov 2019
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Guangxiang Zhao
Xu Sun
Jingjing Xu
Zhiyuan Zhang
Liangchen Luo
LRM
61
49
0
17 Nov 2019
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
RALM VLM KELM
105
656
0
13 Nov 2019
Two-Headed Monster And Crossed Co-Attention Networks
Yaoyiran Li
Jing Jiang
64
0
0
10 Nov 2019
Data Diversification: A Simple Strategy For Neural Machine Translation
Xuan-Phi Nguyen
Shafiq Joty
Wu Kui
Ai Ti Aw
117
15
0
05 Nov 2019
Depth-Adaptive Transformer
Maha Elbayad
Jiatao Gu
Edouard Grave
Michael Auli
94
195
0
22 Oct 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
160
597
0
25 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
138
1,881
0
23 Sep 2019
Multi-agent Learning for Neural Machine Translation
Tianchi Bi
Hao Xiong
Zhongjun He
Hua Wu
Haifeng Wang
AI4CE
55
12
0
03 Sep 2019
A Unified Neural Coherence Model
Han Cheol Moon
Tasnim Mohiuddin
Shafiq Joty
Xu Chi
37
47
0
01 Sep 2019
Adaptively Sparse Transformers
Gonçalo M. Correia
Vlad Niculae
André F. T. Martins
141
257
0
30 Aug 2019
Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
Biao Zhang
Ivan Titov
Rico Sennrich
64
104
0
29 Aug 2019
Revealing the Dark Secrets of BERT
Olga Kovaleva
Alexey Romanov
Anna Rogers
Anna Rumshisky
96
556
0
21 Aug 2019
Dynamic Graph Message Passing Networks
Li Zhang
Dan Xu
Anurag Arnab
Philip Torr
GNN
102
138
0
19 Aug 2019
Multi-modality Latent Interaction Network for Visual Question Answering
Peng Gao
Haoxuan You
Zhanpeng Zhang
Xiaogang Wang
Hongsheng Li
69
82
0
10 Aug 2019
UdS Submission for the WMT 19 Automatic Post-Editing Task
Hongfei Xu
Qiuhui Liu
Josef van Genabith
21
4
0
09 Aug 2019
Extracting Interpretable Physical Parameters from Spatiotemporal Systems using Unsupervised Learning
Peter Y. Lu
Samuel Kim
Marin Soljacic
AI4CE
58
60
0
13 Jul 2019
Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
N. Arivazhagan
Ankur Bapna
Orhan Firat
Dmitry Lepikhin
Melvin Johnson
...
George F. Foster
Colin Cherry
Wolfgang Macherey
Zhiwen Chen
Yonghui Wu
109
429
0
11 Jul 2019
Positional Normalization
Boyi Li
Felix Wu
Kilian Q. Weinberger
Serge J. Belongie
78
92
0
09 Jul 2019
The Indirect Convolution Algorithm
Marat Dukhan
70
42
0
03 Jul 2019
Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar
Edouard Grave
Guillaume Lample
Hervé Jégou
Armand Joulin
RALM KELM
77
139
0
02 Jul 2019
The University of Sydney's Machine Translation System for WMT19
Liang Ding
Dacheng Tao
57
13
0
30 Jun 2019
GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation
Marc Brockschmidt
118
141
0
28 Jun 2019
Stand-Alone Self-Attention in Vision Models
Prajit Ramachandran
Niki Parmar
Ashish Vaswani
Irwan Bello
Anselm Levskaya
Jonathon Shlens
VLM SLR ViT
179
1,218
0
13 Jun 2019
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
Yiping Lu
Zhuohan Li
Di He
Zhiqing Sun
Bin Dong
Tao Qin
Liwei Wang
Tie-Yan Liu
AI4CE
84
176
0
06 Jun 2019
Revisiting Low-Resource Neural Machine Translation: A Case Study
Rico Sennrich
Biao Zhang
76
223
0
28 May 2019
Joint Source-Target Self Attention with Locality Constraints
José A. R. Fonollosa
Noe Casas
Marta R. Costa-jussà
61
23
0
16 May 2019
Taming Pretrained Transformers for Extreme Multi-label Text Classification
Wei-Cheng Chang
Hsiang-Fu Yu
Kai Zhong
Yiming Yang
Inderjit Dhillon
75
20
0
07 May 2019
Low-Memory Neural Network Training: A Technical Report
N. Sohoni
Christopher R. Aberger
Megan Leszczynski
Jian Zhang
Christopher Ré
92
103
0
24 Apr 2019
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
622
5,892
0
21 Apr 2019
An Empirical Study of Spatial Attention Mechanisms in Deep Networks
Xizhou Zhu
Dazhi Cheng
Zheng Zhang
Stephen Lin
Jifeng Dai
92
416
0
11 Apr 2019
CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Brandon Yang
Gabriel Bender
Quoc V. Le
Jiquan Ngiam
MedIm 3DV
100
642
0
10 Apr 2019
Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions
Awni Y. Hannun
Ann Lee
Qiantong Xu
R. Collobert
78
97
0
04 Apr 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLM FaML
185
3,160
0
01 Apr 2019
Strategies for Structuring Story Generation
Angela Fan
M. Lewis
Yann N. Dauphin
116
216
0
04 Feb 2019
The Evolved Transformer
David R. So
Chen Liang
Quoc V. Le
ViT
141
467
0
30 Jan 2019
Tensorized Embedding Layers for Efficient Model Compression
Oleksii Hrinchuk
Valentin Khrulkov
L. Mirvakhabova
Elena Orlova
Ivan Oseledets
91
73
0
30 Jan 2019