MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning (arXiv 1911.09483)
17 November 2019
Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo

Papers citing "MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning" (9 papers)

TranSFormer: Slow-Fast Transformer for Machine Translation
Bei Li, Yi Jing, Xu Tan, Zhen Xing, Tong Xiao, Jingbo Zhu
26 May 2023

Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, ..., Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, N. Houlsby
10 Feb 2023

EIT: Enhanced Interactive Transformer
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu
20 Dec 2022

Revisiting Checkpoint Averaging for Neural Machine Translation
Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney
21 Oct 2022

MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta, Hisahiro Suganuma, Yoshiki Tanaka
30 Aug 2022

Born for Auto-Tagging: Faster and better with new objective functions
Chiung-ju Liu, Huang-Ting Shieh
15 Jun 2022

R-Drop: Regularized Dropout for Neural Networks
Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Hao Fei, Tie-Yan Liu
28 Jun 2021

Mask Attention Networks: Rethinking and Strengthen Transformer
Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang
25 Mar 2021

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun
25 Dec 2019