Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2008.07772
Cited By
Very Deep Transformers for Neural Machine Translation
18 August 2020
Xiaodong Liu
Kevin Duh
Liyuan Liu
Jianfeng Gao
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Very Deep Transformers for Neural Machine Translation"
50 / 52 papers shown
Title
IoT Botnet Detection: Application of Vision Transformer to Classification of Network Flow Traffic
Hassan Wasswa
Timothy Lynar
Aziida Nanyonga
Hussein Abbass
58
1
0
26 Apr 2025
Impact of Latent Space Dimension on IoT Botnet Detection Performance: VAE-Encoder Versus ViT-Encoder
Hassan Wasswa
Aziida Nanyonga
Timothy Lynar
DRL
53
2
0
21 Apr 2025
Self-Vocabularizing Training for Neural Machine Translation
Pin-Jie Lin
Ernie Chang
Yangyang Shi
Vikas Chandra
68
0
0
18 Mar 2025
Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation
Kamal Kumar
Yinhan Liu
Parth Patwa
Tanmoy
Mihir Adam Roberts
29
1
0
25 Mar 2024
Enhancing Context Through Contrast
Kshitij Ambilduke
Aneesh Shetye
Diksha Bagade
Rishika Bhagwatkar
Khurshed Fitter
P. Vagdargi
Shital S. Chiddarwar
26
0
0
06 Jan 2024
Heterogeneous Encoders Scaling In The Transformer For Neural Machine Translation
J. Hu
Roberto Cavicchioli
Giulia Berardinelli
Alessandro Capotondi
44
2
0
26 Dec 2023
CLIP-QDA: An Explainable Concept Bottleneck Model
Rémi Kazmierczak
Eloise Berthier
Goran Frehse
Gianni Franchi
24
7
0
30 Nov 2023
Core Building Blocks: Next Gen Geo Spatial GPT Application
Ashley Fernandez
Swaraj Dube
24
5
0
17 Oct 2023
Enhanced Transformer Architecture for Natural Language Processing
Woohyeon Moon
Taeyoung Kim
Bumgeun Park
Dongsoo Har
30
0
0
17 Oct 2023
Sparse Universal Transformer
Shawn Tan
Songlin Yang
Zhenfang Chen
Aaron Courville
Chuang Gan
MoE
40
12
0
11 Oct 2023
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
Devaansh Gupta
Siddhant Kharbanda
Jiawei Zhou
Wanhua Li
Hanspeter Pfister
D. Wei
VLM
39
9
0
29 Aug 2023
UniCoRN: Unified Cognitive Signal ReconstructioN bridging cognitive signals and human language
Nuwa Xi
Sendong Zhao
Hao Wang
Chi-Liang Liu
Bing Qin
Ting Liu
32
19
0
06 Jul 2023
H
2
_2
2
O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
66
261
0
24 Jun 2023
Understanding Parameter Sharing in Transformers
Ye Lin
Mingxuan Wang
Zhexi Zhang
Xiaohui Wang
Tong Xiao
Jingbo Zhu
MoE
26
2
0
15 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
38
1
0
07 Jun 2023
Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design
Shashank Sonkar
Richard G. Baraniuk
19
3
0
22 May 2023
Accelerating Transformer Inference for Translation via Parallel Decoding
Andrea Santilli
Silvio Severino
Emilian Postolache
Valentino Maiorca
Michele Mancusi
R. Marin
Emanuele Rodolà
41
80
0
17 May 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
38
0
0
10 May 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
40
1
0
21 Mar 2023
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai
Tatiana Likhomanenko
Etai Littwin
Dan Busbridge
Jason Ramapuram
Yizhe Zhang
Jiatao Gu
J. Susskind
AAML
48
68
0
11 Mar 2023
Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers
Tianlong Chen
Zhenyu Zhang
Ajay Jaiswal
Shiwei Liu
Zhangyang Wang
MoE
43
46
0
02 Mar 2023
An Evaluation of Persian-English Machine Translation Datasets with Transformers
A. Sartipi
Meghdad Dehghan
A. Fatemi
35
3
0
01 Feb 2023
EIT: Enhanced Interactive Transformer
Tong Zheng
Bei Li
Huiwen Bao
Tong Xiao
Jingbo Zhu
32
2
0
20 Dec 2022
Grammatical Error Correction: A Survey of the State of the Art
Christopher Bryant
Zheng Yuan
Muhammad Reza Qorib
Hannan Cao
Hwee Tou Ng
Ted Briscoe
3DV
29
79
0
09 Nov 2022
Emergent Linguistic Structures in Neural Networks are Fragile
Emanuele La Malfa
Matthew Wicker
Marta Kiatkowska
22
1
0
31 Oct 2022
Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang
Songlin Yang
Zeyu Huang
Jie Zhou
Wenge Rong
Zhang Xiong
MoE
99
42
0
11 Oct 2022
Effective General-Domain Data Inclusion for the Machine Translation Task by Vanilla Transformers
H. Soliman
37
0
0
28 Sep 2022
Variational Inference for Infinitely Deep Neural Networks
Achille Nazaret
David M. Blei
BDL
25
11
0
21 Sep 2022
Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages
Paul Soulos
Sudha Rao
Caitlin Smith
Eric Rosen
Asli Celikyilmaz
...
Coleman Haley
Roland Fernandez
Hamid Palangi
Jianfeng Gao
P. Smolensky
32
6
0
11 Aug 2022
GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation
Jian Yang
Yuwei Yin
Liqun Yang
Shuming Ma
Haoyang Huang
Dongdong Zhang
Furu Wei
Zhoujun Li
AI4CE
22
16
0
29 Jul 2022
Why Robust Natural Language Understanding is a Challenge
Marco Casadio
Ekaterina Komendantskaya
Verena Rieser
M. Daggitt
Daniel Kienitz
Luca Arnaboldi
Wen Kokke
OOD
AAML
30
0
0
21 Jun 2022
Isomorphic Cross-lingual Embeddings for Low-Resource Languages
Sonal Sannigrahi
Jesse Read
32
1
0
28 Mar 2022
Towards Personalized Intelligence at Scale
Yiping Kang
Ashish Mahendra
Christopher Clarke
Lingjia Tang
Jason Mars
31
1
0
13 Mar 2022
Cedille: A large autoregressive French language model
Martin Müller
Florian Laurent
36
19
0
07 Feb 2022
FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks
Shien Zhu
Luan H. K. Duong
Hui Chen
Di Liu
Weichen Liu
MQ
24
5
0
19 Jan 2022
The King is Naked: on the Notion of Robustness for Natural Language Processing
Emanuele La Malfa
Marta Z. Kwiatkowska
20
28
0
13 Dec 2021
ZeBRA: Precisely Destroying Neural Networks with Zero-Data Based Repeated Bit Flip Attack
Dahoon Park
K. Kwon
Sunghoon Im
Jaeha Kung
AAML
16
3
0
01 Nov 2021
Taming Sparsely Activated Transformer with Stochastic Experts
Simiao Zuo
Xiaodong Liu
Jian Jiao
Young Jin Kim
Hany Hassan
Ruofei Zhang
T. Zhao
Jianfeng Gao
MoE
44
109
0
08 Oct 2021
Text analysis and deep learning: A network approach
Ingo Marquart
25
0
0
08 Oct 2021
WeChat Neural Machine Translation Systems for WMT21
Xianfeng Zeng
Yanjun Liu
Ernan Li
Qiu Ran
Fandong Meng
Peng Li
Jinan Xu
Jie Zhou
25
20
0
05 Aug 2021
R-Drop: Regularized Dropout for Neural Networks
Xiaobo Liang
Lijun Wu
Juntao Li
Yue Wang
Qi Meng
Tao Qin
Wei Chen
Hao Fei
Tie-Yan Liu
47
424
0
28 Jun 2021
Controlling Neural Networks with Rule Representations
Sungyong Seo
Sercan Ö. Arik
Jinsung Yoon
Xiang Zhang
Kihyuk Sohn
Tomas Pfister
OOD
AI4CE
32
35
0
14 Jun 2021
Self-supervised and Supervised Joint Training for Resource-rich Machine Translation
Yong Cheng
Wei Wang
Lu Jiang
Wolfgang Macherey
26
17
0
08 Jun 2021
CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
Tatiana Likhomanenko
Qiantong Xu
Gabriel Synnaeve
R. Collobert
A. Rogozhnikov
OOD
ViT
33
55
0
06 Jun 2021
Cascaded Head-colliding Attention
Lin Zheng
Zhiyong Wu
Lingpeng Kong
27
2
0
31 May 2021
On Compositional Generalization of Neural Machine Translation
Yafu Li
Yongjing Yin
Yulong Chen
Yue Zhang
156
45
0
31 May 2021
A Simple and Effective Positional Encoding for Transformers
Pu-Chin Chen
Henry Tsai
Srinadh Bhojanapalli
Hyung Won Chung
Yin-Wen Chang
Chun-Sung Ferng
61
62
0
18 Apr 2021
OmniNet: Omnidirectional Representations from Transformers
Yi Tay
Mostafa Dehghani
V. Aribandi
Jai Gupta
Philip Pham
Zhen Qin
Dara Bahri
Da-Cheng Juan
Donald Metzler
47
26
0
01 Mar 2021
VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation
Fuli Luo
Wei Wang
Jiahao Liu
Yijia Liu
Bin Bi
Songfang Huang
Fei Huang
Luo Si
34
51
0
30 Oct 2020
On Learning Universal Representations Across Languages
Xiangpeng Wei
Rongxiang Weng
Yue Hu
Luxi Xing
Heng Yu
Weihua Luo
SSL
VLM
33
85
0
31 Jul 2020
1
2
Next