UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost

11 April 2021
Zhen Wu
Lijun Wu
Qi Meng
Yingce Xia
Shufang Xie
Tao Qin
Xinyu Dai
Tie-Yan Liu
arXiv: 2104.04946

Papers citing "UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost"

Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
94
2
0
21 Feb 2025
Integrating Pre-trained Language Model into Neural Machine Translation
Soon-Jae Hwang
Chang-Sung Jeong
24
0
0
30 Oct 2023
EM-Network: Oracle Guided Self-distillation for Sequence Learning
J. Yoon
Sunghwan Ahn
Hyeon Seung Lee
Minchan Kim
Seokhwan Kim
N. Kim
VLM
30
2
0
14 Jun 2023
PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation
Liuyi Wang
Chengju Liu
Zongtao He
Shu Li
Qingqing Yan
Huiyi Chen
Qi Chen
21
9
0
19 May 2023
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation
Pengzhi Gao
Zhongjun He
Hua Wu
Haifeng Wang
30
13
0
06 Jun 2022
BayesFormer: Transformer with Uncertainty Estimation
Karthik Abinav Sankararaman
Sinong Wang
Han Fang
UQCV
BDL
22
10
0
02 Jun 2022
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Nishant Kambhatla
Logan Born
Anoop Sarkar
18
16
0
01 Apr 2022
Long-Range Transformers for Dynamic Spatiotemporal Forecasting
J. E. Grigsby
Zhe Wang
Nam Nguyen
Yanjun Qi
AI4TS
69
87
0
24 Sep 2021
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation
Haoran Xu
Benjamin Van Durme
Kenton W. Murray
50
57
0
09 Sep 2021
Not All Attention Is All You Need
Hongqiu Wu
Hai Zhao
Min Zhang
14
9
0
10 Apr 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018