arXiv:2004.13342
Scheduled DropHead: A Regularization Method for Transformer Models
28 April 2020
Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
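For context on the paper above: judging from the title alone, DropHead regularizes multi-head attention by dropping entire attention heads during training, with the drop rate following a schedule. The sketch below is purely illustrative; the function names and the linear schedule are assumptions, not the paper's actual algorithm.

```python
import random

def drophead_mask(num_heads, p, rng=None):
    """Sample one keep/rescale factor per attention head.

    Each head is dropped (factor 0.0) independently with probability p;
    surviving heads are rescaled by 1/(1-p), the usual inverted-dropout
    convention, so the expected output magnitude is unchanged.
    """
    rng = rng or random.Random()
    if p <= 0.0:
        return [1.0] * num_heads
    return [(1.0 / (1.0 - p)) if rng.random() >= p else 0.0
            for _ in range(num_heads)]

def linear_drophead_schedule(step, total_steps, p_max):
    """A simple linearly increasing drop rate (an assumed schedule,
    used here only to illustrate what 'scheduled' could mean)."""
    return p_max * min(step / max(total_steps, 1), 1.0)
```

At training time, each head's output would be multiplied by its mask entry before the heads are combined, with `p` updated per step from the schedule.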
Papers citing "Scheduled DropHead: A Regularization Method for Transformer Models" (10 papers):
1. Reasoning Bias of Next Token Prediction Training. Pengxiao Lin, Zhongwang Zhang, Zhi-Qin John Xu. 21 Feb 2025.
2. DropDim: A Regularization Method for Transformer Networks. Hao Zhang, Dan Qu, Kejia Shao, Xu Yang. 20 Apr 2023.
3. Semi-Structured Object Sequence Encoders. V. Rudramurthy, Riyaz Ahmad Bhat, Chulaka Gunasekara, Siva Sankalp Patel, H. Wan, Tejas I. Dhamecha, Danish Contractor, Marina Danilevsky. 03 Jan 2023.
4. Relaxed Attention for Transformer Models. Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt. 20 Sep 2022.
5. A Survey on Dropout Methods and Experimental Verification in Recommendation. Yong Li, Weizhi Ma, C. L. Philip Chen, Hao Fei, Yiqun Liu, Shaoping Ma, Yue Yang. 05 Apr 2022.
6. Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations. Fangyu Liu, Yunlong Jiao, Jordan Massiah, Emine Yilmaz, Serhii Havrylov. 27 Sep 2021.
7. How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology? Chantal Amrhein, Rico Sennrich. 02 Sep 2021.
8. R-Drop: Regularized Dropout for Neural Networks. Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Hao Fei, Tie-Yan Liu. 28 Jun 2021.
9. UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost. Zhen Wu, Lijun Wu, Qi Meng, Yingce Xia, Shufang Xie, Tao Qin, Xinyu Dai, Tie-Yan Liu. 11 Apr 2021.
10. Effective Approaches to Attention-based Neural Machine Translation. Thang Luong, Hieu H. Pham, Christopher D. Manning. 17 Aug 2015.