DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
arXiv:1907.11065 (v2, latest)
25 July 2019
Zehui Lin, Pengfei Liu, Luyao Huang, Junkun Chen, Xipeng Qiu, Xuanjing Huang
Topic: 3DPC

Papers citing "DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks"

20 papers shown

1. Reasoning Bias of Next Token Prediction Training (21 Feb 2025) [LRM]
   Pengxiao Lin, Zhongwang Zhang, Zhi-Qin John Xu

2. Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM (10 Oct 2024)
   Haiyue Ma, Jian Liu, Ronny Krashinsky

3. M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images (23 Sep 2024)
   Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin

4. SAMSA: Efficient Transformer for Many Data Modalities (10 Aug 2024)
   Minh Lenhat, Viet Anh Nguyen, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong-Son Hy

5. Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition (19 Jul 2024)
   Suvajit Patra, Arkadip Maitra, Megha Tiwari, K. Kumaran, Swathy Prabhu, Swami Punyeshwarananda, Soumitra Samanta

6. LoRA Meets Dropout under a Unified Framework (25 Feb 2024)
   Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu

7. Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers (07 Feb 2024) [ViT]
   Md Shamim Hussain, Mohammed J Zaki, D. Subramanian

8. CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving (11 Oct 2023)
   Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, ..., Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang

9. The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles (02 Jun 2023)
   Md Shamim Hussain, Mohammed J Zaki, D. Subramanian

10. DropDim: A Regularization Method for Transformer Networks (20 Apr 2023)
    Hao Zhang, Dan Qu, Kejia Shao, Xu Yang

11. Neural Attentive Circuits (14 Oct 2022)
    Nasim Rahaman, M. Weiß, Francesco Locatello, C. Pal, Yoshua Bengio, Bernhard Schölkopf, Erran L. Li, Nicolas Ballas

12. Latent Neural ODEs with Sparse Bayesian Multiple Shooting (07 Oct 2022) [BDL]
    V. Iakovlev, Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki

13. DropKey (04 Aug 2022)
    Bonan Li, Yinhan Hu, Xuecheng Nie, Congying Han, Xiangjian Jiang, Tiande Guo, Luoqi Liu

14. Causal Transformer for Estimating Counterfactual Outcomes (14 Apr 2022) [CML]
    Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel

15. Dynamic N:M Fine-grained Structured Sparse Attention Mechanism (28 Feb 2022)
    Zhaodong Chen, Yuying Quan, Zheng Qu, Liu Liu, Yufei Ding, Yuan Xie

16. Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation (09 Jul 2021) [SSL]
    Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

17. R-Drop: Regularized Dropout for Neural Networks (28 Jun 2021)
    Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Hao Fei, Tie-Yan Liu

18. Spatial Temporal Transformer Network for Skeleton-based Action Recognition (11 Dec 2020) [ViT]
    Chiara Plizzari, Marco Cannici, Matteo Matteucci

19. Skeleton-based Action Recognition via Spatial and Temporal Transformer Networks (17 Aug 2020) [ViT, MedIm]
    Chiara Plizzari, Marco Cannici, Matteo Matteucci

20. Scheduled DropHead: A Regularization Method for Transformer Models (28 Apr 2020)
    Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou