ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Talking-Heads Attention

5 March 2020
Noam M. Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, L. Hou
arXiv: 2003.02436

Papers citing "Talking-Heads Attention"

Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets
Tianxiao Zhang, Wenju Xu, Bo Luo, Guanghui Wang
ViT, MDE
28 Jul 2024
Finding Stakeholder-Material Information from 10-K Reports using Fine-Tuned BERT and LSTM Models
V. Z. Chen
15 Aug 2023
Semantic Feature Integration network for Fine-grained Visual Classification
Haibo Wang, Yueyang Li, Haichi Luo
13 Feb 2023
EIT: Enhanced Interactive Transformer
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu
20 Dec 2022
Rethinking Vision Transformers for MobileNet Size and Speed
Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren
ViT
15 Dec 2022
BJTU-WeChat's Systems for the WMT22 Chat Translation Task
Yunlong Liang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou
28 Nov 2022
FF2: A Feature Fusion Two-Stream Framework for Punctuation Restoration
Yangjun Wu, Kebin Fang, Yao Zhao, Hao Zhang, Lifeng Shi, Mengqi Zhang
09 Nov 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan
ViT
14 Apr 2022
Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution
Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, ..., Ke Li, Yuan Shangguan, Varun K. Nagaraja, Ozlem Kalinli, M. Seltzer
07 Oct 2021
WeChat Neural Machine Translation Systems for WMT21
Xianfeng Zeng, Yanjun Liu, Ernan Li, Qiu Ran, Fandong Meng, Peng Li, Jinan Xu, Jie Zhou
05 Aug 2021
MedGPT: Medical Concept Prediction from Clinical Narratives
Z. Kraljevic, Anthony Shek, D. Bean, R. Bendayan, J. Teo, Richard J. B. Dobson
LM&MA, AI4TS, MedIm
07 Jul 2021
A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
ViT
08 Jun 2021
Vision Transformers with Patch Diversification
Chengyue Gong, Dilin Wang, Meng Li, Vikas Chandra, Qiang Liu
ViT
26 Apr 2021
Going deeper with Image Transformers
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou
ViT
31 Mar 2021
Global Attention based Graph Convolutional Neural Networks for Improved Materials Property Prediction
Steph-Yves M. Louis, Yong Zhao, Alireza Nasiri, Xiran Wong, Yuqi Song, Fei Liu, Jianjun Hu
AI4CE
11 Mar 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM
20 Apr 2018