Sparse Sinkhorn Attention (arXiv:2002.11296)
26 February 2020
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
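For orientation, the paper's title refers to the Sinkhorn normalization operator, which also appears in several related works listed below (Gumbel-Sinkhorn networks, Sinkhorn propagation). The sketch below is a generic, illustrative log-space Sinkhorn iteration in NumPy, not the paper's attention mechanism: repeatedly normalizing rows and columns of a score matrix drives it toward a doubly stochastic matrix. The function name `sinkhorn`, the iteration count, and the example matrix are illustrative choices, not anything specified by the paper.

```python
import numpy as np

def sinkhorn(log_alpha: np.ndarray, n_iters: int = 20) -> np.ndarray:
    """Balance a square score matrix in log space: alternately normalize
    rows and columns so that exp(log_alpha) approaches a doubly stochastic
    matrix (every row and column sums to 1). Illustrative sketch only."""
    def logsumexp(x: np.ndarray, axis: int) -> np.ndarray:
        m = x.max(axis=axis, keepdims=True)
        return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

    for _ in range(n_iters):
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1)  # row normalization
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0)  # column normalization
    return np.exp(log_alpha)

# Illustrative usage: a random 4x4 score matrix becomes near doubly stochastic.
rng = np.random.default_rng(0)
P = sinkhorn(rng.normal(size=(4, 4)))
print(P.sum(axis=0))  # ~[1. 1. 1. 1.]
print(P.sum(axis=1))  # ~[1. 1. 1. 1.]
```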
Papers citing "Sparse Sinkhorn Attention" (33 of 33 papers shown)
Attention Condensation via Sparsity Induced Regularized Training
Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd
03 Mar 2025 · 0 citations

ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang
24 Jan 2025 · 2 citations

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai
06 Oct 2024 · 17 citations

Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration
Jingyun Xue, Tao Wang, Jun Wang, Kaihao Zhang
09 Mar 2024 · 2 citations · Tags: ViT

MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention
Hao Shao, Quansheng Zeng, Qibin Hou, Jufeng Yang
14 Dec 2023 · 14 citations

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen, Sebastián M. Palacio, Federico Raue, Andreas Dengel
18 Aug 2023 · 4 citations

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Dianbo Sui
14 Oct 2022 · 9 citations · Tags: 3DV

Reformer: The Efficient Transformer
Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya
13 Jan 2020 · 2,279 citations · Tags: VLM

Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
13 Nov 2019 · 636 citations · Tags: RALM, VLM, KELM

Blockwise Self-Attention for Long Document Understanding
J. Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang
07 Nov 2019 · 252 citations

Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives
Yi Tay, Shuohang Wang, Anh Tuan Luu, Jie Fu, Minh C. Phan, Xingdi Yuan, J. Rao, S. Hui, Aston Zhang
26 May 2019 · 109 citations

Generating Long Sequences with Sparse Transformers
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
23 Apr 2019 · 1,880 citations

Star-Transformer
Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang
25 Feb 2019 · 262 citations

The Evolved Transformer
David R. So, Chen Liang, Quoc V. Le
30 Jan 2019 · 461 citations · Tags: ViT

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
09 Jan 2019 · 3,707 citations · Tags: VLM

Mesh-TensorFlow: Deep Learning for Supercomputers
Noam M. Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, ..., HyoukJoong Lee, O. Milenkovic, C. Young, Ryan Sepassi, Blake Hechtman
05 Nov 2018 · 387 citations · Tags: GNN, MoE, AI4CE

Adaptive Input Representations for Neural Language Modeling
Alexei Baevski, Michael Auli
28 Sep 2018 · 389 citations

Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling
Tao Shen, Dinesh Manocha, Guodong Long, Jing Jiang, Chengqi Zhang
03 Apr 2018 · 148 citations · Tags: HAI

Tensor2Tensor for Neural Machine Translation
Ashish Vaswani, Samy Bengio, E. Brevdo, François Chollet, Aidan Gomez, ..., Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam M. Shazeer, Jakob Uszkoreit
16 Mar 2018 · 528 citations

Learning Latent Permutations with Gumbel-Sinkhorn Networks
Gonzalo E. Mena, David Belanger, Scott W. Linderman, Jasper Snoek
23 Feb 2018 · 267 citations

Image Transformer
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam M. Shazeer, Alexander Ku, Dustin Tran
15 Feb 2018 · 1,671 citations · Tags: ViT

Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling
Tao Shen, Dinesh Manocha, Guodong Long, Jing Jiang, Sen Wang, Chengqi Zhang
31 Jan 2018 · 144 citations · Tags: AI4TS

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017 · 129,831 citations · Tags: 3DV

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams, Nikita Nangia, Samuel R. Bowman
18 Apr 2017 · 4,444 citations

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
23 Jan 2017 · 2,582 citations · Tags: MoE

Categorical Reparameterization with Gumbel-Softmax
Eric Jang, S. Gu, Ben Poole
03 Nov 2016 · 5,323 citations · Tags: BDL

From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
André F. T. Martins, Ramón Fernández Astudillo
05 Feb 2016 · 711 citations

A large annotated corpus for learning natural language inference
Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning
21 Aug 2015 · 4,256 citations

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning
17 Aug 2015 · 7,942 citations

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, R. Zemel, Yoshua Bengio
10 Feb 2015 · 10,034 citations · Tags: DiffM

Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le
10 Sep 2014 · 20,467 citations · Tags: AIMat

One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Ciprian Chelba, Tomas Mikolov, M. Schuster, Qi Ge, T. Brants, P. Koehn, T. Robinson
11 Dec 2013 · 1,099 citations

Ranking via Sinkhorn Propagation
Ryan P. Adams, R. Zemel
09 Jun 2011 · 146 citations