2012.02030
Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks
20 November 2020
Ileana Rugina, Rumen Dangovski, L. Jing, Preslav Nakov, Marin Soljacic
Papers citing
"Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks"
Title
Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
VLM
280
2,015
0
28 Jul 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE
245
1,821
0
17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018