arXiv:2411.03171
Navigating Extremes: Dynamic Sparsity in Large Output Spaces

5 November 2024
Nasib Ullah
Erik Schultheis
Mike Lasby
Yani Andrew Ioannou
Rohit Babbar

Papers citing "Navigating Extremes: Dynamic Sparsity in Large Output Spaces"

49 papers shown

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
  Abhimanyu Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna (07 Feb 2024)

How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark
  Eldar Kurtic, Torsten Hoefler, Dan Alistarh (21 Dec 2023)

Generalized test utilities for long-tail performance in extreme multi-label classification
  Erik Schultheis, Marek Wydmuch, Wojciech Kotlowski, Rohit Babbar, Krzysztof Dembczyński (09 Nov 2023)

VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
  Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, B. Fraguela, Torsten Hoefler (03 Oct 2023)

Compressing LLMs: The Truth is Rarely Pure and Never Simple
  Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang (02 Oct 2023)

Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training
  A. Nowak, Bram Grooten, Decebal Constantin Mocanu, Jacek Tabor (21 Jun 2023)

Towards Memory-Efficient Training for Extremely Large Output Spaces -- Learning with 500k Labels on a Single Commodity GPU
  Erik Schultheis, Rohit Babbar (06 Jun 2023)

Dynamic Sparse Training with Structured Sparsity
  Mike Lasby, A. Golubeva, Utku Evci, Mihai Nica, Yani Andrew Ioannou (03 May 2023)

JaxPruner: A concise library for sparsity research
  Jooyoung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, J. Obando-Ceron, ..., Hong-Seok Kim, Yann N. Dauphin, Karolina Dziugaite, Pablo Samuel Castro, Utku Evci (27 Apr 2023)

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
  Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang (03 Mar 2023)

Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers
  Shiwei Liu, Zhangyang Wang (06 Feb 2023)

CascadeXML: Rethinking Transformers for End-to-end Multi-resolution Training in Extreme Multi-label Classification
  Siddhant Kharbanda, Atmadeep Banerjee, Erik Schultheis, Rohit Babbar (29 Oct 2022)

On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification
  Erik Schultheis, Marek Wydmuch, Rohit Babbar, Krzysztof Dembczyński (26 Jul 2022)

NGAME: Negative Mining-aware Mini-batching for Extreme Classification
  Kunal Dahiya, Nilesh Gupta, Deepak Saini, Akshay Soni, Yajun Wang, ..., Sonu Mehta, Ramachandran Ramjee, Sumeet Agarwal, Purushottam Kar, Manik Varma (10 Jul 2022)

The State of Sparse Training in Deep Reinforcement Learning
  L. Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro (17 Jun 2022)

RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch
  Y. Tan, Pihe Hu, L. Pan, Jiatai Huang, Longbo Huang (30 May 2022)

Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification
  Jiong Zhang, Wei-Cheng Chang, Hsiang-Fu Yu, Inderjit S. Dhillon (01 Oct 2021)

Unbiased Loss Functions for Multilabel Classification with Missing Labels
  Erik Schultheis, Rohit Babbar (23 Sep 2021)

InceptionXML: A Lightweight Framework with Synchronized Negative Sampling for Short Text Extreme Classification
  Siddhant Kharbanda, Atmadeep Banerjee, Devaansh Gupta, Akash Palrecha, Rohit Babbar (13 Sep 2021)

Towards Structured Dynamic Sparse Pre-Training of BERT
  A. Dietrich, Frithjof Gressmann, Douglas Orr, Ivan Chelombiev, Daniel Justus, Carlo Luschi (13 Aug 2021)

ECLARE: Extreme Classification with Label Graph Correlations
  Anshul Mittal, Noveen Sachdeva, Sheshansh Agrawal, Sumeet Agarwal, Purushottam Kar, Manik Varma (31 Jul 2021)

Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity
  Shiwei Liu, Tianlong Chen, Zahra Atashgahi, Xiaohan Chen, Ghada Sokar, Elena Mocanu, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu (28 Jun 2021)

Accelerating Sparse Deep Neural Networks
  Asit K. Mishra, J. Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, Paulius Micikevicius (16 Apr 2021)

Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset
  Ilan Price, Jared Tanner (12 Feb 2021)

Truly Sparse Neural Networks at Scale
  Selima Curci, Decebal Constantin Mocanu, Mykola Pechenizkiy (02 Feb 2021)

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
  Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste (31 Jan 2021)

LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification
  Ting Jiang, Deqing Wang, Leilei Sun, Huayi Yang, Zhengyang Zhao, Fuzhen Zhuang (09 Jan 2021)

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models
  Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang (12 Dec 2020)

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win
  Utku Evci, Yani Andrew Ioannou, Cem Keskin, Yann N. Dauphin (07 Oct 2020)

Sparse GPU Kernels for Deep Learning
  Trevor Gale, Matei A. Zaharia, C. Young, Erich Elsen (18 Jun 2020)

Dynamic Model Pruning with Feedback
  Tao R. Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi (12 Jun 2020)

Proving the Lottery Ticket Hypothesis: Pruning is All You Need
  Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir (03 Feb 2020)

Linear Mode Connectivity and the Lottery Ticket Hypothesis
  Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin (11 Dec 2019)

Rigging the Lottery: Making All Tickets Winners
  Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen (25 Nov 2019)

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
  Tharun Medini, Qixuan Huang, Yiqiu Wang, Vijai Mohan, Anshumali Shrivastava (28 Oct 2019)

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (02 Oct 2019)

RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov (26 Jul 2019)

Sparse Networks from Scratch: Faster Training without Losing Performance
  Tim Dettmers, Luke Zettlemoyer (10 Jul 2019)

AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification
  Rafael M. O. Cruz, Zihan Zhang, R. Sabourin, Suyang Dai, Hiroshi Mamitsuka, Shanfeng Zhu (01 Nov 2018)

Sparse DNNs with Improved Adversarial Robustness
  Yiwen Guo, Chao Zhang, Changshui Zhang, Yurong Chen (23 Oct 2018)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (11 Oct 2018)

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
  Jonathan Frankle, Michael Carbin (09 Mar 2018)

NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm
  Xiaoliang Dai, Hongxu Yin, N. Jha (06 Nov 2017)

Mixed Precision Training
  Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ..., Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu (10 Oct 2017)

Scalable Training of Artificial Neural Networks with Adaptive Sparse Connectivity inspired by Network Science
  Decebal Constantin Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, M. Gibescu, A. Liotta (15 Jul 2017)

Attention Is All You Need
  Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin (12 Jun 2017)

DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification
  Rohit Babbar, Bernhard Schölkopf (08 Sep 2016)

Training Deep Nets with Sublinear Memory Cost
  Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin (21 Apr 2016)

Enhancing Navigation on Wikipedia with Social Tags
  A. Zubiaga (23 Feb 2012)