ResearchTrend.AI
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

5 February 2016
André F. T. Martins
Ramón Fernández Astudillo

Papers citing "From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification"

50 / 128 papers shown
Establishing Linear Surrogate Regret Bounds for Convex Smooth Losses via Convolutional Fenchel-Young Losses
Yuzhou Cao
Han Bao
Lei Feng
Bo An
31
0
0
14 May 2025
Smooth Quadratic Prediction Markets
Enrique Nueve
Bo Waggoner
30
0
0
05 May 2025
Aligning Instance-Semantic Sparse Representation towards Unsupervised Object Segmentation and Shape Abstraction with Repeatable Primitives
Jiaxin Li
Hongxing Wang
Jiawei Tan
Zhilong Ou
Junsong Yuan
3DPC
47
0
0
10 Mar 2025
Transfer Learning with Pre-trained Conditional Generative Models
Shin'ya Yamaguchi
Sekitoshi Kanai
Atsutoshi Kumagai
Daiki Chijiwa
H. Kashima
VLM
CLL
BDL
DiffM
150
5
0
21 Feb 2025
Learning to Decouple Complex Systems
Zihan Zhou
Tianshu Yu
BDL
79
4
0
17 Feb 2025
Aggregate to Adapt: Node-Centric Aggregation for Multi-Source-Free Graph Domain Adaptation
Zhen Zhang
Bingsheng He
111
2
0
05 Feb 2025
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods
Oussama Zekri
Nicolas Boullé
DiffM
73
3
0
03 Feb 2025
Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation
Duc Hau Nguyen
Cyrielle Mallart
Guillaume Gravier
Pascale Sébillot
68
0
0
22 Jan 2025
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs
Amirmohammad Farzaneh
Osvaldo Simeone
94
0
0
22 Jan 2025
Privacy Vulnerabilities in Marginals-based Synthetic Data
Steven Golob
Sikha Pentyala
Anuar Maratkhan
Martine De Cock
26
3
0
07 Oct 2024
Can Transformers Learn $n$-gram Language Models?
Anej Svete
Nadav Borenstein
M. Zhou
Isabelle Augenstein
Ryan Cotterell
47
7
0
03 Oct 2024
Attention layers provably solve single-location regression
Pierre Marion
Raphael Berthier
Gérard Biau
Claire Boyer
227
3
0
02 Oct 2024
q-exponential family for policy optimization
Lingwei Zhu
Haseeb Shah
Han Wang
Yukie Nagai
Martha White
OffRL
78
0
0
14 Aug 2024
Large-scale Time-Varying Portfolio Optimisation using Graph Attention Networks
Kamesh Korangi
Christophe Mues
Cristián Bravo
46
1
0
22 Jul 2024
Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions
Jiaqi Luo
Yuan Yuan
Shixin Xu
AI4CE
39
2
0
19 Jul 2024
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak
Anej Svete
Alexandra Butoi
Ryan Cotterell
ReLM
LRM
54
13
0
20 Jun 2024
Causal Discovery Inspired Unsupervised Domain Adaptation for Emotion-Cause Pair Extraction
Yuncheng Hua
Yujin Huang
Shuo Huang
Tao Feng
Lizhen Qu
Chris Bain
R. Bassed
Gholamreza Haffari
CML
OOD
56
2
0
18 Jun 2024
UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages
Trinh Pham
Khoi M. Le
Luu Anh Tuan
47
1
0
14 Jun 2024
MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou
Mario Fritz
Margret Keuper
45
1
0
03 Jun 2024
Building a stable classifier with the inflated argmax
Jake A. Soloff
Rina Foygel Barber
Rebecca Willett
177
2
0
22 May 2024
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani
Bac Nguyen
Samuel Lavoie
Ranjay Krishna
Aaron Courville
39
1
0
24 Apr 2024
Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
Dennis Wu
Jerry Yao-Chieh Hu
Teng-Yun Hsiao
Han Liu
45
28
0
04 Apr 2024
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi
Alfredo Garcia
P. Momcilovic
40
2
0
26 Jan 2024
An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification
Hyenkyun Woo
22
0
0
26 Dec 2023
Recurrent Neural Language Models as Probabilistic Finite-state Automata
Anej Svete
Ryan Cotterell
42
2
0
08 Oct 2023
Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities
Jayanta Mandi
James Kotary
Senne Berden
Maxime Mulamba
Víctor Bucarey
Tias Guns
Ferdinando Fioretto
AI4CE
33
58
0
25 Jul 2023
Generative Meta-Learning Robust Quality-Diversity Portfolio
K. Yuksel
23
2
0
15 Jul 2023
High-Similarity-Pass Attention for Single Image Super-Resolution
Jianmei Su
Min Gan
Guang-Yong Chen
Wenzhong Guo
C. L. Philip Chen
29
16
0
25 May 2023
Interpretable Multimodal Misinformation Detection with Logic Reasoning
Hui Liu
Wenya Wang
Haoliang Li
46
22
0
10 May 2023
r-softmax: Generalized Softmax with Controllable Sparsity Rate
Klaudia Bałazy
Lukasz Struski
Marek Śmieja
Jacek Tabor
25
2
0
11 Apr 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Tao Lei
Junwen Bai
Siddhartha Brahma
Joshua Ainslie
Kenton Lee
...
Vincent Zhao
Yuexin Wu
Bo Li
Yu Zhang
Ming-Wei Chang
BDL
AI4CE
32
55
0
11 Apr 2023
Filling out the missing gaps: Time Series Imputation with Semi-Supervised Learning
Karan Aggarwal
Jaideep Srivastava
AI4TS
35
0
0
09 Apr 2023
Learning Sparsity of Representations with Discrete Latent Variables
Zhao Xu
Daniel Oñoro-Rubio
G. Serra
Mathias Niepert
13
0
0
03 Apr 2023
GTRL: An Entity Group-Aware Temporal Knowledge Graph Representation Learning Method
Xing Tang
Ling-Hao Chen
AI4TS
22
5
0
22 Feb 2023
A Study on ReLU and Softmax in Transformer
Kai Shen
Junliang Guo
Xuejiao Tan
Siliang Tang
Rui Wang
Jiang Bian
29
53
0
13 Feb 2023
HanoiT: Enhancing Context-aware Translation via Selective Context
Jian Yang
Yuwei Yin
Shuming Ma
Liqun Yang
Hongcheng Guo
Haoyang Huang
Dongdong Zhang
Yutao Zeng
Zhoujun Li
Furu Wei
34
5
0
17 Jan 2023
A Measure-Theoretic Characterization of Tight Language Models
Li Du
Lucas Torroba Hennigen
Tiago Pimentel
Clara Meister
Jason Eisner
Ryan Cotterell
36
30
0
20 Dec 2022
T2G-Former: Organizing Tabular Features into Relation Graphs Promotes Heterogeneous Feature Interaction
Jiahuan Yan
Jintai Chen
YiXuan Wu
Danny Chen
Jian Wu
37
36
0
30 Nov 2022
Weakly Supervised Learning Significantly Reduces the Number of Labels Required for Intracranial Hemorrhage Detection on Head CT
Jacopo Teneggi
Paul H. Yi
Jeremias Sulam
32
3
0
29 Nov 2022
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention
Wenyuan Zeng
Meng Li
Wenjie Xiong
Tong Tong
Wen-jie Lu
Jin Tan
Runsheng Wang
Ru Huang
29
21
0
25 Nov 2022
SEAT: Stable and Explainable Attention
Lijie Hu
Yixin Liu
Ninghao Liu
Mengdi Huai
Lichao Sun
Di Wang
OOD
32
18
0
23 Nov 2022
On the Informativeness of Supervision Signals
Ilia Sucholutsky
Ruairidh M. Battleday
Katherine M. Collins
Raja Marjieh
Joshua C. Peterson
Pulkit Singh
Umang Bhatt
Nori Jacoby
Adrian Weller
Thomas Griffiths
27
12
0
02 Nov 2022
Revisiting Attention Weights as Explanations from an Information Theoretic Perspective
Bingyang Wen
K. P. Subbalakshmi
Fan Yang
FAtt
27
6
0
31 Oct 2022
Truncation Sampling as Language Model Desmoothing
John Hewitt
Christopher D. Manning
Percy Liang
BDL
46
76
0
27 Oct 2022
SIMPLE: A Gradient Estimator for $k$-Subset Sampling
Kareem Ahmed
Zhe Zeng
Mathias Niepert
Guy Van den Broeck
BDL
53
25
0
04 Oct 2022
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task
Ricardo Rei
Marcos Vinícius Treviso
Nuno M. Guerreiro
Chrysoula Zerva
Ana C. Farinha
...
T. Glushkova
Duarte M. Alves
A. Lavie
Luísa Coheur
André F. T. Martins
63
144
0
13 Sep 2022
Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax
Hao-Ren Yao
Nairen Cao
Katina Russell
D. Chang
O. Frieder
Jeremy T. Fineman
SSL
25
1
0
01 Sep 2022
Multiple Instance Neural Networks Based on Sparse Attention for Cancer Detection using T-cell Receptor Sequences
Younghoon Kim
Tao Wang
Danyi Xiong
Xinlei Wang
S. Park
29
9
0
09 Aug 2022
Contrasting quadratic assignments for set-based representation learning
A. Moskalev
Ivan Sosnovik
Volker Fischer
A. Smeulders
SSL
34
9
0
31 May 2022
Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel
Ryuichi Kanoh
M. Sugiyama
36
2
0
25 May 2022