From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

5 February 2016

André F. T. Martins

Papers citing "From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification"

Showing 50 of 128 citing papers.

Establishing Linear Surrogate Regret Bounds for Convex Smooth Losses via Convolutional Fenchel-Young Losses Yuzhou Cao Han Bao Lei Feng Bo An 31 0 0 14 May 2025
Smooth Quadratic Prediction Markets Enrique Nueve Bo Waggoner 30 0 0 05 May 2025
Aligning Instance-Semantic Sparse Representation towards Unsupervised Object Segmentation and Shape Abstraction with Repeatable Primitives Jiaxin Li Hongxing Wang Jiawei Tan Zhilong Ou Junsong Yuan 3DPC 47 0 0 10 Mar 2025
Transfer Learning with Pre-trained Conditional Generative Models Shin'ya Yamaguchi Sekitoshi Kanai Atsutoshi Kumagai Daiki Chijiwa H. Kashima VLM CLL BDL DiffM 150 5 0 21 Feb 2025
Learning to Decouple Complex Systems Zihan Zhou Tianshu Yu BDL 79 4 0 17 Feb 2025
Aggregate to Adapt: Node-Centric Aggregation for Multi-Source-Free Graph Domain Adaptation Zhen Zhang Bingsheng He 111 2 0 05 Feb 2025
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods Oussama Zekri Nicolas Boullé DiffM 73 3 0 03 Feb 2025
Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation Duc Hau Nguyen Cyrielle Mallart Guillaume Gravier Pascale Sébillot 68 0 0 22 Jan 2025
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs Amirmohammad Farzaneh Osvaldo Simeone 94 0 0 22 Jan 2025
Privacy Vulnerabilities in Marginals-based Synthetic Data Steven Golob Sikha Pentyala Anuar Maratkhan Martine De Cock 26 3 0 07 Oct 2024
Can Transformers Learn $n$-gram Language Models? Anej Svete Nadav Borenstein M. Zhou Isabelle Augenstein Ryan Cotterell 47 7 0 03 Oct 2024
Attention layers provably solve single-location regression Pierre Marion Raphael Berthier Gérard Biau Claire Boyer 227 3 0 02 Oct 2024
q-exponential family for policy optimization Lingwei Zhu Haseeb Shah Han Wang Yukie Nagai Martha White OffRL 78 0 0 14 Aug 2024
Large-scale Time-Varying Portfolio Optimisation using Graph Attention Networks Kamesh Korangi Christophe Mues Cristián Bravo 46 1 0 22 Jul 2024
Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions Jiaqi Luo Yuan Yuan Shixin Xu AI4CE 39 2 0 19 Jul 2024
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning Franz Nowak Anej Svete Alexandra Butoi Ryan Cotterell ReLM LRM 54 13 0 20 Jun 2024
Causal Discovery Inspired Unsupervised Domain Adaptation for Emotion-Cause Pair Extraction Yuncheng Hua Yujin Huang Shuo Huang Tao Feng Lizhen Qu Chris Bain R. Bassed Gholamreza Haffari CML OOD 56 2 0 18 Jun 2024
UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages Trinh Pham Khoi M. Le Luu Anh Tuan 47 1 0 14 Jun 2024
MultiMax: Sparse and Multi-Modal Attention Learning Yuxuan Zhou Mario Fritz Margret Keuper 45 1 0 03 Jun 2024
Building a stable classifier with the inflated argmax Jake A. Soloff Rina Foygel Barber Rebecca Willett 177 2 0 22 May 2024
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision Ankit Vani Bac Nguyen Samuel Lavoie Ranjay Krishna Aaron Courville 39 1 0 24 Apr 2024
Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models Dennis Wu Jerry Yao-Chieh Hu Teng-Yun Hsiao Han Liu 45 28 0 04 Apr 2024
Regularized Q-Learning with Linear Function Approximation Jiachen Xi Alfredo Garcia P. Momcilovic 40 2 0 26 Jan 2024
An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification Hyenkyun Woo 22 0 0 26 Dec 2023
Recurrent Neural Language Models as Probabilistic Finite-state Automata Anej Svete Ryan Cotterell 42 2 0 08 Oct 2023
Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities Jayanta Mandi James Kotary Senne Berden Maxime Mulamba Víctor Bucarey Tias Guns Ferdinando Fioretto AI4CE 33 58 0 25 Jul 2023
Generative Meta-Learning Robust Quality-Diversity Portfolio K. Yuksel 23 2 0 15 Jul 2023
High-Similarity-Pass Attention for Single Image Super-Resolution Jianmei Su Min Gan Guang-Yong Chen Wenzhong Guo C. L. Philip Chen 29 16 0 25 May 2023
Interpretable Multimodal Misinformation Detection with Logic Reasoning Hui Liu Wenya Wang Haoliang Li 46 22 0 10 May 2023
r-softmax: Generalized Softmax with Controllable Sparsity Rate Klaudia Bałazy Lukasz Struski Marek Śmieja Jacek Tabor 25 2 0 11 Apr 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference Tao Lei Junwen Bai Siddhartha Brahma Joshua Ainslie Kenton Lee ... Vincent Zhao Yuexin Wu Bo Li Yu Zhang Ming-Wei Chang BDL AI4CE 32 55 0 11 Apr 2023
Filling out the missing gaps: Time Series Imputation with Semi-Supervised Learning Karan Aggarwal Jaideep Srivastava AI4TS 35 0 0 09 Apr 2023
Learning Sparsity of Representations with Discrete Latent Variables Zhao Xu Daniel Oñoro-Rubio G. Serra Mathias Niepert 13 0 0 03 Apr 2023
GTRL: An Entity Group-Aware Temporal Knowledge Graph Representation Learning Method Xing Tang Ling-Hao Chen AI4TS 22 5 0 22 Feb 2023
A Study on ReLU and Softmax in Transformer Kai Shen Junliang Guo Xuejiao Tan Siliang Tang Rui Wang Jiang Bian 29 53 0 13 Feb 2023
HanoiT: Enhancing Context-aware Translation via Selective Context Jian Yang Yuwei Yin Shuming Ma Liqun Yang Hongcheng Guo Haoyang Huang Dongdong Zhang Yutao Zeng Zhoujun Li Furu Wei 34 5 0 17 Jan 2023
A Measure-Theoretic Characterization of Tight Language Models Li Du Lucas Torroba Hennigen Tiago Pimentel Clara Meister Jason Eisner Ryan Cotterell 36 30 0 20 Dec 2022
T2G-Former: Organizing Tabular Features into Relation Graphs Promotes Heterogeneous Feature Interaction Jiahuan Yan Jintai Chen YiXuan Wu Danny Chen Jian Wu 37 36 0 30 Nov 2022
Weakly Supervised Learning Significantly Reduces the Number of Labels Required for Intracranial Hemorrhage Detection on Head CT Jacopo Teneggi Paul H. Yi Jeremias Sulam 32 3 0 29 Nov 2022
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention Wenyuan Zeng Meng Li Wenjie Xiong Tong Tong Wen-jie Lu Jin Tan Runsheng Wang Ru Huang 29 21 0 25 Nov 2022
SEAT: Stable and Explainable Attention Lijie Hu Yixin Liu Ninghao Liu Mengdi Huai Lichao Sun Di Wang OOD 32 18 0 23 Nov 2022
On the Informativeness of Supervision Signals Ilia Sucholutsky Ruairidh M. Battleday Katherine M. Collins Raja Marjieh Joshua C. Peterson Pulkit Singh Umang Bhatt Nori Jacoby Adrian Weller Thomas Griffiths 27 12 0 02 Nov 2022
Revisiting Attention Weights as Explanations from an Information Theoretic Perspective Bingyang Wen K. P. Subbalakshmi Fan Yang FAtt 27 6 0 31 Oct 2022
Truncation Sampling as Language Model Desmoothing John Hewitt Christopher D. Manning Percy Liang BDL 46 76 0 27 Oct 2022
SIMPLE: A Gradient Estimator for $k$-Subset Sampling Kareem Ahmed Zhe Zeng Mathias Niepert Guy Van den Broeck BDL 53 25 0 04 Oct 2022
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task Ricardo Rei Marcos Vinícius Treviso Nuno M. Guerreiro Chrysoula Zerva Ana C. Farinha ... T. Glushkova Duarte M. Alves A. Lavie Luísa Coheur André F. T. Martins 63 144 0 13 Sep 2022
Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax Hao-Ren Yao Nairen Cao Katina Russell D. Chang O. Frieder Jeremy T. Fineman SSL 25 1 0 01 Sep 2022
Multiple Instance Neural Networks Based on Sparse Attention for Cancer Detection using T-cell Receptor Sequences Younghoon Kim Tao Wang Danyi Xiong Xinlei Wang S. Park 29 9 0 09 Aug 2022
Contrasting quadratic assignments for set-based representation learning A. Moskalev Ivan Sosnovik Volker Fischer A. Smeulders SSL 34 9 0 31 May 2022
Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel Ryuichi Kanoh M. Sugiyama 36 2 0 25 May 2022