ResearchTrend.AI

arXiv:2108.01988
Sparse Continuous Distributions and Fenchel-Young Losses

4 August 2021
André F. T. Martins
Marcos Vinícius Treviso
António Farinhas
P. Aguiar
Mário A. T. Figueiredo
Mathieu Blondel
Vlad Niculae
Abstract

Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, α-entmax, and fusedmax) has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define Ω-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When Ω is a Tsallis negentropy with parameter α, we obtain "deformed exponential families," which include α-entmax and sparsemax (α = 2) as particular cases. For quadratic energy functions, the resulting densities are β-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When Ω is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for α ∈ {1, 4/3, 3/2, 2}. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.
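For intuition on the finite-domain case the abstract builds on: sparsemax (the α = 2 case of α-entmax) is the Euclidean projection of a score vector onto the probability simplex, which can zero out low-scoring entries, unlike softmax. A minimal NumPy sketch, using the standard sorting-based projection (not code from this paper):

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex (sparsemax).

    Unlike softmax, the result can contain exact zeros, giving a
    sparse probability distribution over a finite domain.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in decreasing order
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted)
    # support = indices k where 1 + k * z_(k) > sum of top-k scores
    support = 1 + k * z_sorted > cssv
    k_star = k[support][-1]              # size of the support set
    tau = (cssv[support][-1] - 1.0) / k_star  # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax([2.0, 1.0, -1.0]))  # → [1. 0. 0.]: all mass on one entry
```

The continuous analogue developed in the paper replaces this simplex projection with an Ω-regularized prediction map over densities, so the resulting distribution can have compact (truncated) support rather than a sparse probability vector.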
