Gradient Estimation with Stochastic Softmax Tricks

15 June 2020

Papers citing "Gradient Estimation with Stochastic Softmax Tricks"

Large (Vision) Language Models are Unsupervised In-Context Learners Artyom Gadetsky Andrei Atanov Yulun Jiang Zhitong Gao Ghazal Hosseini Mighan Amir Zamir Maria Brbić VLM MLLM LRM 202 0 0 03 Apr 2025
Soft Condorcet Optimization for Ranking of General Agents Marc Lanctot Kate Larson Michael Kaisers Quentin Berthet I. Gemp Manfred Diaz Roberto-Rafael Maura-Rivero Yoram Bachrach Anna Koop Doina Precup 177 0 0 31 Oct 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model Longrong Yang Dong Shen Chaoxiang Cai Fan Yang Size Li Tingting Gao Xi Li MoE 97 2 0 28 Jun 2024
Fast Differentiable Sorting and Ranking Mathieu Blondel O. Teboul Quentin Berthet Josip Djolonga 147 231 0 20 Feb 2020
Learning with Differentiable Perturbed Optimizers Quentin Berthet Mathieu Blondel O. Teboul Marco Cuturi Jean-Philippe Vert Francis R. Bach 56 109 0 20 Feb 2020
Decision-Making with Auto-Encoding Variational Bayes Romain Lopez Pierre Boyeau Nir Yosef Michael I. Jordan Jeffrey Regier BDL 317 10,591 0 17 Feb 2020
Estimating Gradients for Discrete Random Variables by Sampling without Replacement W. Kool H. V. Hoof Max Welling BDL 105 50 0 14 Feb 2020
Torch-Struct: Deep Structured Prediction Library Alexander M. Rush 52 63 0 03 Feb 2020
Differentiable Convex Optimization Layers Akshay Agrawal Brandon Amos Shane T. Barratt Stephen P. Boyd Steven Diamond Zico Kolter 81 653 0 28 Oct 2019
Structured Prediction with Projection Oracles Mathieu Blondel 68 33 0 24 Oct 2019
Monte Carlo Gradient Estimation in Machine Learning S. Mohamed Mihaela Rosca Michael Figurnov A. Mnih 67 408 0 25 Jun 2019
The Limited Multi-Label Projection Layer Brandon Amos V. Koltun J. Zico Kolter 50 36 0 20 Jun 2019
Stochastic Optimization of Sorting Networks via Continuous Relaxations Aditya Grover Eric Wang Aaron Zweig Stefano Ermon 53 173 0 21 Mar 2019
Reparameterizable Subset Sampling via Continuous Relaxations Sang Michael Xie Stefano Ermon BDL 48 99 0 29 Jan 2019
Learning with Fenchel-Young Losses Mathieu Blondel André F. T. Martins Vlad Niculae 123 133 0 08 Jan 2019
Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder Caio Corro Ivan Titov BDL 40 56 0 25 Jul 2018
Reparameterization Gradient for Non-differentiable Models Wonyeol Lee Hangyeol Yu Hongseok Yang DRL 59 31 0 01 Jun 2018
ListOps: A Diagnostic Dataset for Latent Tree Learning Nikita Nangia Samuel R. Bowman 45 137 0 17 Apr 2018
Learning Latent Permutations with Gumbel-Sinkhorn Networks Gonzalo E. Mena David Belanger Scott W. Linderman Jasper Snoek 72 270 0 23 Feb 2018
Learning to Explain: An Information-Theoretic Perspective on Model Interpretation Jianbo Chen Le Song Martin J. Wainwright Michael I. Jordan MLT FAtt 129 572 0 21 Feb 2018
SparseMAP: Differentiable Sparse Structured Inference Vlad Niculae André F. T. Martins Mathieu Blondel Claire Cardie 43 122 0 12 Feb 2018
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation Will Grathwohl Dami Choi Yuhuai Wu Geoffrey Roeder David Duvenaud 89 300 0 31 Oct 2017
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models George Tucker A. Mnih Chris J. Maddison John Lawson Jascha Narain Sohl-Dickstein BDL 191 282 0 21 Mar 2017
OptNet: Differentiable Optimization as a Layer in Neural Networks Brandon Amos J. Zico Kolter 150 958 0 01 Mar 2017
Categorical Reparameterization with Gumbel-Softmax Eric Jang S. Gu Ben Poole BDL 279 5,360 0 03 Nov 2016
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables Chris J. Maddison A. Mnih Yee Whye Teh BDL 155 2,529 0 02 Nov 2016
The Generalized Reparameterization Gradient Francisco J. R. Ruiz Michalis K. Titsias David M. Blei BDL 58 169 0 07 Oct 2016
Rationalizing Neural Predictions Tao Lei Regina Barzilay Tommi Jaakkola 108 811 0 13 Jun 2016
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems Martín Abadi Ashish Agarwal P. Barham E. Brevdo Zhiwen Chen ... Pete Warden Martin Wattenberg Martin Wicke Yuan Yu Xiaoqiang Zheng 240 11,145 0 14 Mar 2016
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification André F. T. Martins Ramón Fernández Astudillo 156 719 0 05 Feb 2016
MuProp: Unbiased Backpropagation for Stochastic Neural Networks S. Gu Sergey Levine Ilya Sutskever A. Mnih BDL 46 143 0 16 Nov 2015
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 1.4K 149,842 0 22 Dec 2014
A* Sampling Chris J. Maddison Daniel Tarlow T. Minka 75 393 0 31 Oct 2014
Neural Turing Machines Alex Graves Greg Wayne Ivo Danihelka 95 2,325 0 20 Oct 2014
Neural Variational Inference and Learning in Belief Networks A. Mnih Karol Gregor BDL 151 729 0 31 Jan 2014
On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations Tamir Hazan Subhransu Maji Tommi Jaakkola 50 56 0 29 Sep 2013
Tighter Linear Program Relaxations for High Order Graphical Models Elad Mezuman Daniel Tarlow Amir Globerson Yair Weiss 65 14 0 26 Sep 2013
Learning Graphical Model Parameters with Approximate Marginal Inference Justin Domke TPM 73 187 0 15 Jan 2013
Fast Exact Inference for Recursive Cardinality Models Daniel Tarlow Kevin Swersky R. Zemel Ryan P. Adams B. Frey TPM 59 59 0 16 Oct 2012
Learning Attitudes and Attributes from Multi-Aspect Reviews Julian McAuley J. Leskovec Dan Jurafsky 242 298 0 15 Oct 2012
On the Partition Function and Random Maximum A-Posteriori Perturbations Tamir Hazan Tommi Jaakkola 76 93 0 27 Jun 2012
Sum-Product Networks: A New Deep Architecture Hoifung Poon Pedro M. Domingos TPM 74 758 0 14 Feb 2012
Ranking via Sinkhorn Propagation Ryan P. Adams R. Zemel 88 147 0 09 Jun 2011