1905.05702
Cited By
Sparse Sequence-to-Sequence Models
14 May 2019
Ben Peters
Vlad Niculae
André F. T. Martins
TPM
Papers citing
"Sparse Sequence-to-Sequence Models"
32 / 32 papers shown
Title
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren
Yong Liu
177
0
0
26 Apr 2025
Weighted Graph Structure Learning with Attention Denoising for Node Classification
Tingting Wang
Jiaxin Su
Haobing Liu
Ruobing Jiang
68
0
0
15 Mar 2025
Learning to Decouple Complex Systems
Zihan Zhou
Tianshu Yu
BDL
74
4
0
17 Feb 2025
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs
Amirmohammad Farzaneh
Osvaldo Simeone
86
0
0
22 Jan 2025
q-exponential family for policy optimization
Lingwei Zhu
Haseeb Shah
Han Wang
Yukie Nagai
Martha White
OffRL
78
0
0
14 Aug 2024
A Survey of Transformer Enabled Time Series Synthesis
Alexander Sommers
Logan Cummins
Sudip Mittal
Shahram Rahimi
Maria Seale
Joseph Jaboure
Thomas Arnold
AI4TS
48
2
0
04 Jun 2024
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
42
53
0
25 May 2023
A Study on ReLU and Softmax in Transformer
Kai Shen
Junliang Guo
Xuejiao Tan
Siliang Tang
Rui Wang
Jiang Bian
27
53
0
13 Feb 2023
A Measure-Theoretic Characterization of Tight Language Models
Li Du
Lucas Torroba Hennigen
Tiago Pimentel
Clara Meister
Jason Eisner
Ryan Cotterell
36
30
0
20 Dec 2022
Weakly Supervised Learning Significantly Reduces the Number of Labels Required for Intracranial Hemorrhage Detection on Head CT
Jacopo Teneggi
P. Yi
Jeremias Sulam
32
3
0
29 Nov 2022
Truncation Sampling as Language Model Desmoothing
John Hewitt
Christopher D. Manning
Percy Liang
BDL
44
76
0
27 Oct 2022
SIMPLE: A Gradient Estimator for k-Subset Sampling
Kareem Ahmed
Zhe Zeng
Mathias Niepert
BDL
48
24
0
04 Oct 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
33
109
0
31 Aug 2022
Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel
Ryuichi Kanoh
M. Sugiyama
31
2
0
25 May 2022
Learning to Scaffold: Optimizing Model Explanations for Teaching
Patrick Fernandes
Marcos Vinícius Treviso
Danish Pruthi
André F. T. Martins
Graham Neubig
FAtt
25
22
0
22 Apr 2022
NFormer: Robust Person Re-identification with Neighbor Transformer
Haochen Wang
Jiayi Shen
Yongtuo Liu
Yan Gao
E. Gavves
ViT
31
120
0
20 Apr 2022
Knowledge Infused Decoding
Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah
KELM
25
14
0
06 Apr 2022
Exploring Social Posterior Collapse in Variational Autoencoder for Interaction Modeling
Chen Tang
Wei Zhan
Masayoshi Tomizuka
DRL
31
19
0
01 Dec 2021
Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models
Phil Chen
Masha Itkina
Ransalu Senanayake
Mykel J. Kochenderfer
36
6
0
27 Oct 2021
Deep Neural Networks and Tabular Data: A Survey
V. Borisov
Tobias Leemann
Kathrin Seßler
Johannes Haug
Martin Pawelczyk
Gjergji Kasneci
LMTD
45
648
0
05 Oct 2021
Sampling-Based Approximations to Minimum Bayes Risk Decoding for Neural Machine Translation
Bryan Eikema
Wilker Aziz
28
45
0
10 Aug 2021
MeSIN: Multilevel Selective and Interactive Network for Medication Recommendation
Yang An
Liang Zhang
Mao You
Xueqing Tian
Bo Jin
Xiaopeng Wei
6
18
0
22 Apr 2021
Word Alignment by Fine-tuning Embeddings on Parallel Corpora
Zi-Yi Dou
Graham Neubig
96
257
0
20 Jan 2021
WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections
Mingda Chen
Sam Wiseman
Kevin Gimpel
27
30
0
29 Dec 2020
SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras
Nikita Kitaev
Augustus Odena
A. Dimakis
31
44
0
11 Oct 2020
Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation
Aditya Mogadala
Marius Mosbach
Dietrich Klakow
VLM
133
0
0
12 Jul 2020
Linking Social Media Posts to News with Siamese Transformers
Jacob Danovitch
24
2
0
10 Jan 2020
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
22
108
0
25 Dec 2019
Dialogue Transformers
Vladimir Vlasov
Johannes E. M. Mosig
Alan Nichol
27
56
0
01 Oct 2019
Classical Structured Prediction Losses for Sequence to Sequence Learning
Sergey Edunov
Myle Ott
Michael Auli
David Grangier
Marc'Aurelio Ranzato
AIMat
56
185
0
14 Nov 2017
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein
Yoon Kim
Yuntian Deng
Jean Senellart
Alexander M. Rush
273
1,896
0
10 Jan 2017
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
218
7,926
0
17 Aug 2015