ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1905.05702
  4. Cited By
Sparse Sequence-to-Sequence Models

Sparse Sequence-to-Sequence Models

14 May 2019
Ben Peters
Vlad Niculae
André F. T. Martins
    TPM
ArXivPDFHTML

Papers citing "Sparse Sequence-to-Sequence Models"

32 / 32 papers shown
Title
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren
Yong Liu
177
0
0
26 Apr 2025
Weighted Graph Structure Learning with Attention Denoising for Node Classification
Weighted Graph Structure Learning with Attention Denoising for Node Classification
Tingting Wang
Jiaxin Su
Haobing Liu
Ruobing Jiang
68
0
0
15 Mar 2025
Learning to Decouple Complex Systems
Learning to Decouple Complex Systems
Zihan Zhou
Tianshu Yu
BDL
74
4
0
17 Feb 2025
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs
Amirmohammad Farzaneh
Osvaldo Simeone
86
0
0
22 Jan 2025
q-exponential family for policy optimization
q-exponential family for policy optimization
Lingwei Zhu
Haseeb Shah
Han Wang
Yukie Nagai
Martha White
OffRL
78
0
0
14 Aug 2024
A Survey of Transformer Enabled Time Series Synthesis
A Survey of Transformer Enabled Time Series Synthesis
Alexander Sommers
Logan Cummins
Sudip Mittal
Shahram Rahimi
Maria Seale
Joseph Jaboure
Thomas Arnold
AI4TS
48
2
0
04 Jun 2024
Dynamic Context Pruning for Efficient and Interpretable Autoregressive
  Transformers
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
42
53
0
25 May 2023
A Study on ReLU and Softmax in Transformer
A Study on ReLU and Softmax in Transformer
Kai Shen
Junliang Guo
Xuejiao Tan
Siliang Tang
Rui Wang
Jiang Bian
27
53
0
13 Feb 2023
A Measure-Theoretic Characterization of Tight Language Models
A Measure-Theoretic Characterization of Tight Language Models
Li Du
Lucas Torroba Hennigen
Tiago Pimentel
Clara Meister
Jason Eisner
Ryan Cotterell
36
30
0
20 Dec 2022
Weakly Supervised Learning Significantly Reduces the Number of Labels
  Required for Intracranial Hemorrhage Detection on Head CT
Weakly Supervised Learning Significantly Reduces the Number of Labels Required for Intracranial Hemorrhage Detection on Head CT
Jacopo Teneggi
P. Yi
Jeremias Sulam
32
3
0
29 Nov 2022
Truncation Sampling as Language Model Desmoothing
Truncation Sampling as Language Model Desmoothing
John Hewitt
Christopher D. Manning
Percy Liang
BDL
44
76
0
27 Oct 2022
SIMPLE: A Gradient Estimator for $k$-Subset Sampling
SIMPLE: A Gradient Estimator for kkk-Subset Sampling
Kareem Ahmed
Zhe Zeng
Mathias Niepert
Mathias Niepert
BDL
48
24
0
04 Oct 2022
Efficient Methods for Natural Language Processing: A Survey
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
33
109
0
31 Aug 2022
Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel
Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel
Ryuichi Kanoh
M. Sugiyama
31
2
0
25 May 2022
Learning to Scaffold: Optimizing Model Explanations for Teaching
Learning to Scaffold: Optimizing Model Explanations for Teaching
Patrick Fernandes
Marcos Vinícius Treviso
Danish Pruthi
André F. T. Martins
Graham Neubig
FAtt
25
22
0
22 Apr 2022
NFormer: Robust Person Re-identification with Neighbor Transformer
NFormer: Robust Person Re-identification with Neighbor Transformer
Haochen Wang
Jiayi Shen
Yongtuo Liu
Yan Gao
E. Gavves
ViT
31
120
0
20 Apr 2022
Knowledge Infused Decoding
Knowledge Infused Decoding
Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah
KELM
25
14
0
06 Apr 2022
Exploring Social Posterior Collapse in Variational Autoencoder for
  Interaction Modeling
Exploring Social Posterior Collapse in Variational Autoencoder for Interaction Modeling
Chen Tang
Wei Zhan
Masayoshi Tomizuka
DRL
31
19
0
01 Dec 2021
Evidential Softmax for Sparse Multimodal Distributions in Deep
  Generative Models
Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models
Phil Chen
Masha Itkina
Ransalu Senanayake
Mykel J. Kochenderfer
36
6
0
27 Oct 2021
Deep Neural Networks and Tabular Data: A Survey
Deep Neural Networks and Tabular Data: A Survey
V. Borisov
Tobias Leemann
Kathrin Seßler
Johannes Haug
Martin Pawelczyk
Gjergji Kasneci
LMTD
45
648
0
05 Oct 2021
Sampling-Based Approximations to Minimum Bayes Risk Decoding for Neural
  Machine Translation
Sampling-Based Approximations to Minimum Bayes Risk Decoding for Neural Machine Translation
Bryan Eikema
Wilker Aziz
28
45
0
10 Aug 2021
MeSIN: Multilevel Selective and Interactive Network for Medication
  Recommendation
MeSIN: Multilevel Selective and Interactive Network for Medication Recommendation
Yang An
Liang Zhang
Mao You
Xueqing Tian
Bo Jin
Xiaopeng Wei
6
18
0
22 Apr 2021
Word Alignment by Fine-tuning Embeddings on Parallel Corpora
Word Alignment by Fine-tuning Embeddings on Parallel Corpora
Zi-Yi Dou
Graham Neubig
96
257
0
20 Jan 2021
WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia
  Article Sections
WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections
Mingda Chen
Sam Wiseman
Kevin Gimpel
27
30
0
29 Dec 2020
SMYRF: Efficient Attention using Asymmetric Clustering
SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras
Nikita Kitaev
Augustus Odena
A. Dimakis
31
44
0
11 Oct 2020
Sparse Graph to Sequence Learning for Vision Conditioned Long Textual
  Sequence Generation
Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation
Aditya Mogadala
Marius Mosbach
Dietrich Klakow
VLM
133
0
0
12 Jul 2020
Linking Social Media Posts to News with Siamese Transformers
Linking Social Media Posts to News with Siamese Transformers
Jacob Danovitch
24
2
0
10 Jan 2020
Explicit Sparse Transformer: Concentrated Attention Through Explicit
  Selection
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
22
108
0
25 Dec 2019
Dialogue Transformers
Dialogue Transformers
Vladimir Vlasov
Johannes E. M. Mosig
Alan Nichol
27
56
0
01 Oct 2019
Classical Structured Prediction Losses for Sequence to Sequence Learning
Classical Structured Prediction Losses for Sequence to Sequence Learning
Sergey Edunov
Myle Ott
Michael Auli
David Grangier
MarcÁurelio Ranzato
AIMat
56
185
0
14 Nov 2017
OpenNMT: Open-Source Toolkit for Neural Machine Translation
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein
Yoon Kim
Yuntian Deng
Jean Senellart
Alexander M. Rush
273
1,896
0
10 Jan 2017
Effective Approaches to Attention-based Neural Machine Translation
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
218
7,926
0
17 Aug 2015
1