Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
Brian DuSell, David Chiang
3 October 2023 (arXiv:2310.01749)
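For context on the mechanism in the title: stack attention is an attention variant built on differentiable stacks, with one form related to deterministic pushdown automata and one to nondeterministic pushdown automata. The deterministic form builds on the superposition stack of Joulin and Mikolov (2015), also listed below, in which push, pop, and no-op are applied in probability-weighted superposition rather than chosen discretely, keeping the update differentiable. Below is a minimal NumPy sketch of that stack update, not code from the paper; the function name, fixed stack depth, and action ordering are illustrative assumptions.

```python
import numpy as np

def superposition_stack_step(stack, actions, push_vec):
    """One differentiable superposition-stack update (Joulin & Mikolov, 2015).

    stack:    (depth, dim) array, row 0 is the top of the stack.
    actions:  (3,) probabilities for (push, pop, no-op), summing to 1.
    push_vec: (dim,) vector to push onto the stack.
    """
    p_push, p_pop, p_noop = actions
    depth, dim = stack.shape
    # Stack if we push: push_vec on top, everything else shifted down one slot.
    pushed = np.vstack([push_vec, stack[:-1]])
    # Stack if we pop: everything shifted up one slot, zeros at the bottom.
    popped = np.vstack([stack[1:], np.zeros((1, dim))])
    # Superpose all three outcomes instead of choosing one,
    # so gradients flow through every action.
    return p_push * pushed + p_pop * popped + p_noop * stack

# Usage sketch: mostly-push actions gradually fill the stack.
rng = np.random.default_rng(0)
stack = np.zeros((8, 4))  # depth 8, element size 4
for _ in range(3):
    stack = superposition_stack_step(stack, np.array([0.7, 0.2, 0.1]),
                                     rng.normal(size=4))
print(stack[0])  # soft reading of the stack top
```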

Papers citing "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"

22 papers

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
10 Apr 2025

TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao
29 Mar 2025

Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell
11 Nov 2024

Finding path and cycle counting formulae in graphs with Deep Reinforcement Learning
Jason Piquenot, Maxime Bérar, Pierre Héroux, Jean-Yves Ramel, R. Raveaux, Sébastien Adam
02 Oct 2024

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea
01 Feb 2024

The Surprising Computational Power of Nondeterministic Stack RNNs
Brian DuSell, David Chiang
04 Oct 2022

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale
Laurent Sartran, Samuel Barrett, A. Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer
01 Mar 2022

Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
Brian DuSell, David Chiang
05 Sep 2021

Structural Guidance for Transformer Language Models
Peng Qian, Tahira Naseem, R. Levy, Ramón Fernández Astudillo
30 Jul 2021

Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen, Julian Salazar
14 Oct 2019

Tree Transformer: Integrating Tree Structures into Self-Attention
Yau-Shian Wang, Hung-yi Lee, Yun-Nung Chen
14 Sep 2019

SG-Net: Syntax-Guided Machine Reading Comprehension
Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao, Rui Wang
14 Aug 2019

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn
16 Jun 2019

Learning Deep Transformer Models for Machine Translation
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao
05 Jun 2019

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo, John Richardson
19 Aug 2018

Structured Attention Networks
Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
03 Feb 2017

Layer Normalization
Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton
21 Jul 2016

Recurrent Neural Network Grammars
Chris Dyer, A. Kuncoro, Miguel Ballesteros, Noah A. Smith
25 Feb 2016

Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow, Alexandra Birch
31 Aug 2015

Describing Multimedia Content using Attention-based Encoder-Decoder Networks
Kyunghyun Cho, Aaron Courville, Yoshua Bengio
04 Jul 2015

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Armand Joulin, Tomas Mikolov
03 Mar 2015

Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
01 Sep 2014