Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
Brian DuSell, David Chiang
arXiv 2310.01749 (v2, latest), 3 October 2023
Papers citing "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" (22 of 22 papers shown):
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
10 Apr 2025

TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao
29 Mar 2025

Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell
11 Nov 2024

Finding path and cycle counting formulae in graphs with Deep Reinforcement Learning
Jason Piquenot, Maxime Bérar, Pierre Héroux, Jean-Yves Ramel, R. Raveaux, Sébastien Adam
02 Oct 2024

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea
01 Feb 2024

The Surprising Computational Power of Nondeterministic Stack RNNs
Brian DuSell, David Chiang
04 Oct 2022

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale
Laurent Sartran, Samuel Barrett, A. Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer
01 Mar 2022

Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
Brian DuSell, David Chiang
05 Sep 2021

Structural Guidance for Transformer Language Models
Peng Qian, Tahira Naseem, R. Levy, Ramón Fernández Astudillo
30 Jul 2021

Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen, Julian Salazar
14 Oct 2019

Tree Transformer: Integrating Tree Structures into Self-Attention
Yau-Shian Wang, Hung-yi Lee, Yun-Nung Chen
14 Sep 2019

SG-Net: Syntax-Guided Machine Reading Comprehension
Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao, Rui Wang
14 Aug 2019

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn
16 Jun 2019

Learning Deep Transformer Models for Machine Translation
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao
05 Jun 2019

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo, John Richardson
19 Aug 2018

Structured Attention Networks
Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
03 Feb 2017

Layer Normalization
Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton
21 Jul 2016

Recurrent Neural Network Grammars
Chris Dyer, A. Kuncoro, Miguel Ballesteros, Noah A. Smith
25 Feb 2016

Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow, Alexandra Birch
31 Aug 2015

Describing Multimedia Content using Attention-based Encoder-Decoder Networks
Kyunghyun Cho, Aaron Courville, Yoshua Bengio
04 Jul 2015

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Armand Joulin, Tomas Mikolov
03 Mar 2015

Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
01 Sep 2014