Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
Brian DuSell, David Chiang
3 October 2023 (arXiv:2310.01749)
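For context on the mechanism in the title: stack attention is an attention variant built on differentiable stacks, with one form related to deterministic pushdown automata and one to nondeterministic pushdown automata. The deterministic form builds on the superposition stack of Joulin and Mikolov (2015), also listed below, in which push, pop, and no-op are applied in probability-weighted superposition rather than chosen discretely, keeping the update differentiable. Below is a minimal NumPy sketch of that stack update, not code from the paper; the function name, fixed stack depth, and action ordering are illustrative assumptions.

```python
import numpy as np

def superposition_stack_step(stack, actions, push_vec):
    """One differentiable superposition-stack update (Joulin & Mikolov, 2015).

    stack:    (depth, dim) array, row 0 is the top of the stack.
    actions:  (3,) probabilities for (push, pop, no-op), summing to 1.
    push_vec: (dim,) vector to push onto the stack.
    """
    p_push, p_pop, p_noop = actions
    depth, dim = stack.shape
    # Stack if we push: push_vec on top, everything else shifted down one slot.
    pushed = np.vstack([push_vec, stack[:-1]])
    # Stack if we pop: everything shifted up one slot, zeros at the bottom.
    popped = np.vstack([stack[1:], np.zeros((1, dim))])
    # Superpose all three outcomes instead of choosing one,
    # so gradients flow through every action.
    return p_push * pushed + p_pop * popped + p_noop * stack

# Usage sketch: mostly-push actions gradually fill the stack.
rng = np.random.default_rng(0)
stack = np.zeros((8, 4))  # depth 8, element size 4
for _ in range(3):
    stack = superposition_stack_step(stack, np.array([0.7, 0.2, 0.1]),
                                     rng.normal(size=4))
print(stack[0])  # soft reading of the stack top
```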

Papers citing "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"

22 papers

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
10 Apr 2025

TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao
29 Mar 2025

Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell
11 Nov 2024

Finding path and cycle counting formulae in graphs with Deep Reinforcement Learning
Jason Piquenot, Maxime Bérar, Pierre Héroux, Jean-Yves Ramel, R. Raveaux, Sébastien Adam
02 Oct 2024

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea
01 Feb 2024

The Surprising Computational Power of Nondeterministic Stack RNNs
Brian DuSell, David Chiang
04 Oct 2022

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale
Laurent Sartran, Samuel Barrett, A. Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer
01 Mar 2022

Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
Brian DuSell, David Chiang
05 Sep 2021

Structural Guidance for Transformer Language Models
Peng Qian, Tahira Naseem, R. Levy, Ramón Fernández Astudillo
30 Jul 2021

Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen, Julian Salazar
14 Oct 2019

Tree Transformer: Integrating Tree Structures into Self-Attention
Yau-Shian Wang, Hung-yi Lee, Yun-Nung Chen
14 Sep 2019

SG-Net: Syntax-Guided Machine Reading Comprehension
Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao, Rui Wang
14 Aug 2019

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn
16 Jun 2019

Learning Deep Transformer Models for Machine Translation
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao
05 Jun 2019

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo, John Richardson
19 Aug 2018

Structured Attention Networks
Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
03 Feb 2017

Layer Normalization
Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton
21 Jul 2016

Recurrent Neural Network Grammars
Chris Dyer, A. Kuncoro, Miguel Ballesteros, Noah A. Smith
25 Feb 2016

Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow, Alexandra Birch
31 Aug 2015

Describing Multimedia Content using Attention-based Encoder-Decoder Networks
Kyunghyun Cho, Aaron Courville, Yoshua Bengio
04 Jul 2015

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Armand Joulin, Tomas Mikolov
03 Mar 2015

Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
01 Sep 2014