ListOps: A Diagnostic Dataset for Latent Tree Learning
Nikita Nangia, Samuel R. Bowman
arXiv:1804.06028, 17 April 2018
Papers citing "ListOps: A Diagnostic Dataset for Latent Tree Learning" (39 of 39 shown):

1. PolaFormer: Polarity-aware Linear Attention for Vision Transformers. Weikang Meng, Yadan Luo, Xin Li, D. Jiang, Zheng Zhang. 25 Jan 2025.
2. Irrational Complex Rotations Empower Low-bit Optimizers. Zhen Tian, Wayne Xin Zhao, Zhicheng Dou. 22 Jan 2025. [MQ]
3. Layer-Adaptive State Pruning for Deep State Space Models. Minseon Gwak, Seongrok Moon, Joohwan Ko, PooGyeon Park. 05 Nov 2024.
4. Sampling Foundational Transformer: A Theoretical Perspective. Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong-Son Hy. 11 Aug 2024.
5. Banyan: Improved Representation Learning with Explicit Structure. Mattia Opper, N. Siddharth. 25 Jul 2024.
6. HDT: Hierarchical Document Transformer. Haoyu He, Markus Flicke, Jan Buchmann, Iryna Gurevych, Andreas Geiger. 11 Jul 2024.
7. Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences. Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li. 12 Jun 2024.
8. LOLAMEME: Logic, Language, Memory, Mechanistic Framework. Jay Desai, Xiaobo Guo, Srinivasan H. Sengamedu. 31 May 2024.
9. Investigating Recurrent Transformers with Dynamic Halt. Jishnu Ray Chowdhury, Cornelia Caragea. 01 Feb 2024.
10. Cached Transformers: Improving Transformers with Differentiable Memory Cache. Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Liang Feng, Ping Luo. 20 Dec 2023.
11. MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition. Nicolas Menet, Michael Hersche, G. Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi. 05 Dec 2023.
12. Efficient Learning of Discrete-Continuous Computation Graphs. David Friede, Mathias Niepert. 26 Jul 2023.
13. Bridging Discrete and Backpropagation: Straight-Through and Beyond. Liyuan Liu, Chengyu Dong, Xiaodong Liu, Bin-Xia Yu, Jianfeng Gao. 17 Apr 2023. [BDL]
14. Efficiency 360: Efficient Vision Transformers. Badri N. Patro, Vijay Srinivas Agneeswaran. 16 Feb 2023.
15. Ordered Memory Baselines. Daniel Borisov, Matthew D'Iorio, Jeffrey Hyacinthe. 08 Feb 2023.
16. Efficient Long Sequence Modeling via State Space Augmented Transformer. Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Xavier Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao. 15 Dec 2022.
17. Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost. Sungjun Cho, Seonwoo Min, Jinwoo Kim, Moontae Lee, Honglak Lee, Seunghoon Hong. 27 Oct 2022.
18. Neural Attentive Circuits. Nasim Rahaman, M. Weiß, Francesco Locatello, C. Pal, Yoshua Bengio, Bernhard Schölkopf, Erran L. Li, Nicolas Ballas. 14 Oct 2022.
19. Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers. Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç. 26 Sep 2022.
20. Mega: Moving Average Equipped Gated Attention. Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer. 21 Sep 2022.
21. Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention. Tong Yu, Ruslan Khalitov, Lei Cheng, Zhirong Yang. 22 Apr 2022. [MoE]
22. A Call for Clarity in Beam Search: How It Works and When It Stops. Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith. 11 Apr 2022.
23. ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention. Yang Liu, Jiaxiang Liu, L. Chen, Yuxiang Lu, Shi Feng, Zhida Feng, Yu Sun, Hao Tian, Huancheng Wu, Hai-feng Wang. 23 Mar 2022.
24. FastRPB: a Scalable Relative Positional Encoding for Long Sequence Tasks. Maksim Zubkov, Daniil Gavrilov. 23 Feb 2022.
25. cosFormer: Rethinking Softmax in Attention. Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong. 17 Feb 2022.
26. Flowformer: Linearizing Transformers with Conservation Flows. Haixu Wu, Jialong Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 13 Feb 2022.
27. Dynamic Token Normalization Improves Vision Transformers. Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo. 05 Dec 2021. [ViT]
28. Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces. Kirill Struminsky, Artyom Gadetsky, D. Rakitin, Danil Karpushkin, Dmitry Vetrov. 28 Oct 2021. [BDL]
29. SOFT: Softmax-free Transformer with Linear Complexity. Jiachen Lu, Jinghan Yao, Junge Zhang, Martin Danelljan, Hang Xu, Weiguo Gao, Chunjing Xu, Thomas B. Schon, Li Zhang. 22 Oct 2021.
30. The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber. 14 Oct 2021. [AI4CE]
31. Going Beyond Linear Transformers with Recurrent Fast Weight Programmers. Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber. 11 Jun 2021.
32. Relative Positional Encoding for Transformers with Linear Complexity. Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, Gaël Richard. 18 May 2021.
33. Pretrained Transformers as Universal Computation Engines. Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch. 09 Mar 2021.
34. Random Feature Attention. Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong. 03 Mar 2021.
35. Rethinking Attention with Performers. K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller. 30 Sep 2020.
36. Gradient Estimation with Stochastic Softmax Tricks. Max B. Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, Chris J. Maddison. 15 Jun 2020. [BDL]
37. Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction. Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee. 30 Jan 2020.
38. Cooperative Learning of Disjoint Syntax and Semantics. Serhii Havrylov, Germán Kruszewski, Armand Joulin. 25 Feb 2019.
39. Analyzing Compositionality-Sensitivity of NLI Models. Yixin Nie, Yicheng Wang, Joey Tianyi Zhou. 16 Nov 2018. [CoGe]