Mega: Moving Average Equipped Gated Attention
arXiv:2209.10655 · 21 September 2022
Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer
Papers citing "Mega: Moving Average Equipped Gated Attention" (showing 50 of 132)
Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring
Huiyu Gao, Depeng Dang · 29 Mar 2024

Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli, Armin W. Thomas, Eric N. D. Nguyen, Pragaash Ponnusamy, Bjorn Deiseroth, ..., Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli · [MoE] · 26 Mar 2024

Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model
Jiqun Chu, Zuoquan Lin · [AI4TS] · 26 Mar 2024
VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting
Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang · [Mamba] · 25 Mar 2024

SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Badri N. Patro, Vijay Srinivas Agneeswaran · [Mamba] · 22 Mar 2024

MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models
Zunnan Xu, Yukang Lin, Haonan Han, Sicheng Yang, Ronghui Li, Yachao Zhang, Xiu Li · [Mamba] · 14 Mar 2024

Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy
Jiuming Liu, Ruiji Yu, Yian Wang, Yu Zheng, Tianchen Deng, Weicai Ye, Hesheng Wang · 11 Mar 2024
The Hidden Attention of Mamba Models
Ameen Ali, Itamar Zimerman, Lior Wolf · [Mamba] · 03 Mar 2024

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
Raunaq M. Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Lerrel Pinto · 15 Feb 2024

Scalable Diffusion Models with State Space Backbone
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang · 08 Feb 2024

Attention as Robust Representation for Time Series Forecasting
Peisong Niu, Tian Zhou, Xue Wang, Liang Sun, Rong Jin · [AI4TS] · 08 Feb 2024
A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao · 05 Feb 2024

Enhancing Transformer RNNs with Multiple Temporal Perspectives
Razvan-Gabriel Dumitru, Darius Peteleaza, Mihai Surdeanu · [AI4TS] · 04 Feb 2024

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, A. Eshaghi · [LRM] · 03 Feb 2024

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea · 01 Feb 2024

Topology-Aware Exploration of Energy-Based Models Equilibrium: Toric QC-LDPC Codes and Hyperbolic MET QC-LDPC Codes
V. Usatyuk, Denis Sapozhnikov, Sergey Egorov · 26 Jan 2024
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Saurav Pawar, S.M. Towhidul Islam Tonmoy, S. M. M. Zaman, Vinija Jain, Aman Chadha, Amitava Das · 15 Jan 2024

Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery
Caleb Robinson, Isaac Corley, Anthony Ortiz, Rahul Dodhia, J. L. Ferres, Peyman Najafirad · 12 Jan 2024

Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing
Zi Yang, Nan Hua · [RALM] · 10 Jan 2024

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Michał Krutul, Jakub Krajewski, Szymon Antoniak, Piotr Miłoś, Marek Cygan, Sebastian Jaszczur · [MoE, Mamba] · 08 Jan 2024
Learning Long Sequences in Spiking Neural Networks
Matei Ioan Stan, Oliver Rhodes · 14 Dec 2023

Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim · 11 Dec 2023

TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
Aleksandar Terzić, Michael Hersche, G. Karunaratne, Zixiao Huang, Abu Sebastian, Abbas Rahimi · [AI4TS] · 09 Dec 2023

DemaFormer: Damped Exponential Moving Average Transformer with Energy-Based Modeling for Temporal Language Grounding
Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Cong-Duy Nguyen, See-Kiong Ng, Anh Tuan Luu · 05 Dec 2023

Diffusion Models Without Attention
Jing Nathan Yan, Jiatao Gu, Alexander M. Rush · 30 Nov 2023

Advancing State of the Art in Language Modeling
David Herel, Tomáš Mikolov · 28 Nov 2023

On the Long Range Abilities of Transformers
Itamar Zimerman, Lior Wolf · 28 Nov 2023
YUAN 2.0: A Large Language Model with Localized Filtering-based Attention
Shaohua Wu, Xudong Zhao, Shenling Wang, Jiangang Luo, Lingjun Li, ..., Wei Wang, Tong Yu, Rongguo Zhang, Jiahua Zhang, Chao Wang · [OSLM] · 27 Nov 2023

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
Itamar Zimerman, Moran Baruch, Nir Drucker, Gilad Ezov, Omri Soceanu, Lior Wolf · 15 Nov 2023

To Transformers and Beyond: Large Language Models for the Genome
Micaela Elisa Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang · [LM&MA, MedIm] · 13 Nov 2023

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu, Hermann Kumbong, Eric N. D. Nguyen, Christopher Ré · [VLM] · 10 Nov 2023
Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin, Songlin Yang, Yiran Zhong · 08 Nov 2023

Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability
Jishnu Ray Chowdhury, Cornelia Caragea · 08 Nov 2023

Findings of the WMT 2023 Shared Task on Discourse-Level Literary Translation: A Fresh Orb in the Cosmos of LLMs
Longyue Wang, Zhaopeng Tu, Yan Gu, Siyou Liu, Dian Yu, ..., Bonnie Webber, Philipp Koehn, Andy Way, Yulin Yuan, Shuming Shi · 06 Nov 2023

General-Purpose Retrieval-Enhanced Medical Prediction Model Using Near-Infinite History
Junu Kim, Chaeeun Shim, Bosco Seong Kyu Yang, Chami Im, Sung Yoon Lim, Han-Gil Jeong, Edward Choi · 31 Oct 2023

Sliceformer: Make Multi-head Attention as Simple as Sorting in Discriminative Tasks
Shen Yuan, Hongteng Xu · 26 Oct 2023
PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng, Bei Li, Huiwen Bao, Jiale Wang, Weiqiao Shan, Tong Xiao, Jingbo Zhu · [MoE, AI4CE] · 23 Oct 2023

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
Qingru Zhang, Dhananjay Ram, Cole Hawkins, Sheng Zha, Tuo Zhao · 19 Oct 2023

Attentive Multi-Layer Perceptron for Non-autoregressive Generation
Shuyang Jiang, Jinchao Zhang, Jiangtao Feng, Lin Zheng, Lingpeng Kong · 14 Oct 2023

Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond
Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada Mihalcea · 09 Oct 2023

USTEP: Spatio-Temporal Predictive Learning under A Unified View
Cheng Tan, Jue Wang, Zhangyang Gao, Siyuan Li, Stan Z. Li · 09 Oct 2023
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Ido Amos, Jonathan Berant, Ankit Gupta · 04 Oct 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald · 28 Sep 2023

Multi-Dimensional Hyena for Spatial Inductive Bias
Itamar Zimerman, Lior Wolf · [ViT] · 24 Sep 2023

Augmenting conformers with structured state-space sequence models for online speech recognition
Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara N. Sainath · [RALM] · 15 Sep 2023

Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs
Or Sharir, Anima Anandkumar · 27 Jul 2023

Sparse Modular Activation for Efficient Sequence Modeling
Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, Chengxiang Zhai · 19 Jun 2023

Block-State Transformers
Mahan Fathi, Jonathan Pilault, Orhan Firat, C. Pal, Pierre-Luc Bacon, Ross Goroshin · 15 Jun 2023

The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks
Aaron Spieler, Nasim Rahaman, Georg Martius, Bernhard Schölkopf, Anna Levina · 14 Jun 2023

2-D SSM: A General Spatial Layer for Visual Transformers
Ethan Baron, Itamar Zimerman, Lior Wolf · 11 Jun 2023