ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.10655
  4. Cited By
Mega: Moving Average Equipped Gated Attention

Mega: Moving Average Equipped Gated Attention

21 September 2022
Xuezhe Ma
Chunting Zhou
Xiang Kong
Junxian He
Liangke Gui
Graham Neubig
Jonathan May
Luke Zettlemoyer
ArXivPDFHTML

Papers citing "Mega: Moving Average Equipped Gated Attention"

50 / 132 papers shown
Title
Revisiting Reset Mechanisms in Spiking Neural Networks for Sequential Modeling: Specialized Discretization for Binary Activated RNN
Revisiting Reset Mechanisms in Spiking Neural Networks for Sequential Modeling: Specialized Discretization for Binary Activated RNN
Enqi Zhang
MQ
149
0
0
24 Apr 2025
CacheFormer: High Attention-Based Segment Caching
CacheFormer: High Attention-Based Segment Caching
Sushant Singh
A. Mahmood
41
0
0
18 Apr 2025
Hadamard product in deep learning: Introduction, Advances and Challenges
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
V. Cevher
AAML
98
0
0
17 Apr 2025
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Fa-Ting Hong
Zunnan Xu
Zixiang Zhou
Zhiqiang Zhang
Xiu Li
Qin Lin
Qinglin Lu
D. Xu
DiffM
VGen
57
2
0
03 Apr 2025
Fast Training of Recurrent Neural Networks with Stationary State Feedbacks
Fast Training of Recurrent Neural Networks with Stationary State Feedbacks
Paul Caillon
Erwan Fagnou
Alexandre Allauzen
39
0
0
29 Mar 2025
From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
Shriyank Somvanshi
Md Monzurul Islam
Mahmuda Sultana Mimi
Sazzad Bin Bashar Polock
Gaurab Chhetri
Subasish Das
Mamba
AI4TS
45
0
0
22 Mar 2025
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
61
0
0
17 Mar 2025
Uncertainty Representations in State-Space Layers for Deep Reinforcement Learning under Partial Observability
Uncertainty Representations in State-Space Layers for Deep Reinforcement Learning under Partial Observability
Carlos E. Luis
A. Bottero
Julia Vinogradska
Felix Berkenkamp
Jan Peters
78
1
0
20 Feb 2025
VMamba: Visual State Space Model
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
152
612
0
31 Dec 2024
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Xiaowen Ma
Zhenliang Ni
Xinghao Chen
Mamba
85
2
0
26 Nov 2024
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Yuhong Chou
Man Yao
Kexin Wang
Yuqi Pan
Ruijie Zhu
Yiran Zhong
Yu Qiao
Jian Wu
Bo Xu
Guoqi Li
54
4
0
16 Nov 2024
Revealing and Mitigating the Local Pattern Shortcuts of Mamba
Revealing and Mitigating the Local Pattern Shortcuts of Mamba
Wangjie You
Zecheng Tang
Juntao Li
Lili Yao
Min Zhang
Mamba
24
0
0
21 Oct 2024
Towards Better Multi-head Attention via Channel-wise Sample Permutation
Towards Better Multi-head Attention via Channel-wise Sample Permutation
Shen Yuan
Hongteng Xu
17
1
0
14 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
J. Li
Weiyao Lin
VLM
38
1
0
09 Oct 2024
S7: Selective and Simplified State Space Layers for Sequence Modeling
S7: Selective and Simplified State Space Layers for Sequence Modeling
Taylan Soydan
Nikola Zubić
Nico Messikommer
Siddhartha Mishra
Davide Scaramuzza
44
4
0
04 Oct 2024
How to Train Long-Context Language Models (Effectively)
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
72
38
0
03 Oct 2024
A Little Goes a Long Way: Efficient Long Context Training and Inference
  with Partial Contexts
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
Suyu Ge
Xihui Lin
Yunan Zhang
Jiawei Han
Hao Peng
33
4
0
02 Oct 2024
Analog In-Memory Computing Attention Mechanism for Fast and
  Energy-Efficient Large Language Models
Analog In-Memory Computing Attention Mechanism for Fast and Energy-Efficient Large Language Models
Nathan Leroux
Paul-Philipp Manea
Chirag Sudarshan
Jan Finkbeiner
Sebastian Siegel
J. Strachan
Emre Neftci
31
1
0
28 Sep 2024
SITSMamba for Crop Classification based on Satellite Image Time Series
SITSMamba for Crop Classification based on Satellite Image Time Series
Xiaolei Qin
Xin Su
Liangpei Zhang
Mamba
22
1
0
15 Sep 2024
Learning Brain Tumor Representation in 3D High-Resolution MR Images via
  Interpretable State Space Models
Learning Brain Tumor Representation in 3D High-Resolution MR Images via Interpretable State Space Models
Qingqiao Hu
Daoan Zhang
Jiebo Luo
Zhenyu Gong
Benedikt Wiestler
Jianguo Zhang
Hongwei Bran Li
34
0
0
12 Sep 2024
BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking
BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking
Hanzheng Wang
Wei Li
X. Xia
Qian Du
57
1
0
22 Aug 2024
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series
  Forecasting
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting
Yongbo Yu
Weizhong Yu
Feiping Nie
Xuelong Li
AI4TS
19
1
0
20 Aug 2024
Reparameterized Multi-Resolution Convolutions for Long Sequence
  Modelling
Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling
Harry Jake Cunningham
Giorgio Giannone
Mingtian Zhang
M. Deisenroth
30
0
0
18 Aug 2024
Sampling Foundational Transformer: A Theoretical Perspective
Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen
Minh Lenhat
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
44
0
0
11 Aug 2024
SAMSA: Efficient Transformer for Many Data Modalities
SAMSA: Efficient Transformer for Many Data Modalities
Minh Lenhat
Viet Anh Nguyen
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
49
0
0
10 Aug 2024
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models
Zifeng Ding
Yifeng Li
Yuan He
Antonio Norelli
Jingcheng Wu
Volker Tresp
Yunpu Ma
Michael Bronstein
Mamba
51
3
0
08 Aug 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and
  Low-precision
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
59
113
0
11 Jul 2024
On the Power of Convolution Augmented Transformer
On the Power of Convolution Augmented Transformer
Mingchen Li
Xuechen Zhang
Yixiao Huang
Samet Oymak
37
0
0
08 Jul 2024
KV Cache Compression, But What Must We Give in Return? A Comprehensive
  Benchmark of Long Context Capable Approaches
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
Hongyi Liu
Shaochen
Zhong
Yu-Neng Chuang
...
Hongye Jin
V. Chaudhary
Zhaozhuo Xu
Zirui Liu
Xia Hu
43
17
0
01 Jul 2024
From Efficient Multimodal Models to World Models: A Survey
From Efficient Multimodal Models to World Models: A Survey
Xinji Mai
Zeng Tao
Junxiong Lin
Haoran Wang
Yang Chang
Yanlan Kang
Yan Wang
Wenqiang Zhang
32
5
0
27 Jun 2024
Long-Term Prediction Accuracy Improvement of Data-Driven Medium-Range
  Global Weather Forecast
Long-Term Prediction Accuracy Improvement of Data-Driven Medium-Range Global Weather Forecast
Yifan Hu
Fukang Yin
Weimin Zhang
Kaijun Ren
Junqiang Song
Kefeng Deng
Di Zhang
AI4Cl
40
0
0
26 Jun 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient
  Zero-Shot Text to Speech Synthesizers
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Yakun Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Guanrou Yang
Xie Chen
AuLLM
40
3
0
22 Jun 2024
Short-Long Convolutions Help Hardware-Efficient Linear Attention to
  Focus on Long Sequences
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Zicheng Liu
Siyuan Li
Li Wang
Zedong Wang
Yunfan Liu
Stan Z. Li
35
7
0
12 Jun 2024
MambaLRP: Explaining Selective State Space Sequence Models
MambaLRP: Explaining Selective State Space Sequence Models
F. Jafari
G. Montavon
Klaus-Robert Müller
Oliver Eberle
Mamba
62
9
0
11 Jun 2024
Leveraging Large Language Models for Efficient Failure Analysis in Game
  Development
Leveraging Large Language Models for Efficient Failure Analysis in Game Development
Leonardo Marini
Linus Gisslén
Alessandro Sestini
54
0
0
11 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
74
56
0
11 Jun 2024
What Can We Learn from State Space Models for Machine Learning on
  Graphs?
What Can We Learn from State Space Models for Machine Learning on Graphs?
Yinan Huang
Siqi Miao
Pan Li
44
7
0
09 Jun 2024
Exploring Adversarial Robustness of Deep State Space Models
Exploring Adversarial Robustness of Deep State Space Models
Biqing Qi
Yang Luo
Junqi Gao
Pengfei Li
Kai Tian
Zhiyuan Ma
Bowen Zhou
AAML
50
1
0
08 Jun 2024
Learning 1D Causal Visual Representation with De-focus Attention
  Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
Chenxin Tao
Xizhou Zhu
Shiqian Su
Lewei Lu
Changyao Tian
...
Gao Huang
Hongsheng Li
Yu Qiao
Jie Zhou
Jifeng Dai
70
1
0
06 Jun 2024
SMR: State Memory Replay for Long Sequence Modeling
SMR: State Memory Replay for Long Sequence Modeling
Biqing Qi
Junqi Gao
Kaiyan Zhang
Dong Li
Jianxing Liu
Ligang Wu
Bowen Zhou
33
5
0
27 May 2024
Transformers Can Do Arithmetic with the Right Embeddings
Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish
Arpit Bansal
Alex Stein
Neel Jain
John Kirchenbauer
...
B. Kailkhura
A. Bhatele
Jonas Geiping
Avi Schwarzschild
Tom Goldstein
53
28
0
27 May 2024
GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB Images
GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB Images
Xinying Wang
Zhixiong Huang
Sifan Zhang
Jiawen Zhu
Lin Feng
Mamba
25
5
0
13 May 2024
Matten: Video Generation with Mamba-Attention
Matten: Video Generation with Mamba-Attention
Yu Gao
Jiancheng Huang
Xiaopeng Sun
Zequn Jie
Yujie Zhong
Lin Ma
72
12
0
05 May 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for
  Long Sequence Modelling: Methods, Applications, and Challenges
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
LongVQ: Long Sequence Modeling with Vector Quantization on Structured
  Memory
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
Zicheng Liu
Li Wang
Siyuan Li
Zedong Wang
Haitao Lin
Stan Z. Li
VLM
27
4
0
17 Apr 2024
State Space Model for New-Generation Network Alternative to
  Transformers: A Survey
State Space Model for New-Generation Network Alternative to Transformers: A Survey
Tianlin Li
Shiao Wang
Yuhe Ding
Yuehang Li
Wentao Wu
...
Bowei Jiang
Chenglong Li
Yaowei Wang
Yonghong Tian
Jin Tang
Mamba
33
49
0
15 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited
  Context Length
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
53
27
0
12 Apr 2024
HGRN2: Gated Linear RNNs with State Expansion
HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin
Songlin Yang
Weixuan Sun
Xuyang Shen
Dong Li
Weigao Sun
Yiran Zhong
LRM
47
47
0
11 Apr 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng
Daniel Goldstein
Quentin G. Anthony
Alon Albalak
Eric Alcaide
...
Bingchen Zhao
Qihang Zhao
Peng Zhou
Jian Zhu
Ruijie Zhu
51
73
0
08 Apr 2024
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
40
24
0
06 Apr 2024
123
Next