Mega: Moving Average Equipped Gated Attention
arXiv 2209.10655 · 21 September 2022
Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer
Papers citing "Mega: Moving Average Equipped Gated Attention" (32 of 132 papers shown)
Title | Authors | Tags | Citations | Date
Coneheads: Hierarchy Aware Attention | Albert Tseng, Tao Yu, Toni J.B. Liu, Chris De Sa | 3DPC | 5 | 01 Jun 2023
Blockwise Parallel Transformer for Large Context Models | Hao Liu, Pieter Abbeel | | 11 | 30 May 2023
A Quantitative Review on Language Model Efficiency Research | Meng Jiang, Hy Dang, Lingbo Tong | | 0 | 28 May 2023
Robust Natural Language Understanding with Residual Attention Debiasing | Fei Wang, James Y. Huang, Tianyi Yan, Wenxuan Zhou, Muhao Chen | | 10 | 28 May 2023
Focus Your Attention (with Adaptive IIR Filters) | Shahar Lutati, Itamar Zimerman, Lior Wolf | | 9 | 24 May 2023
Adapting Language Models to Compress Contexts | Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen | LLMAG | 174 | 24 May 2023
Challenges in Context-Aware Neural Machine Translation | Linghao Jin, Jacqueline He, Jonathan May, Xuezhe Ma | | 7 | 23 May 2023
RWKV: Reinventing RNNs for the Transformer Era | Bo Peng, Eric Alcaide, Quentin G. Anthony, Alon Albalak, Samuel Arcadinho, ..., Qihang Zhao, P. Zhou, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu | | 557 | 22 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models | Oana Ignat, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, ..., Verónica Pérez-Rosas, Siqi Shen, Zekun Wang, Winston Wu, Rada Mihalcea | LRM | 6 | 21 May 2023
CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting | Xue Wang, Tian Zhou, Qingsong Wen, Jinyang Gao, Bolin Ding, Rong Jin | AI4TS | 38 | 20 May 2023
Accurate Knowledge Distillation with n-best Reranking | Hendra Setiawan | | 2 | 20 May 2023
SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels | Alexander Moreno, Jonathan Mei, Luke Walters | | 0 | 15 May 2023
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers | L. Yu, Daniel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, M. Lewis | | 84 | 12 May 2023
State Spaces Aren't Enough: Machine Translation Needs Attention | Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz | | 9 | 25 Apr 2023
Resurrecting Recurrent Neural Networks for Long Sequences | Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De | | 266 | 11 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers | Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Yujun Wang | | 4 | 03 Mar 2023
Simple Hardware-Efficient Long Convolutions for Sequence Modeling | Daniel Y. Fu, Elliot L. Epstein, Eric N. D. Nguyen, A. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré | | 52 | 13 Feb 2023
DNArch: Learning Convolutional Neural Architectures by Backpropagation | David W. Romero, Neil Zeghidour | AI4CE | 4 | 10 Feb 2023
Logically at Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval techniques and Transformer Encoder Architecture | P. Verschuuren, Jie Gao, A. V. Eeden, Stylianos Oikonomou, Anil Bandhakavi | | 2 | 09 Jan 2023
A Length-Extrapolatable Transformer | Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei | | 115 | 20 Dec 2022
Pretraining Without Attention | Junxiong Wang, J. Yan, Albert Gu, Alexander M. Rush | | 48 | 20 Dec 2022
Efficient Long Sequence Modeling via State Space Augmented Transformer | Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Xavier Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao | | 36 | 15 Dec 2022
Simplifying and Understanding State Space Models with Diagonal Linear RNNs | Ankit Gupta, Harsh Mehta, Jonathan Berant | | 21 | 01 Dec 2022
What Makes Convolutional Models Great on Long Sequence Modeling? | Yuhong Li, Tianle Cai, Yi Zhang, De-huai Chen, Debadeepta Dey | VLM | 96 | 17 Oct 2022
WavSpA: Wavelet Space Attention for Boosting Transformers' Long Sequence Learning Ability | Yufan Zhuang, Zihan Wang, Fangbo Tao, Jingbo Shang | ViT, AI4TS | 3 | 05 Oct 2022
Efficient Methods for Natural Language Processing: A Survey | Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz | | 109 | 31 Aug 2022
Transformer Quality in Linear Time | Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le | | 222 | 21 Feb 2022
FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes | David W. Romero, Robert-Jan Bruintjes, Jakub M. Tomczak, Erik J. Bekkers, Mark Hoogendoorn, J. C. V. Gemert | | 82 | 15 Oct 2021
Primer: Searching for Efficient Transformers for Language Modeling | David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le | VLM | 152 | 17 Sep 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | Ofir Press, Noah A. Smith, M. Lewis | | 695 | 27 Aug 2021
Effective Approaches to Attention-based Neural Machine Translation | Thang Luong, Hieu H. Pham, Christopher D. Manning | | 7,926 | 17 Aug 2015
Convolutional Neural Networks for Sentence Classification | Yoon Kim | AILaw, VLM | 13,364 | 25 Aug 2014