2404.05892
Cited By
v1
v2 (latest)
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
8 April 2024
Bo Peng
Daniel Goldstein
Quentin G. Anthony
Alon Albalak
Eric Alcaide
Stella Biderman
Eugene Cheah
Xingjian Du
Teddy Ferdinan
Haowen Hou
P. Kazienko
G. Kranthikiran
Jan Kocoń
Bartlomiej Koptyra
Satyapriya Krishna
Ronald McClelland
Niklas Muennighoff
Fares Obeid
Atsushi Saito
Guangyu Song
Haoqin Tu
Stanisław Woźniak
Ruichong Zhang
Bingchen Zhao
Qihang Zhao
Peng Zhou
Jian Zhu
Ruijie Zhu
Github (44★)
Papers citing
"Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence"
45 / 45 papers shown
Title
RWKV-X: A Linear Complexity Hybrid Language Model
Haowen Hou
Zhiyi Huang
Kaifeng Tan
Rongchang Lu
Fei Richard Yu
VLM
136
0
0
30 Apr 2025
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
Patrick Haller
Jonas Golde
Alan Akbik
112
0
0
19 Apr 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu Cheng
MoE
212
3
0
07 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu Cheng
113
1
0
03 Mar 2025
A Survey of Model Architectures in Information Retrieval
Zhichao Xu
Fengran Mo
Zhiqi Huang
Crystina Zhang
Puxuan Yu
Bei Wang
Jimmy J. Lin
Vivek Srikumar
KELM
3DV
144
2
0
21 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu Cheng
KELM
132
4
0
19 Feb 2025
LOB-Bench: Benchmarking Generative AI for Finance -- an Application to Limit Order Book Data
Peer Nagy
Sascha Frey
Kang Li
Bidipta Sarkar
Svitlana Vyetrenko
Stefan Zohren
Ani Calinescu
Jakob Foerster
153
1
0
13 Feb 2025
Linear Attention Modeling for Learned Image Compression
Donghui Feng
Zhengxue Cheng
Shen Wang
Ronghua Wu
Hongwei Hu
Guo Lu
Li Song
323
1
0
09 Feb 2025
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Boxing Chen
Sarath Chandar
152
0
0
22 Oct 2024
OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity
Junming Wang
Wei Yin
Xiaoxiao Long
Xingyu Zhang
Zebin Xing
Xiaoyang Guo
Qian Zhang
3DPC
92
3
0
30 Sep 2024
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun
Xinhao Li
Karan Dalal
Jiarui Xu
Arjun Vikram
...
Xinlei Chen
Xiaolong Wang
Sanmi Koyejo
Tatsunori Hashimoto
Carlos Guestrin
119
105
0
05 Jul 2024
Theoretical Foundations of Deep Selective State-Space Models
Nicola Muca Cirone
Antonio Orvieto
Benjamin Walker
C. Salvi
Terry Lyons
Mamba
197
32
0
29 Feb 2024
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Zhengqing Yuan
Zhaoxu Li
Weiran Huang
Yanfang Ye
Lichao Sun
47
51
0
28 Dec 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
356
4,388
0
09 Jun 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
216
597
0
22 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
107
2,067
0
11 May 2023
Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto
Samuel L. Smith
Albert Gu
Anushan Fernando
Çağlar Gülçehre
Razvan Pascanu
Soham De
326
295
0
11 Mar 2023
Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli
Stefano Massaroli
Eric Q. Nguyen
Daniel Y. Fu
Tri Dao
S. Baccus
Yoshua Bengio
Stefano Ermon
Christopher Ré
VLM
91
299
0
21 Feb 2023
ChatGPT: Jack of all trades, master of none
Jan Kocoń
Igor Cichecki
Oliwier Kaszyca
Mateusz Kochanek
Dominika Szydło
...
Maciej Piasecki
Łukasz Radliński
Konrad Wojtasik
Stanisław Woźniak
Przemyslaw Kazienko
AI4MH
107
548
0
21 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
426
4,563
0
30 Jan 2023
MTEB: Massive Text Embedding Benchmark
Niklas Muennighoff
Nouamane Tazi
L. Magne
Nils Reimers
517
404
0
13 Oct 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
278
1,245
0
20 Sep 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
174
835
0
14 Apr 2022
Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta
Albert Gu
Jonathan Berant
119
306
0
27 Mar 2022
Few-shot Learning with Multilingual Language Models
Xi Lin
Todor Mihaylov
Mikel Artetxe
Tianlu Wang
Shuohui Chen
...
Luke Zettlemoyer
Zornitsa Kozareva
Mona T. Diab
Ves Stoyanov
Xian Li
BDL
ELM
LRM
103
307
0
20 Dec 2021
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Mandy Guo
Joshua Ainslie
David C. Uthus
Santiago Ontanon
Jianmo Ni
Yun-hsuan Sung
Yinfei Yang
VLM
62
313
0
15 Dec 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
342
1,702
0
15 Oct 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
471
10,367
0
17 Jun 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
418
2,674
0
04 May 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
450
2,096
0
31 Dec 2020
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
181
1,585
0
30 Sep 2020
Efficient Transformers: A Survey
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
VLM
156
1,123
0
14 Sep 2020
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu
Tri Dao
Stefano Ermon
Atri Rudra
Christopher Ré
117
519
0
17 Aug 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
543
2,086
0
28 Jul 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
201
1,771
0
29 Jun 2020
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
216
1,706
0
08 Jun 2020
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
Edoardo Ponti
Goran Glavaš
Olga Majewska
Qianchu Liu
Ivan Vulić
Anna Korhonen
LRM
70
325
0
01 May 2020
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
Nils Reimers
Iryna Gurevych
104
1,025
0
21 Apr 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
171
4,071
0
10 Apr 2020
Sparse Sinkhorn Attention
Yi Tay
Dara Bahri
Liu Yang
Donald Metzler
Da-Cheng Juan
86
340
0
26 Feb 2020
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
125
1,899
0
23 Apr 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
85
1,218
0
18 Apr 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,159
0
20 Apr 2018
Group Normalization
Yuxin Wu
Kaiming He
231
3,660
0
22 Mar 2018
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
1.0K
23,354
0
03 Jun 2014