ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.07887
  4. Cited By
An Empirical Study of Mamba-based Language Models

An Empirical Study of Mamba-based Language Models

12 June 2024
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
Tri Dao
Albert Gu
Ali Hatamizadeh
Sudhakar Singh
Deepak Narayanan
Garvit Kulshreshtha
Vartika Singh
Jared Casper
Jan Kautz
Mohammad Shoeybi
Bryan Catanzaro
ArXivPDFHTML

Papers citing "An Empirical Study of Mamba-based Language Models"

50 / 58 papers shown
Title
Zebra-Llama: Towards Extremely Efficient Hybrid Models
Zebra-Llama: Towards Extremely Efficient Hybrid Models
Mingyu Yang
Mehdi Rezagholizadeh
Guihong Li
Vikram Appia
Emad Barsoum
50
0
0
22 May 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
368
0
0
28 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jing Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE
LM&Ro
LRM
139
10
0
18 Mar 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck
Korbinian Poppel
Phillip Lippe
Sepp Hochreiter
119
3
0
18 Mar 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
459
0
0
14 Mar 2025
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
Shilong Yang
Qi Zang
Chulong Zhang
Lingfeng Huang
Yaoqin Xie
Mamba
158
1
0
16 Feb 2025
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Elvis Nunez
Luca Zancato
Benjamin Bowman
Aditya Golatkar
Wei Xia
Stefano Soatto
158
4
0
17 Dec 2024
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang
Chih-Yao Ma
Yen-Cheng Liu
Ji Hou
Tao Xu
...
Peizhao Zhang
Tingbo Hou
Peter Vajda
N. Jha
Xiaoliang Dai
LMTD
VGen
VLM
DiffM
151
8
0
13 Dec 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
138
4
0
28 Nov 2024
How to Train Long-Context Language Models (Effectively)
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
129
45
0
03 Oct 2024
Flash STU: Fast Spectral Transform Units
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
97
1
0
16 Sep 2024
Transformers are SSMs: Generalized Models and Efficient Algorithms
  Through Structured State Space Duality
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao
Albert Gu
Mamba
75
498
0
31 May 2024
Zamba: A Compact 7B SSM Hybrid Model
Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso
Quentin G. Anthony
Yury Tokpanov
James Whittington
Jonathan Pilault
Adam Ibrahim
Beren Millidge
62
47
0
26 May 2024
Jamba: A Hybrid Transformer-Mamba Language Model
Jamba: A Hybrid Transformer-Mamba Language Model
Opher Lieber
Barak Lenz
Hofit Bata
Gal Cohen
Jhonathan Osin
...
Nir Ratner
N. Rozen
Erez Shwartz
Mor Zusman
Y. Shoham
74
219
0
28 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for
  Efficient Language Models
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
89
129
0
29 Feb 2024
Nemotron-4 15B Technical Report
Nemotron-4 15B Technical Report
Jupinder Parmar
Shrimai Prabhumoye
Joseph Jennings
M. Patwary
Sandeep Subramanian
...
Ashwath Aithal
Oleksii Kuchaiev
Mohammad Shoeybi
Jonathan Cohen
Bryan Catanzaro
44
23
0
26 Feb 2024
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning
  Tasks
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park
Jaeseung Park
Zheyang Xiong
Nayoung Lee
Jaewoong Cho
Samet Oymak
Kangwook Lee
Dimitris Papailiopoulos
79
72
0
06 Feb 2024
Repeat After Me: Transformers are Better than State Space Models at
  Copying
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
120
91
0
01 Feb 2024
Zoology: Measuring and Improving Recall in Efficient Language Models
Zoology: Measuring and Improving Recall in Efficient Language Models
Simran Arora
Sabri Eyuboglu
Aman Timalsina
Isys Johnson
Michael Poli
James Zou
Atri Rudra
Christopher Ré
73
75
0
08 Dec 2023
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu
Tri Dao
Mamba
124
2,636
0
01 Dec 2023
Mistral 7B
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LRM
61
2,170
0
10 Oct 2023
Effective Long-Context Scaling of Foundation Models
Effective Long-Context Scaling of Foundation Models
Wenhan Xiong
Jingyu Liu
Igor Molybog
Hejia Zhang
Prajjwal Bhargava
...
Dániel Baráth
Sergey Edunov
Mike Lewis
Sinong Wang
Hao Ma
60
223
0
27 Sep 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context
  Understanding
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG
RALM
81
582
0
28 Aug 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
267
11,791
0
18 Jul 2023
Block-State Transformers
Block-State Transformers
Mahan Fathi
Jonathan Pilault
Orhan Firat
C. Pal
Pierre-Luc Bacon
Ross Goroshin
62
17
0
15 Jun 2023
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head
  Checkpoints
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie
James Lee-Thorp
Michiel de Jong
Yury Zemlyanskiy
Federico Lebrón
Sumit Sanghai
63
657
0
22 May 2023
GPT-4 Technical Report
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.2K
14,289
0
15 Mar 2023
Diagonal State Space Augmented Transformers for Speech Recognition
Diagonal State Space Augmented Transformers for Speech Recognition
G. Saon
Ankit Gupta
Xiaodong Cui
AI4TS
47
27
0
27 Feb 2023
Reducing Activation Recomputation in Large Transformer Models
Reducing Activation Recomputation in Large Transformer Models
V. Korthikanti
Jared Casper
Sangkug Lym
Lawrence C. McAfee
M. Andersch
Mohammad Shoeybi
Bryan Catanzaro
AI4CE
108
266
0
10 May 2022
SCROLLS: Standardized CompaRison Over Long Language Sequences
SCROLLS: Standardized CompaRison Over Long Language Sequences
Uri Shaham
Elad Segal
Maor Ivgi
Avia Efrat
Ori Yoran
...
Ankit Gupta
Wenhan Xiong
Mor Geva
Jonathan Berant
Omer Levy
RALM
79
137
0
10 Jan 2022
Efficiently Modeling Long Sequences with Structured State Spaces
Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu
Karan Goel
Christopher Ré
186
1,761
0
31 Oct 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
125
1,881
0
08 Sep 2021
MuSiQue: Multihop Questions via Single-hop Question Composition
MuSiQue: Multihop Questions via Single-hop Question Composition
H. Trivedi
Niranjan Balasubramanian
Tushar Khot
Ashish Sabharwal
LRM
72
267
0
02 Aug 2021
A Dataset of Information-Seeking Questions and Answers Anchored in
  Research Papers
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
Pradeep Dasigi
Kyle Lo
Iz Beltagy
Arman Cohan
Noah A. Smith
Matt Gardner
RALM
85
302
0
07 May 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
225
2,440
0
20 Apr 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using
  Megatron-LM
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
82
682
0
09 Apr 2021
Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of
  Reasoning Steps
Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
Xanh Ho
A. Nguyen
Saku Sugawara
Akiko Aizawa
RALM
LRM
60
445
0
02 Nov 2020
Efficient Transformers: A Survey
Efficient Transformers: A Survey
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
VLM
146
1,115
0
14 Sep 2020
Measuring Massive Multitask Language Understanding
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
157
4,377
0
07 Sep 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
702
41,736
0
28 May 2020
GLU Variants Improve Transformer
GLU Variants Improve Transformer
Noam M. Shazeer
120
989
0
12 Feb 2020
PIQA: Reasoning about Physical Commonsense in Natural Language
PIQA: Reasoning about Physical Commonsense in Natural Language
Yonatan Bisk
Rowan Zellers
Ronan Le Bras
Jianfeng Gao
Yejin Choi
OOD
LRM
118
1,776
0
26 Nov 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using
  Model Parallelism
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
310
1,892
0
17 Sep 2019
PubMedQA: A Dataset for Biomedical Research Question Answering
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
362
883
0
13 Sep 2019
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Kenton Lee
Ming-Wei Chang
Kristina Toutanova
RALM
94
1,010
0
01 Jun 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
156
2,446
0
19 May 2019
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question
  Answering
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
150
2,635
0
25 Sep 2018
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book
  Question Answering
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov
Peter Clark
Tushar Khot
Ashish Sabharwal
96
1,516
0
08 Sep 2018
SentencePiece: A simple and language independent subword tokenizer and
  detokenizer for Neural Text Processing
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo
John Richardson
178
3,514
0
19 Aug 2018
Know What You Don't Know: Unanswerable Questions for SQuAD
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar
Robin Jia
Percy Liang
RALM
ELM
259
2,837
0
11 Jun 2018
12
Next