2310.02980
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
International Conference on Learning Representations (ICLR), 2023
4 October 2023
Ido Amos
Jonathan Berant
Ankit Gupta
Papers citing "Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors"
23 / 23 papers shown
Rethinking the long-range dependency in Mamba/SSM and transformer models
Cong Ma
Kayvan Najarian
Mamba
114
1
0
04 Sep 2025
Uncovering the Spectral Bias in Diagonal State Space Models
Rubén Solozabal
Velibor Bojkovic
Hilal AlQuabeh
Kentaro Inui
Martin Takáč
68
1
0
28 Aug 2025
Comba: Improving Bilinear RNNs with Closed-loop Control
Jiaxi Hu
Yongqi Pan
Jusen Du
Disen Lan
Xiaqiang Tang
Qingsong Wen
Yuxuan Liang
Weigao Sun
597
0
0
03 Jun 2025
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Yana Veitsman
Mayank Jobanputra
Yash Sarrof
Aleksandra Bakalova
Vera Demberg
Ellie Pavlick
Michael Hahn
342
2
0
27 May 2025
Utilizing Strategic Pre-training to Reduce Overfitting: Baguan -- A Pre-trained Weather Forecasting Model
Peisong Niu
Ziqing Ma
Tian Zhou
Weiqi Chen
Lefei Shen
Rong Jin
Liang Sun
AI4CE
128
1
0
20 May 2025
Bridge the Domains: Large Language Models Enhanced Cross-domain Sequential Recommendation
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Qidong Liu
Xiangyu Zhao
Yejing Wang
Zijian Zhang
Howard Zhong
Chong Chen
Xiaochen Li
Wei Huang
Feng Tian
AI4TS
244
14
0
25 Apr 2025
A Robust Real-Time Lane Detection Method with Fog-Enhanced Feature Fusion for Foggy Conditions
Ronghui Zhang
Yuhang Ma
Tengfei Li
Ziyu Lin
Yueying Wu
Junzhou Chen
Lin Zhang
Jia Hu
Tony Z. Qiu
Konghui Guo
434
1
0
08 Apr 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hai Yu
Chong Deng
Qinglin Zhang
Jiaqing Liu
Qian Chen
Wen Wang
314
0
0
31 Dec 2024
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Neural Information Processing Systems (NeurIPS), 2024
Yuhong Chou
Man Yao
Kexin Wang
Yuqi Pan
Ruijie Zhu
Yiran Zhong
Yu Qiao
Jian Wu
Bo Xu
Guoqi Li
229
14
0
16 Nov 2024
Revealing and Mitigating the Local Pattern Shortcuts of Mamba
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Wangjie You
Zecheng Tang
Juntao Li
Lili Yao
Min Zhang
Mamba
116
0
0
21 Oct 2024
Mimetic Initialization Helps State Space Models Learn to Recall
Asher Trockman
Hrayr Harutyunyan
J. Zico Kolter
Sanjiv Kumar
Srinadh Bhojanapalli
Mamba
104
8
0
14 Oct 2024
Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling
Neural Information Processing Systems (NeurIPS), 2024
Harry Jake Cunningham
Giorgio Giannone
Mingtian Zhang
M. Deisenroth
236
3
0
18 Aug 2024
DDK: Distilling Domain Knowledge for Efficient Large Language Models
Jiaheng Liu
Chenchen Zhang
Jinyang Guo
Yuanxing Zhang
Haoran Que
...
Congnan Liu
Yuchi Xu
Jiamang Wang
Lin Qu
Bo Zheng
197
28
0
23 Jul 2024
How Effective are State Space Models for Machine Translation?
Hugo Pitorro
Mustafa Hajij
Marcos Vinícius Treviso
André F. T. Martins
Mamba
145
3
0
07 Jul 2024
Pretrained Hybrids with MAD Skills
Nicholas Roberts
Samuel Guo
Zhiqi Gao
Satya Sai Srinath Namburi
Sonia Cromp
Chengjun Wu
Chengyu Duan
Frederic Sala
Mamba
255
0
0
02 Jun 2024
Large Language Models Enhanced Sequential Recommendation for Long-tail User and Item
Qidong Liu
Xian Wu
Xiangyu Zhao
Yejing Wang
Zijian Zhang
Feng Tian
Yefeng Zheng
139
1
0
31 May 2024
Length independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints
Dániel Rácz
Mihaly Petreczky
Bálint Daróczy
370
1
0
30 May 2024
Low-rank finetuning for LLMs: A fairness perspective
Saswat Das
Marco Romanelli
Cuong Tran
Zarreen Reza
B. Kailkhura
Ferdinando Fioretto
138
4
0
28 May 2024
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
Jiaqi Li
Qianshan Wei
Chuanyi Zhang
Guilin Qi
Miaozeng Du
Yongrui Chen
Sheng Bi
Fan Liu
VLM
MU
344
28
0
21 May 2024
State-Free Inference of State-Space Models: The Transfer Function Approach
International Conference on Machine Learning (ICML), 2024
Rom N. Parnichkun
Stefano Massaroli
Alessandro Moro
Jimmy T.H. Smith
Ramin Hasani
...
Hajime Asama
Stefano Ermon
Taiji Suzuki
Atsushi Yamashita
Michael Poli
176
15
0
10 May 2024
State Space Model for New-Generation Network Alternative to Transformers: A Survey
Tianlin Li
Shiao Wang
Yuhe Ding
Yuehang Li
Wentao Wu
...
Bowei Jiang
Chenglong Li
Yaowei Wang
Yonghong Tian
Jin Tang
Mamba
318
78
0
15 Apr 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller
Krista A. Ehinger
Tom Drummond
251
7
0
19 Feb 2024
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Gleb Gerasimov
Nikita Balagansky
Sofia Maria Lo Cicero Vaina
Boris Shaposhnikov
Alexey Gorbatovski
Daniil Gavrilov
KELM
205
9
0
16 Feb 2024