
Comba: Improving Nonlinear RNNs with Closed-loop Control

Main: 9 pages
Appendix: 3 pages
Bibliography: 8 pages
5 figures
11 tables
Abstract

Recent efficient sequence modeling methods such as Gated DeltaNet, TTT, and RWKV-7 have achieved performance improvements by supervising recurrent memory management through the delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, resulting in a nonlinear recursive structure. In this paper, we first introduce the concept of Nonlinear RNNs, together with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Nonlinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition with both state-feedback and output-feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computational efficiency in both language and vision modeling.
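As a rough illustration of the recurrence the abstract describes, the sketch below implements a scalar-plus-low-rank state update in PyTorch: a scalar gate a_t decays the state, a rank-1 delta-rule term erases and rewrites the memory along the key direction, and the query reads out the result. This is assumed notation, not the paper's exact equations; the output-feedback correction and the chunk-wise Triton kernel are omitted.

# Minimal sketch (assumed notation): scalar-plus-low-rank recurrent state update
# with a delta-rule style correction driven by the key, read out via the query.
import torch

def scalar_plus_low_rank_recurrence(q, k, v, a, b):
    """q, k: (T, d_k); v: (T, d_v); a, b: (T,) scalar gates in (0, 1)."""
    T, d_k = k.shape
    d_v = v.shape[1]
    S = torch.zeros(d_k, d_v)           # recurrent state (associative memory)
    outputs = []
    for t in range(T):
        k_t = k[t]                      # the key interacts with the state -> nonlinear recursion
        # scalar decay plus a rank-1 erase along k_t (state-feedback correction)
        S = a[t] * S - b[t] * torch.outer(k_t, k_t @ S)
        # write the new value along k_t (delta-rule style update)
        S = S + b[t] * torch.outer(k_t, v[t])
        outputs.append(q[t] @ S)        # read out with the query
    return torch.stack(outputs)         # (T, d_v)

# toy usage
T, d_k, d_v = 8, 16, 16
q, k, v = torch.randn(T, d_k), torch.randn(T, d_k), torch.randn(T, d_v)
a, b = torch.sigmoid(torch.randn(T)), torch.sigmoid(torch.randn(T))
y = scalar_plus_low_rank_recurrence(q, k, v, a, b)
print(y.shape)  # torch.Size([8, 16])

In practice such a sequential loop is only a reference; the paper's chunk-wise parallel Triton kernel is what makes training at the 340M/1.3B scale efficient.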

@article{hu2025_2506.02475,
  title={Comba: Improving Bilinear RNNs with Closed-loop Control},
  author={Jiaxi Hu and Yongqi Pan and Jusen Du and Disen Lan and Xiaqiang Tang and Qingsong Wen and Yuxuan Liang and Weigao Sun},
  journal={arXiv preprint arXiv:2506.02475},
  year={2025}
}