m4: A Learned Flow-level Network Simulator

3 March 2025

Abstract

Flow-level simulation is widely used to model large-scale data center networks due to its scalability. Unlike packet-level simulators that model individual packets, flow-level simulators abstract traffic as continuous flows with dynamically assigned transmission rates. While this abstraction enables orders-of-magnitude speedup, it sacrifices accuracy by omitting critical packet-level effects such as queuing, congestion control, and retransmissions. We present m4, an accurate and scalable flow-level simulator that uses machine learning to learn the dynamics of the network of interest. At the core of m4 lies a novel ML architecture that decomposes state transition computations into distinct spatial and temporal components, each represented by a suitable neural network. To efficiently learn the underlying flow-level dynamics, m4 adds dense supervision signals by predicting intermediate network metrics such as remaining flow size and queue length during training. m4 achieves a speedup of up to $10^4\times$ over packet-level simulation. Relative to a traditional flow-level simulation, m4 reduces per-flow estimation errors by 45.3% (mean) and 53.0% (p90). For closed-loop applications, m4 accurately predicts network throughput under various congestion control schemes and workloads.
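To make "dynamically assigned transmission rates" concrete, here is a minimal sketch of how a traditional flow-level simulator might assign rates. It is not m4's learned method, and the abstract does not say which allocation scheme the baseline uses; max-min fairness via progressive filling is assumed here only because it is a common choice. The function name `max_min_rates`, the flow and link names, and the capacities are all hypothetical.

```python
from typing import Dict, List


def max_min_rates(flows: Dict[str, List[str]],
                  capacity: Dict[str, float]) -> Dict[str, float]:
    """Assign max-min fair rates by progressive filling.

    flows maps a flow id to the links on its path (assumed non-empty);
    capacity maps a link id to its bandwidth. Both are illustrative inputs.
    """
    remaining = dict(capacity)   # bandwidth still unallocated on each link
    active = set(flows)          # flows whose rate is not yet fixed
    rates: Dict[str, float] = {}

    while active:
        # Count how many unfixed flows cross each link.
        load: Dict[str, int] = {}
        for f in active:
            for link in flows[f]:
                load[link] = load.get(link, 0) + 1

        # The bottleneck is the link offering the smallest equal share.
        bottleneck = min(load, key=lambda l: remaining[l] / load[l])
        share = remaining[bottleneck] / load[bottleneck]

        # Fix the rate of every flow crossing the bottleneck and
        # subtract that rate from all links on the flow's path.
        for f in [f for f in active if bottleneck in flows[f]]:
            rates[f] = share
            active.remove(f)
            for link in flows[f]:
                remaining[link] -= share

    return rates


# Hypothetical topology: two links, three flows.
example_flows = {"A": ["l1"], "B": ["l1", "l2"], "C": ["l2"]}
example_capacity = {"l1": 10.0, "l2": 4.0}
example_rates = max_min_rates(example_flows, example_capacity)
```

In this example, link `l2` is the first bottleneck, so flows B and C each get 2.0, and flow A takes the remaining 8.0 on `l1`. A flow-level simulator would recompute such rates whenever a flow arrives or completes. Nothing in this computation models queuing, congestion control, or retransmissions, which is the accuracy gap the abstract describes.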


@article{li2025_2503.01770,
  title={m4: A Learned Flow-level Network Simulator},
  author={Chenning Li and Anton A. Zabreyko and Arash Nasr-Esfahany and Kevin Zhao and Prateesh Goyal and Mohammad Alizadeh and Thomas Anderson},
  journal={arXiv preprint arXiv:2503.01770},
  year={2025}
}
