Based on recent studies, some COVID-19 symptoms can persist for months after infection, leading to what is termed long COVID. Factors such as vaccination timing, patient characteristics, and symptoms during the acute phase of infection may contribute to the prolonged effects and intensity of long COVID. Each patient, based on their unique combination of factors, develops a specific risk or intensity of long COVID. In this work, we aim to achieve two objectives: (1) conduct a statistical analysis to identify relationships between various factors and long COVID, and (2) perform predictive analysis of long COVID intensity using these factors. We benchmark and interpret various data-driven approaches, including linear models, random forests, gradient boosting, and neural networks, using data from the Lifelines COVID-19 cohort. Our results show that Neural Networks (NN) achieve the best performance in terms of MAPE, with predictions averaging 19\% error. Additionally, interpretability analysis reveals key factors such as loss of smell, headache, muscle pain, and vaccination timing as significant predictors, while chronic disease and gender are critical risk factors. These insights provide valuable guidance for understanding long COVID and developing targeted interventions.
View on arXiv@article{leyli-abadi2025_2504.20915, title={ Statistical and Predictive Analysis to Identify Risk Factors and Effects of Post COVID-19 Syndrome }, author={ Milad Leyli-abadi and Jean-Patrick Brunet and Axel Tahmasebimoradi }, journal={arXiv preprint arXiv:2504.20915}, year={ 2025 } }