
EMTSF: Extraordinary Mixture of SOTA Models for Time Series Forecasting

Main: 7 pages, 10 figures, 7 tables; Bibliography: 1 page
Abstract

The immense success of the Transformer architecture in Natural Language Processing has led to its adoption in Time Series Forecasting (TSF), where superior performance has been shown. However, a recent important paper questioned their effectiveness by demonstrating that a simple single-layer linear model outperforms Transformer-based models. This claim was soon shown to be not entirely valid by a better Transformer-based model termed PatchTST. More recently, TimeLLM demonstrated even better results by repurposing a Large Language Model (LLM) for the TSF domain. Again, a follow-up paper challenged this by demonstrating that removing the LLM component, or replacing it with a basic attention layer, in fact yields better performance. One of the challenges in forecasting is the fact that TSF data favors the more recent past and is sometimes subject to unpredictable events. Based upon these recent insights in TSF, we propose a strong Mixture of Experts (MoE) framework. Our method combines state-of-the-art (SOTA) models including xLSTM, enhanced Linear, PatchTST, and minGRU, among others. This set of complementary and diverse models for TSF is integrated in a Transformer-based MoE gating network. Our proposed model outperforms all existing TSF models on standard benchmarks, surpassing even the latest approaches based on MoE frameworks.
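
As a rough illustration of the architecture described in the abstract, the sketch below shows how a Transformer-based gating network could weight the forecasts of several expert models. The class name, the expert interface (each expert maps an input window to a forecast), and all dimensions are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumption, not the paper's code): a Transformer-based gate
# that mixes forecasts from pre-built expert models (e.g. PatchTST, xLSTM,
# an enhanced linear model, minGRU) on a per-variable basis.
import torch
import torch.nn as nn


class MoEForecaster(nn.Module):
    def __init__(self, experts, input_len, pred_len, d_model=64):
        super().__init__()
        # Assumed expert interface: module mapping (batch, input_len, n_vars)
        # to a forecast of shape (batch, pred_len, n_vars).
        self.experts = nn.ModuleList(experts)
        self.embed = nn.Linear(input_len, d_model)   # embed each variable's window
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.gate_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.gate_head = nn.Linear(d_model, len(experts))

    def forward(self, x):                            # x: (batch, input_len, n_vars)
        # Stack expert forecasts: (batch, pred_len, n_vars, n_experts)
        preds = torch.stack([e(x) for e in self.experts], dim=-1)
        # Gating: one token per variable, softmax weights over experts
        tokens = self.embed(x.transpose(1, 2))       # (batch, n_vars, d_model)
        weights = torch.softmax(self.gate_head(self.gate_encoder(tokens)), dim=-1)
        # Weighted combination of expert forecasts per variable
        return torch.einsum('bpve,bve->bpv', preds, weights)
```

The design choice sketched here keeps each expert's forecast intact and lets the gate learn input-dependent mixing weights, which is one common way a Transformer gating network can be combined with diverse TSF experts.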
