arXiv: 2512.11251

Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language

12 December 2025
Yunkai Zhang
Yawen Zhang
Ming Zheng
Kezhen Chen
Chongyang Gao
Ruian Ge
Siyuan Teng
Amine Jelloul
Jinmeng Rao
Xiaoyuan Guo
Chiang-Wei Fang
Zeyu Zheng
Jie Yang
    AI4TS
Main: 5 pages, 2 figures
Bibliography: 1 page
Appendix: 4 pages
Abstract

Time-series data is critical across many scientific and industrial domains, including environmental analysis, agriculture, transportation, and finance. However, mining insights from this data typically requires deep domain expertise, a process that is both time-consuming and labor-intensive. In this paper, we propose Insight Miner, a large-scale multimodal model (LMM) designed to generate high-quality, comprehensive time-series descriptions enriched with domain-specific knowledge. To facilitate this, we introduce TS-Insights (available at this https URL), the first general-domain dataset for time series and language alignment. TS-Insights contains 100k time-series windows sampled from 20 forecasting datasets. We construct this dataset using a novel agentic workflow, where we use statistical tools to extract features from raw time series before synthesizing them into coherent trend descriptions with GPT-4. Following instruction tuning on TS-Insights, Insight Miner outperforms state-of-the-art multimodal models, such as LLaVA (Liu et al., 2023) and GPT-4, in generating time-series descriptions and insights. Our findings suggest a promising direction for leveraging LMMs in time series analysis, and serve as a foundational step toward enabling LLMs to interpret time series as a native input modality.
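
The workflow described in the abstract (sample a window, extract statistical features, then hand those features to a language model for description) can be illustrated with a minimal sketch. The abstract does not say which statistical tools or prompt format the authors use, so everything below is an assumption made for illustration: the function names `sample_window`, `linear_slope`, `extract_features`, and `build_prompt` are hypothetical, the features (mean, standard deviation, range, least-squares slope) are a generic choice, and the prompt wording is invented. The sketch stops at building the prompt text and does not call any model.

```python
import random
import statistics


def sample_window(series, length, rng):
    """Pick a random contiguous window of `length` points from `series`."""
    start = rng.randrange(len(series) - length + 1)
    return series[start:start + length]


def linear_slope(window):
    """Least-squares slope of the window against its index 0..n-1 (needs n >= 2)."""
    n = len(window)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(window)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(window))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def extract_features(window):
    """Summarize a window with a few generic statistics (an assumed feature set)."""
    slope = linear_slope(window)
    if slope > 0:
        direction = "upward"
    elif slope < 0:
        direction = "downward"
    else:
        direction = "flat"
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
        "slope": slope,
        "direction": direction,
    }


def build_prompt(features, domain):
    """Turn the features into prompt text for a language model (hypothetical wording)."""
    lines = []
    for name, value in features.items():
        if isinstance(value, float):
            lines.append(f"- {name}: {value:.3f}")
        else:
            lines.append(f"- {name}: {value}")
    header = (
        f"Describe the trend of this {domain} time-series window "
        "in plain language, using these statistics:"
    )
    return header + "\n" + "\n".join(lines)


if __name__ == "__main__":
    rng = random.Random(0)
    series = [0.5 * t + rng.gauss(0, 1) for t in range(200)]  # synthetic data
    window = sample_window(series, 48, rng)
    print(build_prompt(extract_features(window), "finance"))
```

Passing computed statistics to the model, rather than raw values, keeps the prompt short and gives the generated description concrete numbers to anchor on. Whether that matches the authors' actual design is not stated in the abstract.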
