ResearchTrend.AI
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

14 January 2025
Hongyu Li
Jinyu Chen
Ziyu Wei
Shaofei Huang
Tianrui Hui
Jialin Gao
Xiaoming Wei
Si Liu
    AI4TS

Papers citing "LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding"

5 / 5 papers shown
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
Fuwen Luo
Shengfeng Lou
C. L. Philip Chen
Ziyue Wang
Chenliang Li
...
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Liu
AI4TS
LRM
81
0
0
27 May 2025
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Linli Yao
You Li
Y. X. Wei
Lei Li
Shuhuai Ren
...
Sida Li
Dianbo Sui
Qi Liu
Yanzhe Zhang
Xu Sun
84
1
0
24 Apr 2025
Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends
M. Tami
Mohammed Elhenawy
Huthaifa I. Ashqar
83
0
0
21 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Yang Liu
Qi Wang
Fuzheng Zhang
VLM
145
2
0
10 Apr 2025
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
Dohwan Ko
S. Kim
Yumin Suh
Vijay Kumar B.G
Minseo Yoon
Manmohan Chandraker
Hyunwoo J. Kim
LRM
100
0
0
25 Mar 2025