
MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM

9 pages (main text), 7 figures, 4 tables; 3-page bibliography
Abstract

Recent advances in static 3D generation have intensified the demand for physically consistent dynamic 3D content. However, existing video generation models, including diffusion-based methods, often prioritize visual realism while neglecting physical plausibility, resulting in implausible object dynamics. Prior approaches for physics-aware dynamic generation typically rely on large-scale annotated datasets or extensive model fine-tuning, which imposes significant computational and data collection burdens and limits scalability across scenarios. To address these challenges, we present MAGIC, a training-free framework for single-image physical property inference and dynamic generation, integrating pretrained image-to-video diffusion models with iterative LLM-based reasoning. Our framework generates motion-rich videos from a static image and closes the visual-to-physical gap through a confidence-driven LLM feedback loop that adaptively steers the diffusion model toward physics-relevant motion. To translate visual dynamics into controllable physical behavior, we further introduce a differentiable MPM simulator operating directly on 3D Gaussians reconstructed from the single image, enabling physically grounded, simulation-ready outputs without any supervision or model tuning. Experiments show that MAGIC outperforms existing physics-aware generative methods in inference accuracy and achieves greater temporal coherence than state-of-the-art video diffusion models.

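As an illustration of the confidence-driven feedback loop described in the abstract, the sketch below shows one way such a loop could be organized in Python. The functions generate_video and llm_physics_critique are hypothetical stand-ins for the pretrained image-to-video diffusion model and the LLM-based physics critic; they are not the authors' interfaces, and the stopping criterion and prompt-refinement step are assumptions made only so the loop is runnable end to end.

def generate_video(image, prompt):
    # Hypothetical stand-in for a pretrained image-to-video diffusion call;
    # here it simply records the conditioning so the loop can execute.
    return {"image": image, "prompt": prompt, "frames": []}

def llm_physics_critique(video):
    # Hypothetical stand-in for LLM-based reasoning over the generated motion.
    # Returns a confidence that the motion is physics-relevant plus a refined prompt.
    refined = video["prompt"] + ", emphasizing gravity-consistent motion"
    return 0.9, refined

def confidence_guided_inference(image, prompt, threshold=0.8, max_iters=5):
    # Iterate generation until the critic's confidence exceeds the threshold,
    # feeding the refined prompt back to steer the next diffusion pass.
    video = generate_video(image, prompt)
    for _ in range(max_iters):
        confidence, refined_prompt = llm_physics_critique(video)
        if confidence >= threshold:
            break
        prompt = refined_prompt
        video = generate_video(image, prompt)
    return video

result = confidence_guided_inference("ball.png", "a ball dropped onto a table")
print(result["prompt"])

The threshold and iteration cap are illustrative defaults; in practice the confidence signal and refinement strategy would come from the LLM critic itself.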
@article{meng2025_2505.16456,
  title={MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM},
  author={Siwei Meng and Yawei Luo and Ping Liu},
  journal={arXiv preprint arXiv:2505.16456},
  year={2025}
}