3D Question Answering via only 2D Vision-Language Models

28 May 2025
Fengyun Wang
Sicheng Yu
Jiawei Wu
Jinhui Tang
Hanwang Zhang
Qianru Sun
arXiv (abs) · PDF · HTML
Main: 8 pages · 10 figures · 9 tables · Bibliography: 3 pages · Appendix: 5 pages
Abstract

Large vision-language models (LVLMs) have significantly advanced numerous fields. In this work, we explore how to harness their potential to address 3D scene understanding tasks, using 3D question answering (3D-QA) as a representative example. Because training data in 3D is limited, we do not train LVLMs but instead run inference in a zero-shot manner. Specifically, we sample 2D views from a 3D point cloud and feed them into 2D models to answer a given question. Once the 2D model is chosen (e.g., LLaVA-OV), the quality of the sampled views matters most. We propose cdViews, a novel approach to automatically selecting critical and diverse Views for 3D-QA. cdViews consists of two key components: a viewSelector that prioritizes critical views based on their potential to provide answer-specific information, and a viewNMS that enhances diversity by removing redundant views based on spatial overlap. We evaluate cdViews on the widely used ScanQA and SQA benchmarks, demonstrating that it achieves state-of-the-art performance in 3D-QA while relying solely on 2D models without fine-tuning. These findings support our belief that 2D LVLMs are currently the most effective alternative to resource-intensive 3D LVLMs for addressing 3D tasks.
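
The abstract describes viewNMS only at a high level (removing redundant views based on spatial overlap). Below is a minimal, illustrative Python sketch of that idea, assuming the candidate views have already been scored by a selector and that "spatial overlap" is measured as the IoU of the scene points visible in each view. The function name, the IoU criterion, and the threshold are assumptions for illustration, not the paper's actual implementation.

import numpy as np

def view_nms(view_scores, view_point_ids, overlap_thresh=0.5, top_k=4):
    """Greedy NMS-style view selection (sketch).

    view_scores    : (N,) array of per-view scores, e.g. from a view selector.
    view_point_ids : list of N sets holding the scene-point indices visible in
                     each view; overlap is measured as IoU of these sets
                     (an assumption; the paper only says "spatial overlap").
    Returns indices of up to `top_k` kept views, highest-scoring first.
    """
    order = np.argsort(view_scores)[::-1]  # candidate views, best first
    kept = []
    for i in order:
        redundant = False
        for j in kept:
            inter = len(view_point_ids[i] & view_point_ids[j])
            union = len(view_point_ids[i] | view_point_ids[j])
            if union > 0 and inter / union > overlap_thresh:
                redundant = True  # too similar to an already-kept view
                break
        if not redundant:
            kept.append(int(i))
        if len(kept) == top_k:
            break
    return kept

if __name__ == "__main__":
    scores = np.array([0.9, 0.85, 0.4])
    points = [{1, 2, 3, 4}, {2, 3, 4, 5}, {10, 11, 12}]
    # View 1 overlaps view 0 too much (IoU 0.6), so views 0 and 2 are kept.
    print(view_nms(scores, points, overlap_thresh=0.5, top_k=2))

In cdViews, the views surviving this diversity filter would then be passed, together with the question, to a frozen 2D LVLM such as LLaVA-OV; the scoring model and hyperparameters used here are placeholders.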

@article{wang2025_2505.22143,
  title={3D Question Answering via only 2D Vision-Language Models},
  author={Fengyun Wang and Sicheng Yu and Jiawei Wu and Jinhui Tang and Hanwang Zhang and Qianru Sun},
  journal={arXiv preprint arXiv:2505.22143},
  year={2025}
}