VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation

Papers citing "VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation"