
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Papers citing "UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling"
50 / 78 papers shown
Title |
---|
Grounded Language-Image Pre-training. Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, ... Lu Yuan, Lei Zhang, Lei Li, Kai-Wei Chang, Jianfeng Gao |