Title |
---|
![]() What If We Recaption Billions of Web Images with LLaMA-3? Xianhang Li Haoqin Tu Mude Hui Zeyu Wang Bingchen Zhao ...Jieru Mei Qing Liu Huangjie Zheng Yuyin Zhou Cihang Xie |
![]() GLaMM: Pixel Grounding Large Multimodal Model H. Rasheed Muhammad Maaz Sahal Shaji Mullappilly Abdelrahman M. Shaker Salman Khan Hisham Cholakkal Rao M. Anwer Erix Xing Ming-Hsuan Yang Fahad S. Khan |