2408.16500
CogVLM2: Visual Language Models for Image and Video Understanding
29 August 2024
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
Yan Wang
Yean Cheng
Shiyu Huang
Junhui Ji
Zhao Xue
Lei Zhao
Zhuoyi Yang
Xiaotao Gu
Xiaohan Zhang
Guanyu Feng
Da Yin
Zihan Wang
Ji Qi
Xixuan Song
Peng Zhang
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
Github (2356★)
Papers citing
"CogVLM2: Visual Language Models for Image and Video Understanding"
3 / 53 papers shown
Title
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
87
561
0
14 Apr 2017
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
85
234
0
28 Mar 2017
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
233
5,509
0
03 May 2015