
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
Papers citing "Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning"
50 of 96 citing papers shown.

| Title | Authors |
|---|---|
| Egocentric Video-Language Pretraining | Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, ..., Hongfa Wang, Dima Damen, Guohao Li, Wei Liu, Mike Zheng Shou |
| Grounded Language-Image Pre-training | Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, ..., Lu Yuan, Lei Zhang, Lei Li, Kai-Wei Chang, Jianfeng Gao |