Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.15720
Cited By
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
25 December 2023
Yifan Lu
Ziqi Zhang
Chunfen Yuan
Peng Li
Yan Wang
Bing Li
Weiming Hu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Set Prediction Guided by Semantic Concepts for Diverse Video Captioning"
15 / 15 papers shown
Title
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
50
1
0
30 Jul 2024
Variational Stacked Local Attention Networks for Diverse Video Captioning
Tonmoay Deb
Akib Sadmanee
Kishor Kumar
Ahsan Ali
M. Ashraful
Mahbubur Rahman
29
8
0
04 Jan 2022
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
59
86
0
09 Dec 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
61
241
0
25 Nov 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
64
181
0
17 Aug 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
150
876
0
26 Apr 2021
Diverse Image Captioning with Context-Object Split Latent Spaces
Shweta Mahajan
Stefan Roth
33
42
0
02 Nov 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang
Yaya Shi
Chunfen Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
57
271
0
26 Feb 2020
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
73
550
0
06 Apr 2019
Describing like humans: on diversity in image captioning
Qingzhong Wang
Antoni B. Chan
43
99
0
28 Mar 2019
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
DiffM
57
175
0
26 Nov 2018
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
194
7,961
0
22 May 2017
Video Captioning with Transferred Semantic Attributes
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
47
329
0
23 Nov 2016
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
219
4,451
0
20 Nov 2014
From Captions to Visual Concepts and Back
Hao Fang
Saurabh Gupta
F. Iandola
R. Srivastava
Li Deng
...
Xiaodong He
Margaret Mitchell
John C. Platt
C. L. Zitnick
Geoffrey Zweig
VLM
67
1,310
0
18 Nov 2014
1