arXiv:1704.01518
Generating Descriptions with Grounded and Co-Referenced People
5 April 2017
Anna Rohrbach
Marcus Rohrbach
Siyu Tang
Seong Joon Oh
Bernt Schiele
Papers citing "Generating Descriptions with Grounded and Co-Referenced People"
12 / 12 papers shown
Interpretable Zero-shot Learning with Infinite Class Concepts
Zihan Ye
Shreyank N Gowda
Shiming Chen
Yaochu Jin
Kaizhu Huang
Xiaobo Jin
VLM
39
0
0
06 May 2025
Multimodal Coreference Resolution for Chinese Social Media Dialogues: Dataset and Benchmark Approach
Xingyu Li
Chen Gong
Guohong Fu
VGen
29
0
0
19 Apr 2025
Connecting Vision and Language with Video Localized Narratives
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
52
21
0
22 Feb 2023
Who are you referring to? Coreference resolution in image narrations
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
25
3
0
26 Nov 2022
Creating Multimedia Summaries Using Tweets and Videos
Anietie U Andy
Siyi Liu
Daphne Ippolito
Reno Kriz
Chris Callison-Burch
Derry Wijaya
23
0
0
16 Mar 2022
Video Face Clustering with Unknown Number of Clusters
Makarand Tapaswi
M. Law
Sanja Fidler
CVBM
27
60
0
09 Aug 2019
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach
26
86
0
07 Mar 2019
Self-Supervised Learning of Face Representations for Video Face Clustering
Vivek Sharma
Makarand Tapaswi
M. Sarfraz
Rainer Stiefelhagen
SSL
CVBM
14
49
0
03 Mar 2019
Learning To Follow Directions in Street View
Karl Moritz Hermann
Mateusz Malinowski
Piotr Wojciech Mirowski
Andras Banki-Horvath
Keith Anderson
R. Hadsell
SSL
24
66
0
01 Mar 2019
Grounded Video Description
Luowei Zhou
Yannis Kalantidis
Xinlei Chen
Jason J. Corso
Marcus Rohrbach
27
190
0
17 Dec 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
50
866
0
27 Nov 2018
Deeply learned face representations are sparse, selective, and robust
Yi Sun
Xiaogang Wang
Xiaoou Tang
CVBM
250
921
0
03 Dec 2014