2409.16159
ComiCap: A VLMs pipeline for dense captioning of Comic Panels
24 September 2024
Emanuele Vivoli
Niccolò Biondi
Marco Bertini
Dimosthenis Karatzas
Github (12★)
Papers citing
"ComiCap: A VLMs pipeline for dense captioning of Comic Panels"
Title
ComicsPAP: understanding comic strips by picking the correct panel
Emanuele Vivoli
Artemis Llabrés
Mohamed Ali Souibgui
Marco Bertini
Ernest Valveny Llobet
Dimosthenis Karatzas
140
0
0
11 Mar 2025
Toward accessible comics for blind and low vision readers
Christophe Rigaud
J. Burie
Samuel Petit
74
3
0
11 Jul 2024
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
96
177
0
03 May 2024
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
105
170
0
10 Nov 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
245
1,200
0
27 Mar 2023
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
418
3,602
0
29 Apr 2022
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
123
4,221
0
25 Jul 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
144
1,249
0
02 May 2017
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
131
1,170
0
24 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
528
62,377
0
04 Jun 2015
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
260
6,035
0
17 Nov 2014
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
P. Sermanet
David Eigen
Xiang Zhang
Michaël Mathieu
Rob Fergus
Yann LeCun
ObjD
155
5,008
0
21 Dec 2013
Rich feature hierarchies for accurate object detection and semantic segmentation
Ross B. Girshick
Jeff Donahue
Trevor Darrell
Jitendra Malik
ObjD
289
26,217
0
11 Nov 2013