ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2409.16159
  4. Cited By
ComiCap: A VLMs pipeline for dense captioning of Comic Panels

ComiCap: A VLMs pipeline for dense captioning of Comic Panels

24 September 2024
Emanuele Vivoli
Niccoló Biondi
Marco Bertini
Dimosthenis Karatzas
ArXiv (abs)PDFHTMLGithub (12★)

Papers citing "ComiCap: A VLMs pipeline for dense captioning of Comic Panels"

13 / 13 papers shown
Title
ComicsPAP: understanding comic strips by picking the correct panel
ComicsPAP: understanding comic strips by picking the correct panel
Emanuele Vivoli
Artemis LLabres
Mohamed Ali Soubgui
Marco Bertini
Ernest Valveny Llobet
Dimosthenis Karatzas
140
0
0
11 Mar 2025
Toward accessible comics for blind and low vision readers
Toward accessible comics for blind and low vision readers
Christophe Rigaud
J. Burie
Samuel Petit
74
3
0
11 Jul 2024
What matters when building vision-language models?
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
96
177
0
03 May 2024
Florence-2: Advancing a Unified Representation for a Variety of Vision
  Tasks
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
105
170
0
10 Nov 2023
Sigmoid Loss for Language Image Pre-Training
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIPVLM
248
1,200
0
27 Mar 2023
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
418
3,602
0
29 Apr 2022
Bottom-Up and Top-Down Attention for Image Captioning and Visual
  Question Answering
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
123
4,221
0
25 Jul 2017
Dense-Captioning Events in Videos
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
144
1,249
0
02 May 2017
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
131
1,170
0
24 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal
  Networks
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMatObjD
528
62,377
0
04 Jun 2015
Show and Tell: A Neural Image Caption Generator
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
260
6,035
0
17 Nov 2014
OverFeat: Integrated Recognition, Localization and Detection using
  Convolutional Networks
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
P. Sermanet
David Eigen
Xiang Zhang
Michaël Mathieu
Rob Fergus
Yann LeCun
ObjD
155
5,008
0
21 Dec 2013
Rich feature hierarchies for accurate object detection and semantic
  segmentation
Rich feature hierarchies for accurate object detection and semantic segmentation
Ross B. Girshick
Jeff Donahue
Trevor Darrell
Jitendra Malik
ObjD
289
26,217
0
11 Nov 2013
1