Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1611.06949
Cited By
Dense Captioning with Joint Inference and Visual Context
21 November 2016
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Dense Captioning with Joint Inference and Visual Context"
50 / 59 papers shown
Title
A Large Vision-Language Model based Environment Perception System for Visually Impaired People
Zezhou Chen
Zhaoxiang Liu
Kai Wang
Kohou Wang
Shiguo Lian
55
0
0
25 Apr 2025
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
Mohammad Saleha
Azadeh Tabatabaeib
52
0
0
14 Apr 2025
CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain
Jingchao Peng
Thomas Bashford-Rogers
Zhuang Shao
Haitao Zhao
Aru Ranjan Singh
Abhishek Goswami
Kurt Debattista
74
0
0
25 Nov 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
35
0
0
09 Aug 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
54
10
0
12 Mar 2024
ControlCap: Controllable Region-level Captioning
Yuzhong Zhao
Yue Liu
Zonghao Guo
Weijia Wu
Chen Gong
Fang Wan
QiXiang Ye
26
5
0
31 Jan 2024
Towards Unified Interactive Visual Grounding in The Wild
Jie Xu
Hanbo Zhang
Qingyi Si
Yifeng Li
Xuguang Lan
Tao Kong
LM&Ro
30
5
0
30 Jan 2024
Pixel Aligned Language Models
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLM
VLM
45
15
0
14 Dec 2023
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
31
18
0
01 Dec 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
27
1
0
30 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
22
1
0
18 Oct 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
Enna Sachdeva
Nakul Agarwal
Suhas Chundi
Sean Roelofs
Jiachen Li
Mykel Kochenderfer
Chiho Choi
Behzad Dariush
33
47
0
12 Sep 2023
Guiding Image Captioning Models Toward More Specific Captions
Simon Kornblith
Lala Li
Zirui Wang
Thao Nguyen
29
15
0
31 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
85
224
0
07 Jul 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
26
27
0
28 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Jiaheng Liu
15
1
0
19 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
104
82
0
04 May 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD
VLM
17
31
0
04 Mar 2023
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
32
17
0
02 Feb 2023
Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
SSL
VLM
27
4
0
26 Jan 2023
GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu
Jianfeng Wang
Zhengyuan Yang
Zhe Gan
Zicheng Liu
Junsong Yuan
Lijuan Wang
ObjD
VLM
14
112
0
01 Dec 2022
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?
Pradip Pramanick
Chayan Sarkar
24
7
0
21 Oct 2022
Contextual Modeling for 3D Dense Captioning on Point Clouds
Yufeng Zhong
Longdao Xu
Jiebo Luo
Lin Ma
44
15
0
08 Oct 2022
DRAMA: Joint Risk Localization and Captioning in Driving
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
107
87
0
22 Sep 2022
Persuasion Strategies in Advertisements
Yaman Kumar Singla
R. Jha
Arunim Gupta
Milan Aggarwal
Aditya Garg
Tushar Malyan
Ayush Bhardwaj
R. Shah
Balaji Krishnamurthy
Changyou Chen
DiffM
24
1
0
20 Aug 2022
Bypass Network for Semantics Driven Image Paragraph Captioning
Qinjie Zheng
Chaoyue Wang
Dadong Wang
21
1
0
21 Jun 2022
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds
Heng Wang
Chaoyi Zhang
Jianhui Yu
Weidong (Tom) Cai
3DPC
22
38
0
22 Apr 2022
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yu-Gang Jiang
3DPC
19
46
0
10 Mar 2022
Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs
Daniel Louzada Fernandes
Marcos Henrique Fonseca Ribeiro
F. Cerqueira
Michel Melo Silva
14
6
0
10 Feb 2022
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
21
29
0
02 Dec 2021
Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization
A. Maharana
Joey Tianyi Zhou
27
57
0
21 Oct 2021
Geometry-Entangled Visual Semantic Transformer for Image Captioning
Ling Cheng
Wei Wei
Feida Zhu
Yong-jin Liu
C. Miao
ViT
21
3
0
29 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
72
44
0
21 Sep 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
21
63
0
25 Aug 2021
Caption Generation on Scenes with Seen and Unseen Object Categories
B. Demirel
R. G. Cinbis
VLM
17
1
0
13 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
67
254
0
14 Jul 2021
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Dave Zhenyu Chen
A. Gholami
Matthias Nießner
Angel X. Chang
3DPC
23
157
0
03 Dec 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
29
27
0
08 Oct 2020
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA
Seonhoon Kim
Seohyeong Jeong
Eunbyul Kim
Inho Kang
Nojun Kwak
SSL
20
40
0
17 Sep 2020
Comprehensive Image Captioning via Scene Graph Decomposition
Yiwu Zhong
Liwei Wang
Jianshu Chen
Dong Yu
Yin Li
87
124
0
23 Jul 2020
Diverse and Styled Image Captioning Using SVD-Based Mixture of Recurrent Experts
Marzi Heidari
M. Ghatee
A. Nickabadi
Arash Pourhasan Nezhad
DiffM
MoE
32
1
0
07 Jul 2020
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Hyounghun Kim
Zineng Tang
Joey Tianyi Zhou
27
31
0
13 May 2020
Context-Aware Group Captioning via Self-Attention and Contrastive Features
Zhuowan Li
Quan Hung Tran
Long Mai
Zhe-nan Lin
Alan Yuille
VLM
8
44
0
07 Apr 2020
Consistent Multiple Sequence Decoding
Bicheng Xu
Leonid Sigal
28
0
0
02 Apr 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen
Qin Jin
Peng Wang
Qi Wu
DiffM
31
215
0
01 Mar 2020
Contextual Sense Making by Fusing Scene Classification, Detections, and Events in Full Motion Video
Marc Bosch
Joseph Nassar
Ben Ortiz
Brendan Lammers
David Lindenbaum
J. Wahl
Robert Mangum
Margaret Smith
18
2
0
16 Jan 2020
Movienet: A Movie Multilayer Network Model using Visual and Textual Semantic Cues
Youssef Mourchid
B. Renoust
Olivier Roupin
Lê Văn
H. Cherifi
Mohammed El Hassouni
19
10
0
18 Oct 2019
Spatial Graph Convolutional Networks
Tomasz Danel
Przemysław Spurek
Jacek Tabor
Marek Śmieja
Lukasz Struski
Agnieszka Słowik
Lukasz Maziarka
GNN
32
10
0
11 Sep 2019
Image Captioning with Unseen Objects
B. Demirel
R. G. Cinbis
Nazli Ikizler-Cinbis
VLM
21
16
0
31 Jul 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Shantipriya Parida
Ondrej Bojar
S. Dash
30
62
0
21 Jul 2019
1
2
Next