Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2101.06462
Cited By
Dual-Level Collaborative Transformer for Image Captioning
16 January 2021
Yunpeng Luo
Jiayi Ji
Xiaoshuai Sun
Liujuan Cao
Yongjian Wu
Feiyue Huang
Chia-Wen Lin
Rongrong Ji
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Dual-Level Collaborative Transformer for Image Captioning"
32 / 32 papers shown
Title
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Divya J. Bajpai
M. Hanawal
80
0
0
02 Feb 2025
PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement
Chengyou Jia
Minnan Luo
Zhuohang Dang
Guangwen Dai
Xiao Chang
Yufei Guo
DiffM
56
1
0
31 Dec 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
34
24
0
28 Feb 2024
Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine
Kanta Kaneda
Shunya Nagashima
Ryosuke Korekata
Motonari Kambara
Komei Sugiura
43
6
0
26 Dec 2023
TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
Xuying Zhang
Bo-Wen Yin
Yuming Chen
Zheng Lin
Yunheng Li
Qibin Hou
Ming-Ming Cheng
CLIP
DiffM
42
7
0
07 Dec 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
55
19
0
23 Aug 2023
Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling
Yu Zhao
Hao Fei
Yixin Cao
Bobo Li
Meishan Zhang
Jianguo Wei
Hao Fei
Tat-Seng Chua
19
13
0
09 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
42
6
0
02 Aug 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
37
21
0
25 May 2023
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
32
6
0
20 May 2023
SnakeVoxFormer: Transformer-based Single Image\\Voxel Reconstruction with Run Length Encoding
Jae Joong Lee
Bedrich Benes
ViT
32
0
0
28 Mar 2023
Towards Local Visual Modeling for Image Captioning
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Rongrong Ji
ViT
21
71
0
13 Feb 2023
USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval
Yan Zhang
Zhong Ji
Dingrong Wang
Yanwei Pang
Xuelong Li
VLM
24
22
0
17 Jan 2023
Controllable Image Captioning via Prompting
Ning Wang
Jiahao Xie
Jihao Wu
Mingbo Jia
Linlin Li
22
23
0
04 Dec 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Jie Ruan
Yue Wu
Xiaojun Wan
Yuesheng Zhu
29
1
0
20 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
24
27
0
17 Nov 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia
K. Nguyen
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
36
10
0
21 Sep 2022
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
ViT
36
106
0
20 Jul 2022
CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
Wenqi Zhao
Liang Gao
ViT
19
27
0
10 Jul 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
72
528
0
13 Jun 2022
End-to-End Transformer Based Model for Image Captioning
Yiyu Wang
Jungang Xu
Yingfei Sun
VLM
ViT
26
117
0
29 Mar 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling
Tengpeng Li
Hanli Wang
Bin He
Changan Chen
DiffM
27
9
0
10 Mar 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT
VLM
38
27
0
21 Feb 2022
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
34
192
0
29 Nov 2021
Self-Annotated Training for Controllable Image Captioning
Zhangzi Zhu
Tianlei Wang
Hong Qu
27
2
0
16 Oct 2021
Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds
Fangcen Liu
Chenqiang Gao
Fangge Chen
Deyu Meng
W. Zuo
Xinbo Gao
ViT
39
37
0
29 Sep 2021
Cross Modification Attention Based Deliberation Model for Image Captioning
Zheng Lian
Yanan Zhang
Haichang Li
Rui Wang
Xiaohui Hu
24
4
0
17 Sep 2021
Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph
Wentian Zhao
Yao Hu
Heda Wang
Xinxiao Wu
Jiebo Luo
23
47
0
26 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
67
254
0
14 Jul 2021
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
135
189
0
19 Mar 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
300
10,225
0
16 Nov 2016
1