CPTR: Full Transformer Network for Image Captioning
arXiv 2101.10804 · 26 January 2021
Wei Liu, Sihan Chen, Longteng Guo, Xinxin Zhu, Jing Liu
ViT
Papers citing "CPTR: Full Transformer Network for Image Captioning" (50 papers shown)
SplineFormer: An Explainable Transformer-Based Approach for Autonomous Endovascular Navigation
Tudor Jianu, Shayan Doust, Mengyun Li, Baoru Huang, Tuong Khanh Long Do, ..., Karl Bates, Tung D. Ta, S. Fichera, Pierre Berthet-Rayne, Anh Nguyen
MedIm · 35 · 0 · 0 · 08 Jan 2025
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
37 · 0 · 0 · 09 Nov 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Devank, Jayateja Kalla, Soma Biswas
34 · 1 · 0 · 06 Oct 2024
PaveCap: The First Multimodal Framework for Comprehensive Pavement Condition Assessment with Dense Captioning and PCI Estimation
Blessing Agyei Kyem, Eugene Kofi Okrah Denteh, Joshua Kofi Asamoah, Armstrong Aboah
14 · 2 · 0 · 07 Aug 2024
Figuring out Figures: Using Textual References to Caption Scientific Figures
Stanley Cao, Kevin Liu
42 · 0 · 0 · 25 Jun 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara
VLM · 32 · 9 · 0 · 21 May 2024
Generative Multi-modal Models are Good Class-Incremental Learners
Xusheng Cao, Haori Lu, Linlan Huang, Xialei Liu, Ming-Ming Cheng
CLL · 49 · 10 · 0 · 27 Mar 2024
Image Captioning in news report scenario
Tianrui Liu, Qi Cai, Changxin Xu, Bo Hong, Jize Xiong, Yuxin Qiao, Tsungwei Yang
40 · 11 · 0 · 24 Mar 2024
CarbonNet: How Computer Vision Plays a Role in Climate Change? Application: Learning Geomechanics from Subsurface Geometry of CCS to Mitigate Global Warming
Wei Chen, Yun Li, Yuan Tian
AI4CE · 30 · 0 · 0 · 09 Mar 2024
Rule-driven News Captioning
Ning Xu, Tingting Zhang, Hongshuo Tian, An-An Liu
68 · 0 · 0 · 08 Mar 2024
Radiology Report Generation Using Transformers Conditioned with Non-imaging Data
Nurbanu Aksoy, Nishant Ravikumar, Alejandro F Frangi
ViT, MedIm · 19 · 8 · 0 · 18 Nov 2023
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation
Nurbanu Aksoy, Serge Sharoff, Selçuk Başer, Nishant Ravikumar, Alejandro F Frangi
MedIm · 19 · 4 · 0 · 18 Nov 2023
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen, Erik Cambria, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen
19 · 18 · 0 · 06 Sep 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Xi Deng, Han Shi, Runhu Huang, Changlin Li, Hang Xu, Jianhua Han, James T. Kwok, Shen Zhao, Wei Zhang, Xiaodan Liang
CLIP, VLM · 29 · 3 · 0 · 22 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni, Subrahmanyam Konakanchi
30 · 2 · 0 · 05 Aug 2023
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
Pujin Cheng, Li Lin, Junyan Lyu, Yijin Huang, Wenhan Luo, Xiaoying Tang
MedIm · 37 · 45 · 0 · 24 Jul 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao, Jianhua Han, Xiaodan Liang, Danqian Xu, Wei Zhang, Zhenguo Li, Hang Xu
VLM, ObjD, CLIP · 56 · 74 · 0 · 10 Apr 2023
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem, A. Hilton, R. Dawes, Graham A. Thomas, A. Mustafa
27 · 8 · 0 · 26 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang, Chenshuang Zhang, Sheng Zheng, Yu Qiao, Chenghao Li, ..., Lik-Hang Lee, Yang Yang, Heng Tao Shen, In So Kweon, Choong Seon Hong
85 · 159 · 0 · 21 Mar 2023
Retrieval-augmented Image Captioning
R. Ramos, Desmond Elliott, Bruno Martins
VLM · 32 · 29 · 0 · 16 Feb 2023
Adjacent-Level Feature Cross-Fusion With 3-D CNN for Remote Sensing Image Change Detection
Y. Ye, Mengmeng Wang, Liang Zhou, Guangyang Lei, Jianwei Fan, Yao Qin
3DPC · 27 · 37 · 0 · 10 Feb 2023
Embodied Agents for Efficient Exploration and Smart Scene Description
Roberto Bigazzi, Marcella Cornia, S. Cascianelli, Lorenzo Baraldi, Rita Cucchiara
LM&Ro · 12 · 7 · 0 · 17 Jan 2023
End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen, Erik Cambria, Xin Chen, Yinjie Lei, Tao Chen, YU Gang
ViT · 21 · 52 · 0 · 06 Jan 2023
Exploring Efficient Few-shot Adaptation for Vision Transformers
C. Xu, Siqian Yang, Yabiao Wang, Zhanxiong Wang, Yanwei Fu, Xiangyang Xue
35 · 16 · 0 · 06 Jan 2023
Adaptively Clustering Neighbor Elements for Image-Text Generation
Zihua Wang, Xu Yang, Hanwang Zhang, Haiyang Xu, Mingshi Yan, Feisi Huang, Yu Zhang
VLM · 88 · 0 · 0 · 05 Jan 2023
Using Human Perception to Regularize Transfer Learning
Justin Dulay, Walter J. Scheirer
27 · 8 · 0 · 15 Nov 2022
Retrieval-Augmented Transformer for Image Captioning
Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
24 · 57 · 0 · 26 Jul 2022
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval
Wenqiao Zhang, Jiannan Guo, Meng Li, Haochen Shi, Shengyu Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang
55 · 6 · 0 · 09 Jul 2022
Are metrics measuring what they should? An evaluation of image captioning task metrics
Othón González-Chávez, Guillermo Ruiz, Daniela Moctezuma, Tania A. Ramirez-delreal
21 · 9 · 0 · 04 Jul 2022
Automatic Generation of Product-Image Sequence in E-commerce
Xiaochuan Fan, Chi Zhang, Yong-Jie Yang, Yue Shang, Xueying Zhang, Zhen He, Yun Xiao, Bo Long, Lingfei Wu
28 · 4 · 0 · 26 Jun 2022
SYMBA: Symbolic Computation of Squared Amplitudes in High Energy Physics with Machine Learning
Abdulhakim Alnuqaydan, S. Gleyzer, Harrison B. Prosper
25 · 14 · 0 · 17 Jun 2022
Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection
Chao Zeng, Sam Kwong
ViT · 32 · 25 · 0 · 07 Jun 2022
Causal Transformer for Estimating Counterfactual Outcomes
Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
CML · 38 · 91 · 0 · 14 Apr 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li, Hao Zhang, Yi-Fan Zhang, Shixuan Liu, Jian Guo, L. Ni, Pengchuan Zhang, Lei Zhang
AI4TS, VLM · 24 · 36 · 0 · 03 Mar 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco, Matteo Stefanini, Marcella Cornia, S. Cascianelli, Lorenzo Baraldi, Rita Cucchiara
ViT, VLM · 38 · 27 · 0 · 21 Feb 2022
Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs
Daniel Louzada Fernandes, Marcos Henrique Fonseca Ribeiro, F. Cerqueira, Michel Melo Silva
16 · 6 · 0 · 10 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi, H. Pourreza, H. Mahyar
VLM · 25 · 89 · 0 · 31 Jan 2022
RelTR: Relation Transformer for Scene Graph Generation
Yuren Cong, M. Yang, Bodo Rosenhahn
ViT · 100 · 134 · 0 · 27 Jan 2022
ClipCap: CLIP Prefix for Image Captioning
Ron Mokady, Amir Hertz, Amit H. Bermano
CLIP, VLM · 17 · 656 · 0 · 18 Nov 2021
Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network
Yuansan Liu, MD Abdullah Al Nasim, Sourav Saha, Faria Afrin, Raisa Mallik, Sathishkumar Samiappan
ViT · 14 · 11 · 0 · 24 Oct 2021
Partially-Supervised Novel Object Captioning Leveraging Context from Paired Data
Shashank Bujimalla, Mahesh Subedar, Omesh Tickoo
35 · 1 · 0 · 10 Sep 2021
Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos
C. Nwoye, Tong Yu, Cristians Gonzalez, B. Seeliger, Pietro Mascagni, Didier Mutter, J. Marescaux, N. Padoy
39 · 128 · 0 · 07 Sep 2021
Audio Captioning Transformer
Xinhao Mei, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang
ViT · 39 · 77 · 0 · 21 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, S. Cascianelli, G. Fiameni, Rita Cucchiara
3DV, VLM, MLLM · 67 · 254 · 0 · 14 Jul 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Abdul Jabbar
HAI · 18 · 55 · 0 · 24 May 2021
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Shimin Hu
28 · 472 · 0 · 05 May 2021
Multiscale Vision Transformers
Haoqi Fan, Bo Xiong, K. Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer
ViT · 63 · 1,224 · 0 · 22 Apr 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik, Emanuel Ben-Baruch, Asaf Noy, Lihi Zelnik-Manor
SSeg, VLM, CLIP · 187 · 689 · 0 · 22 Apr 2021
A Survey on Multimodal Disinformation Detection
Firoj Alam, S. Cresci, Tanmoy Chakraborty, Fabrizio Silvestri, Dimiter Dimitrov, Giovanni Da San Martino, Shaden Shaar, Hamed Firooz, Preslav Nakov
18 · 98 · 0 · 13 Mar 2021
Remote Sensing Image Change Detection with Transformers
Hao Chen, Zipeng Qi, Zhenwei Shi
ViT · 42 · 942 · 0 · 27 Feb 2021