CPTR: Full Transformer Network for Image Captioning

26 January 2021
Wei Liu
Sihan Chen
Longteng Guo
Xinxin Zhu
Jing Liu
    ViT
arXiv: 2101.10804

Papers citing "CPTR: Full Transformer Network for Image Captioning"

50 / 50 papers shown
SplineFormer: An Explainable Transformer-Based Approach for Autonomous Endovascular Navigation
Tudor Jianu
Shayan Doust
Mengyun Li
Baoru Huang
Tuong Khanh Long Do
...
Karl Bates
Tung D. Ta
S. Fichera
Pierre Berthet-Rayne
Anh Nguyen
MedIm
35
0
0
08 Jan 2025
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
37
0
0
09 Nov 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Devank
Jayateja Kalla
Soma Biswas
34
1
0
06 Oct 2024
PaveCap: The First Multimodal Framework for Comprehensive Pavement Condition Assessment with Dense Captioning and PCI Estimation
Blessing Agyei Kyem
Eugene Kofi Okrah Denteh
Joshua Kofi Asamoah
Armstrong Aboah
16
2
0
07 Aug 2024
Figuring out Figures: Using Textual References to Caption Scientific Figures
Stanley Cao
Kevin Liu
42
0
0
25 Jun 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
32
10
0
21 May 2024
Generative Multi-modal Models are Good Class-Incremental Learners
Xusheng Cao
Haori Lu
Linlan Huang
Xialei Liu
Ming-Ming Cheng
CLL
49
10
0
27 Mar 2024
Image Captioning in news report scenario
Tianrui Liu
Qi Cai
Changxin Xu
Bo Hong
Jize Xiong
Yuxin Qiao
Tsungwei Yang
40
11
0
24 Mar 2024
CarbonNet: How Computer Vision Plays a Role in Climate Change? Application: Learning Geomechanics from Subsurface Geometry of CCS to Mitigate Global Warming
Wei Chen
Yun Li
Yuan Tian
AI4CE
30
0
0
09 Mar 2024
Rule-driven News Captioning
Ning Xu
Tingting Zhang
Hongshuo Tian
An-An Liu
68
0
0
08 Mar 2024
Radiology Report Generation Using Transformers Conditioned with Non-imaging Data
Nurbanu Aksoy
Nishant Ravikumar
Alejandro F Frangi
ViT
MedIm
19
8
0
18 Nov 2023
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation
Nurbanu Aksoy
Serge Sharoff
Selçuk Başer
Nishant Ravikumar
Alejandro F Frangi
MedIm
19
4
0
18 Nov 2023
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen
Erik Cambria
Mingsheng Li
Xin Chen
Peng Guo
Yinjie Lei
Gang Yu
Taihao Li
Tao Chen
19
18
0
06 Sep 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIP
VLM
29
3
0
22 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
30
2
0
05 Aug 2023
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
Pujin Cheng
Li Lin
Junyan Lyu
Yijin Huang
Wenhan Luo
Xiaoying Tang
MedIm
37
45
0
24 Jul 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
56
74
0
10 Apr 2023
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
27
8
0
26 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
85
159
0
21 Mar 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
32
29
0
16 Feb 2023
Adjacent-Level Feature Cross-Fusion With 3-D CNN for Remote Sensing Image Change Detection
Y. Ye
Mengmeng Wang
Liang Zhou
Guangyang Lei
Jianwei Fan
Yao Qin
3DPC
27
37
0
10 Feb 2023
Embodied Agents for Efficient Exploration and Smart Scene Description
Roberto Bigazzi
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
12
7
0
17 Jan 2023
End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen
Erik Cambria
Xin Chen
Yinjie Lei
Tao Chen
YU Gang
ViT
21
52
0
06 Jan 2023
Exploring Efficient Few-shot Adaptation for Vision Transformers
C. Xu
Siqian Yang
Yabiao Wang
Zhanxiong Wang
Yanwei Fu
Xiangyang Xue
35
16
0
06 Jan 2023
Adaptively Clustering Neighbor Elements for Image-Text Generation
Zihua Wang
Xu Yang
Hanwang Zhang
Haiyang Xu
Mingshi Yan
Feisi Huang
Yu Zhang
VLM
88
0
0
05 Jan 2023
Using Human Perception to Regularize Transfer Learning
Justin Dulay
Walter J. Scheirer
27
8
0
15 Nov 2022
Retrieval-Augmented Transformer for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
24
57
0
26 Jul 2022
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval
Wenqiao Zhang
Jiannan Guo
Meng Li
Haochen Shi
Shengyu Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
55
6
0
09 Jul 2022
Are metrics measuring what they should? An evaluation of image captioning task metrics
Othón González-Chávez
Guillermo Ruiz
Daniela Moctezuma
Tania A. Ramirez-delreal
21
9
0
04 Jul 2022
Automatic Generation of Product-Image Sequence in E-commerce
Xiaochuan Fan
Chi Zhang
Yong-Jie Yang
Yue Shang
Xueying Zhang
Zhen He
Yun Xiao
Bo Long
Lingfei Wu
28
4
0
26 Jun 2022
SYMBA: Symbolic Computation of Squared Amplitudes in High Energy Physics with Machine Learning
Abdulhakim Alnuqaydan
S. Gleyzer
Harrison B. Prosper
28
14
0
17 Jun 2022
Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection
Chao Zeng
Sam Kwong
ViT
32
25
0
07 Jun 2022
Causal Transformer for Estimating Counterfactual Outcomes
Valentyn Melnychuk
Dennis Frauen
Stefan Feuerriegel
CML
38
91
0
14 Apr 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS
VLM
24
36
0
03 Mar 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT
VLM
38
27
0
21 Feb 2022
Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs
Daniel Louzada Fernandes
Marcos Henrique Fonseca Ribeiro
F. Cerqueira
Michel Melo Silva
16
6
0
10 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
25
89
0
31 Jan 2022
RelTR: Relation Transformer for Scene Graph Generation
Yuren Cong
M. Yang
Bodo Rosenhahn
ViT
100
134
0
27 Jan 2022
ClipCap: CLIP Prefix for Image Captioning
Ron Mokady
Amir Hertz
Amit H. Bermano
CLIP
VLM
19
656
0
18 Nov 2021
Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network
Yuansan Liu
MD Abdullah Al Nasim
Sourav Saha
Faria Afrin
Raisa Mallik
Sathishkumar Samiappan
ViT
16
11
0
24 Oct 2021
Partially-Supervised Novel Object Captioning Leveraging Context from Paired Data
Shashank Bujimalla
Mahesh Subedar
Omesh Tickoo
35
1
0
10 Sep 2021
Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos
C. Nwoye
Tong Yu
Cristians Gonzalez
B. Seeliger
Pietro Mascagni
Didier Mutter
J. Marescaux
N. Padoy
39
128
0
07 Sep 2021
Audio Captioning Transformer
Xinhao Mei
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
ViT
39
77
0
21 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
67
254
0
14 Jul 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
18
55
0
24 May 2021
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
Meng-Hao Guo
Zheng-Ning Liu
Tai-Jiang Mu
Shimin Hu
28
473
0
05 May 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
187
689
0
22 Apr 2021
A Survey on Multimodal Disinformation Detection
Firoj Alam
S. Cresci
Tanmoy Chakraborty
Fabrizio Silvestri
Dimiter Dimitrov
Giovanni Da San Martino
Shaden Shaar
Hamed Firooz
Preslav Nakov
18
98
0
13 Mar 2021
Remote Sensing Image Change Detection with Transformers
Hao Chen
Zipeng Qi
Zhenwei Shi
ViT
45
942
0
27 Feb 2021