ResearchTrend.AI
Multimodal Transformer with Multi-View Visual Representation for Image Captioning

20 May 2019
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
    ViT

Papers citing "Multimodal Transformer with Multi-View Visual Representation for Image Captioning"

35 / 35 papers shown
BrainNetMLP: An Efficient and Effective Baseline for Functional Brain Network Classification
Jiacheng Hou
Zhenjie Song
Ercan Engin Kuruoglu
4
0
0
14 May 2025
How to Coordinate UAVs and UGVs for Efficient Mission Planning? Optimizing Energy-Constrained Cooperative Routing with a DRL Framework
Md Safwan Mondal
S. Ramasamy
Luca Russo
James D. Humann
James M. Dotterweich
Pranav A. Bhounsule
41
0
0
29 Apr 2025
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
Lakshita Agarwal
Bindu Verma
ViT
29
0
0
23 Apr 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
50
3
0
28 Jan 2025
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
54
10
0
12 Mar 2024
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Junyang Chen
Hanjiang Lai
VLM
45
15
0
13 Nov 2023
LOIS: Looking Out of Instance Semantics for Visual Question Answering
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
25
4
0
26 Jul 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
I. Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
26
5
0
28 May 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
94
11
0
03 Mar 2023
KENGIC: KEyword-driven and N-Gram Graph based Image Captioning
Brandon Birmingham
A. Muscat
27
1
0
07 Feb 2023
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
Jianzong Wu
Xiangtai Li
Henghui Ding
Xia Li
Guangliang Cheng
Yu Tong
Chen Change Loy
VLM
89
31
0
02 Jan 2023
MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding
Wei Ji
Long Chen
Yin-wei Wei
Yiming Wu
Tat-Seng Chua
AI4TS
35
18
0
26 Dec 2022
Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
Cheng Ma
Yang Liu
Jiankang Deng
Lingxi Xie
Weiming Dong
Changsheng Xu
VLM
VPVLM
43
44
0
04 Nov 2022
Multimodal Transformer for Parallel Concatenated Variational Autoencoders
Stephen D. Liang
J. Mendel
ViT
27
5
0
28 Oct 2022
Multi-Attention Network for Compressed Video Referring Object Segmentation
Weidong Chen
Dexiang Hong
Yuankai Qi
Zhenjun Han
Shuhui Wang
Laiyun Qing
Qingming Huang
Guorong Li
VOS
20
35
0
26 Jul 2022
MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
Nasir Hayat
Krzysztof J. Geras
Farah E. Shamout
MedIm
27
40
0
14 Jul 2022
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval
Wenqiao Zhang
Jiannan Guo
Meng Li
Haochen Shi
Shengyu Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
55
6
0
09 Jul 2022
Deep Relation Learning for Regression and Its Application to Brain Age Estimation
Sheng He
Yanfang Feng
P. E. Grant
Yangming Ou
3DH
38
30
0
13 Apr 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
29
15
0
11 Feb 2022
A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization
Ming Dai
Jian Hu
Jiedong Zhuang
E. Zheng
ViT
45
112
0
23 Jan 2022
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning
Wenqiao Zhang
Haochen Shi
Jiannan Guo
Shengyu Zhang
Qingpeng Cai
Juncheng Li
Sihui Luo
Yueting Zhuang
DiffM
26
46
0
13 Dec 2021
Neural Attention for Image Captioning: Review of Outstanding Methods
Zanyar Zohourianshahzadi
Jugal Kalita
VLM
35
45
0
29 Nov 2021
All-In-One: Artificial Association Neural Networks
Seokjun Kim
Jaeeun Jang
Hyeoncheol Kim
32
0
0
31 Oct 2021
Deep Reinforcement Learning for Solving the Heterogeneous Capacitated Vehicle Routing Problem
Jingwen Li
Yining Ma
Ruize Gao
Zhiguang Cao
Andrew Lim
Wen Song
Jie Zhang
129
115
0
06 Oct 2021
Multimodality in Meta-Learning: A Comprehensive Survey
Yao Ma
Shilin Zhao
Weixiao Wang
Yaoman Li
Irwin King
50
53
0
28 Sep 2021
VisGraphNet: a complex network interpretation of convolutional neural features
J. Florindo
Young-Sup Lee
Kyungkoo Jun
Gwanggil Jeon
M. Albertini
FAtt
GNN
15
14
0
27 Aug 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
Learning Emergent Discrete Message Communication for Cooperative Reinforcement Learning
Sheng Li
Yutai Zhou
R. Allen
Mykel J. Kochenderfer
34
13
0
24 Feb 2021
Comparative evaluation of CNN architectures for Image Caption Generation
Sulabh Katiyar
S. Borgohain
19
24
0
23 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
299
1,084
0
17 Feb 2021
Multiresolution and Multimodal Speech Recognition with Transformers
Georgios Paraskevopoulos
Srinivas Parthasarathy
Aparna Khare
Shiva Sundaram
25
29
0
29 Apr 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
135
189
0
19 Mar 2020
A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling
Haoran Chen
Ke Lin
A. Maye
Jianmin Li
Xiaoling Hu
25
47
0
31 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
73
1,917
0
09 Aug 2019
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,464
0
06 Jun 2016