arXiv 2004.14231
Image Captioning through Image Transformer
29 April 2020
Sen He
Wentong Liao
Hamed R. Tavakoli
M. Yang
Bodo Rosenhahn
N. Pugeault
ViT
Papers citing "Image Captioning through Image Transformer"
30 / 30 papers shown
An Ensemble Model with Attention Based Mechanism for Image Captioning
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
50
3
0
28 Jan 2025
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
39
0
0
09 Nov 2024
M²PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Taowen Wang
Yiyang Liu
James Liang
Junhan Zhao
Yiming Cui
...
Zenglin Xu
Cheng Han
Lifu Huang
Qifan Wang
Dongfang Liu
MLLM
VLM
LRM
30
16
0
24 Sep 2024
Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution
Dinh Phu Tran
Dao Duy Hung
Daeyoung Kim
SupR
40
0
0
23 Jul 2024
β-Variational autoencoders and transformers for reduced-order modelling of fluid flows
Alberto Solera-Rico
Carlos Sanmiguel Vila
Miguel Gómez-López
Yuning Wang
Abdulrahman Almashjary
Scott T. M. Dawson
Ricardo Vinuesa
DRL
16
74
0
07 Apr 2023
Towards Universal Vision-language Omni-supervised Segmentation
Bowen Dong
Jiaxi Gu
Jianhua Han
Hang Xu
W. Zuo
VLM
36
1
0
12 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
33
14
0
07 Mar 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
42
40
0
14 Feb 2023
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
Woohyun Kang
Jonghwan Mun
Sungjun Lee
Byungseok Roh
VLM
14
18
0
27 Dec 2022
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning
Nghia Hieu Nguyen
Duong T.D. Vo
Minh-Quan Ha
ViT
35
1
0
10 Nov 2022
A Spatio-Temporal Attentive Network for Video-Based Crowd Counting
Marco Avvenuti
Marco Bongiovanni
Luca Ciampi
Fabrizio Falchi
Claudio Gennaro
Nicola Messina
31
9
0
24 Aug 2022
Iterative Scene Graph Generation
Siddhesh Khandelwal
Leonid Sigal
OCL
29
29
0
27 Jul 2022
The impact of memory on learning sequence-to-sequence tasks
Alireza Seif
S. Loos
Gennaro Tucci
É. Roldán
Sebastian Goldt
31
5
0
29 May 2022
BodyMap: Learning Full-Body Dense Correspondence Map
A. Ianina
N. Sarafianos
Yuanlu Xu
Ignacio Rocco
Tony Tung
3DH
30
14
0
18 May 2022
Controllable Image Captioning
Luka Maxwell
33
0
0
28 Apr 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
27
89
0
31 Jan 2022
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
39
2
0
28 Dec 2021
Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry
Karl Lowenmark
C. Taal
S. Schnabel
Marcus Liwicki
Fredrik Sandin
27
7
0
11 Dec 2021
Explaining Face Presentation Attack Detection Using Natural Language
H. Mirzaalian
Mohamed E. Hussein
L. Spinoulas
Jonathan May
Wael AbdAlmageed
CVBM
FAtt
AAML
36
5
0
08 Nov 2021
Bornon: Bengali Image Captioning with Transformer-based Deep learning approach
Faisal Muhammad Shah
Mayeesha Humaira
Md Abidur Rahman Khan Jim
Amit Saha Ami
Shimul Paul
29
17
0
11 Sep 2021
Journalistic Guidelines Aware News Image Captioning
Xuewen Yang
Svebor Karaman
Joel R. Tetreault
Alex Jaimes
16
27
0
07 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
Mohammad Abuzar Shaikh
Zhanghexuan Ji
Dana Moukheiber
Yan Shen
S. Srihari
Mingchen Gao
VLM
22
1
0
04 Sep 2021
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
Xinzhi Dong
Chengjiang Long
Wenju Xu
Chunxia Xiao
ViT
79
66
0
05 Aug 2021
ReFormer: The Relational Transformer for Image Captioning
Xuewen Yang
Yingru Liu
Xin Wang
ViT
17
55
0
29 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
22
122
0
26 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
67
255
0
14 Jul 2021
Exploring Dynamic Context for Multi-path Trajectory Prediction
Hao Cheng
Wentong Liao
Xuejiao Tang
M. Yang
Monika Sester
Bodo Rosenhahn
43
32
0
30 Oct 2020
Skeleton-based Action Recognition via Spatial and Temporal Transformer Networks
Chiara Plizzari
Marco Cannici
Matteo Matteucci
ViT
MedIm
25
300
0
17 Aug 2020
AMENet: Attentive Maps Encoder Network for Trajectory Prediction
Hao Cheng
Wentong Liao
M. Yang
Bodo Rosenhahn
Monika Sester
36
45
0
15 Jun 2020
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
200
434
0
27 Mar 2018