ResearchTrend.AI
X-Linear Attention Networks for Image Captioning


31 March 2020
Yingwei Pan
Ting Yao
Yehao Li
Tao Mei

Papers citing "X-Linear Attention Networks for Image Captioning"

50 / 79 papers shown
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
45
0
0
03 Apr 2025
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
MLLM
BDL
VLM
85
0
0
18 Mar 2025
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
82
0
0
03 Mar 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
53
0
0
03 Jan 2025
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
207
1
0
19 Nov 2024
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
39
1
0
16 Jul 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
54
10
0
12 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
39
14
0
06 Mar 2024
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
55
19
0
23 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
38
2
0
21 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
42
6
0
02 Aug 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
36
5
0
05 Jul 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
37
21
0
25 May 2023
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
32
6
0
20 May 2023
Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning
Pranav Dandwate
Chaitanya Shahane
V. Jagtap
Shridevi C. Karande
14
8
0
05 Mar 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Zequn Zeng
Hao Zhang
Zhengjue Wang
Ruiying Lu
Dongsheng Wang
Bo Chen
BDL
DiffM
19
33
0
04 Mar 2023
Towards Local Visual Modeling for Image Captioning
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Rongrong Ji
ViT
21
71
0
13 Feb 2023
End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen
Erik Cambria
Xin Chen
Yinjie Lei
Tao Chen
YU Gang
ViT
21
52
0
06 Jan 2023
Pearl Causal Hierarchy on Image Data: Intricacies & Challenges
Matej Zečević
Moritz Willig
Devendra Singh Dhami
Kristian Kersting
29
0
0
23 Dec 2022
Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding
Mengya Xu
Mobarakol Islam
Ben Glocker
Hongliang Ren
31
1
0
22 Dec 2022
Semantic-Conditional Diffusion Networks for Image Captioning
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
30
62
0
06 Dec 2022
Controllable Image Captioning via Prompting
Ning Wang
Jiahao Xie
Jihao Wu
Mingbo Jia
Linlin Li
22
23
0
04 Dec 2022
Multilingual Communication System with Deaf Individuals Utilizing Natural and Visual Languages
Tuan-Luc Huynh
Khoi-Nguyen Nguyen-Ngoc
Chi-Bien Chu
Minh-Triet Tran
Trung-Nghia Le
SLR
15
0
0
01 Dec 2022
Uncertainty-Aware Image Captioning
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
UQLM
18
10
0
30 Nov 2022
Unified Discrete Diffusion for Simultaneous Vision-Language Generation
Minghui Hu
Chuanxia Zheng
Heliang Zheng
Tat-Jen Cham
Chaoyue Wang
Zuopeng Yang
Dacheng Tao
Ponnuthurai Nagaratnam Suganthan
DiffM
20
23
0
27 Nov 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Jie Ruan
Yue Wu
Xiaojun Wan
Yuesheng Zhu
29
1
0
20 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
24
27
0
17 Nov 2022
Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions
Gaurav Verma
Vishwa Vinay
Ryan A. Rossi
Srijan Kumar
VLM
AAML
11
8
0
04 Nov 2022
Prophet Attention: Predicting Attention with Future Attention for Image Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
24
46
0
19 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
40
10
0
04 Oct 2022
Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned
Ahmed Sabir
19
0
0
26 Sep 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia
K. Nguyen
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
36
10
0
21 Sep 2022
Every picture tells a story: Image-grounded controllable stylistic story generation
Holy Lovenia
Bryan Wilie
Romain Barraud
Samuel Cahyawijaya
Willy Chung
Pascale Fung
26
8
0
04 Sep 2022
A Medical Semantic-Assisted Transformer for Radiographic Report Generation
Zhanyu Wang
Mingkang Tang
Lei Wang
Xiu Li
Luping Zhou
ViT
MedIm
24
57
0
22 Aug 2022
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
ViT
36
106
0
20 Jul 2022
Towards the Human Global Context: Does the Vision-Language Model Really Judge Like a Human Being?
Sangmyeong Woh
Jaemin Lee
Hoki Kim
Jinsuk Lee
21
0
0
18 Jul 2022
Dual Vision Transformer
Ting Yao
Yehao Li
Yingwei Pan
Yu Wang
Xiaoping Zhang
Tao Mei
ViT
148
75
0
11 Jul 2022
CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
Wenqi Zhao
Liang Gao
ViT
19
27
0
10 Jul 2022
Exploring the sequence length bottleneck in the Transformer for Image Captioning
Jiapeng Hu
Roberto Cavicchioli
Alessandro Capotondi
ViT
38
3
0
07 Jul 2022
Competence-based Multimodal Curriculum Learning for Medical Report Generation
Fenglin Liu
Shen Ge
Yuexian Zou
Xian Wu
MedIm
25
131
0
24 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
26
88
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
72
528
0
13 Jun 2022
Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection
Y. Zhang
Yingwei Pan
Ting Yao
Rui Huang
Tao Mei
C. Chen
ViT
32
68
0
13 Jun 2022
Prompt-based Learning for Unpaired Image Captioning
Peipei Zhu
Tianlin Li
Lin Zhu
Zhenglong Sun
Weishi Zheng
Yaowei Wang
Chia-Ju Chen
VLM
25
31
0
26 May 2022
End-to-End Transformer Based Model for Image Captioning
Yiyu Wang
Jungang Xu
Yingfei Sun
VLM
ViT
26
117
0
29 Mar 2022
AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation
Di You
Fenglin Liu
Shen Ge
Xiaoxia Xie
Jing Zhang
Xian Wu
ViT
MedIm
26
107
0
18 Mar 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling
Tengpeng Li
Hanli Wang
Bin He
Changan Chen
DiffM
24
9
0
10 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILM
ELM
LRM
24
67
0
09 Mar 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT
VLM
38
27
0
21 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
29
15
0
11 Feb 2022