Stacked Cross Attention for Image-Text Matching

21 March 2018
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
arXiv: 1803.08024

Papers citing "Stacked Cross Attention for Image-Text Matching"

50 / 159 papers shown
MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning
L. Yang
W. Zhang
Quan Z. Sheng
Weitong Chen
L. Yao
Weitong Chen
A. Shakeri
26
0
0
11 May 2025
Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
Zehong Ma
Hao Chen
Wei Zeng
Limin Su
Shiliang Zhang
AI4TS
35
0
0
10 Apr 2025
VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction
Zizhi Chen
Minghao Han
Xukun Zhang
Shuwei Ma
Tao Liu
Xing Wei
Li Zhang
44
0
0
25 Mar 2025
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
Zengrong Lin
Zheng Wang
Tianwen Qian
Pan Mu
Sixian Chan
Cong Bai
52
0
0
13 Mar 2025
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning
Quanxing Zha
Xin Liu
Shu-Juan Peng
Y. Cheung
X. Xu
Nannan Wang
50
0
0
13 Mar 2025
A Light Perspective for 3D Object Detection
M. E. Pederiva
J. M. D. Martino
A. Zimmer
3DPC
55
0
0
10 Mar 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
171
0
0
21 Feb 2025
Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
Tianyi Shang
Zhenyu Li
Pengjie Xu
Jinwei Qiao
Gang Chen
Zihan Ruan
Weijun Hu
59
0
0
20 Feb 2025
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
32
0
0
20 Jan 2025
TSVC: Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval
Shuai Lyu
Zijing Tian
Zhonghong Ou
Yifan Zhu
Xiao Zhang
Qiankun Ha
Haoran Luo
Meina Song
37
0
0
19 Jan 2025
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning
Yaxiong Wang
Y. Wang
Lianwei Wu
Lechao Cheng
Zhun Zhong
Meng Wang
VLM
30
0
0
23 Oct 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
41
4
0
18 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
30
0
0
12 Sep 2024
Towards Deconfounded Image-Text Matching with Causal Inference
Wenhui Li
Xinqi Su
Dan Song
Lanjun Wang
Kun Zhang
An-An Liu
BDL
CML
47
10
0
22 Aug 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
Chang-Sheng Kao
Yun-Nung Chen
23
0
0
04 Jul 2024
GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Tianlang Chen
Shengjie Luo
Di He
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
AI4CE
38
5
0
24 Jun 2024
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham
Chuong Huynh
Ser-Nam Lim
Abhinav Shrivastava
CoGe
41
3
0
17 Jun 2024
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval
Rui Yang
Shuang Wang
Yi Han
Yuanheng Li
Dong Zhao
Dou Quan
Yanhe Guo
Licheng Jiao
63
3
0
29 May 2024
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu
Andy Chia-Hao Chang
Chieh-Yu Chuang
Chun-Pei Chen
Yu-Lun Liu
Min-Hung Chen
Hou-Ning Hu
Yung-Yu Chuang
Yen-Yu Lin
VLM
46
9
0
05 Apr 2024
Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval
Qian Wang
Jia-Chen Gu
Zhen-Hua Ling
35
2
0
15 Mar 2024
Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification
Long Lan
Fengxiang Wang
Shuyan Li
Xiangtao Zheng
Zengmao Wang
Xinwang Liu
VLM
31
7
0
13 Mar 2024
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
Koki Maeda
Shuhei Kurita
Taiki Miyanishi
Naoaki Okazaki
38
2
0
28 Feb 2024
Continual Referring Expression Comprehension via Dual Modular Memorization
Hengtao Shen
Cheng Chen
Peng Wang
Lianli Gao
Hao Wu
Jingkuan Song
ObjD
25
3
0
25 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Sunyi Zheng
Lin Yang
VLM
35
4
0
21 Nov 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Hila Levi
Guy Heller
Dan Levi
Ethan Fetaya
OCL
VLM
24
3
0
26 Sep 2023
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Yunshui Li
Binyuan Hui
Zhaochao Yin
Wanwei He
Run Luo
Yuxing Long
Min Yang
Fei Huang
Yongbin Li
24
1
0
14 Sep 2023
Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment
Jiamin Zhuang
Jing Yu
Yang Ding
Xiangyang Qu
Yue Hu
32
9
0
27 Aug 2023
EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
10
6
0
11 May 2023
Building Multimodal AI Chatbots
Mingyu Lee
29
3
0
21 Apr 2023
Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams
M. Tavakoli
Rohitash Chandra
Fengrui Tian
Cristián Bravo
21
8
0
21 Apr 2023
CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval
Yang Yang
Zhongtian Fu
Xiangyu Wu
Wenjie Li
VLM
21
1
0
15 Apr 2023
Noisy Correspondence Learning with Meta Similarity Correction
Haocheng Han
Kaiyao Miao
Qinghua Zheng
Minnan Luo
32
28
0
13 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIP
VLM
16
1
0
10 Apr 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
40
7
0
29 Mar 2023
Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification
Chunpu Xu
Jing Li
VLM
20
5
0
27 Mar 2023
LIMITR: Leveraging Local Information for Medical Image-Text Representation
Gefen Dawidowicz
Elad Hirsch
A. Tal
28
15
0
21 Mar 2023
Mining False Positive Examples for Text-Based Person Re-identification
Wenhao Xu
Zhiyin Shao
Changxing Ding
22
4
0
15 Mar 2023
The style transformer with common knowledge optimization for image-text retrieval
Wenrui Li
Zhengyu Ma
Jinqiao Shi
Xiaopeng Fan
ViT
30
5
0
01 Mar 2023
USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval
Yan Zhang
Zhong Ji
Dingrong Wang
Yanwei Pang
Xuelong Li
VLM
24
21
0
17 Jan 2023
Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
Mariya Hendriksen
Svitlana Vakulenko
E. Kuiper
Maarten de Rijke
31
5
0
12 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
22
52
0
05 Jan 2023
HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
Jie Guo
Meiting Wang
Yan Zhou
Bin Song
Yuhao Chi
Wei-liang Fan
Jianglong Chang
42
15
0
16 Dec 2022
Using Multiple Instance Learning to Build Multimodal Representations
Peiqi Wang
W. Wells
Seth Berkowitz
Steven Horng
Polina Golland
SSL
24
6
0
11 Dec 2022
Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Dongwon Kim
Nam-Won Kim
Suha Kwak
24
37
0
30 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
21
21
0
15 Nov 2022
Masked Vision-Language Transformer in Fashion
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Christos Sakaridis
Luc Van Gool
21
25
0
27 Oct 2022
Visual Semantic Parsing: From Images to Abstract Meaning Representation
M. A. Abdelsalam
Zhan Shi
Federico Fancellu
Kalliopi Basioti
Dhaivat Bhatt
Vladimir Pavlovic
Afsaneh Fazly
GNN
37
4
0
26 Oct 2022
Image-Text Retrieval with Binary and Continuous Label Supervision
Zheng Li
Caili Guo
Zerun Feng
Jenq-Neng Hwang
Ying Jin
Yufeng Zhang
VLM
25
4
0
20 Oct 2022
CLIP-Driven Fine-grained Text-Image Person Re-identification
Shuanglin Yan
Neng Dong
Liyan Zhang
Jinhui Tang
39
87
0
19 Oct 2022