Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.06509
Cited By
Step-Wise Hierarchical Alignment Network for Image-Text Matching
11 June 2021
Zhong Ji
Kexin Chen
Haoran Wang
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Step-Wise Hierarchical Alignment Network for Image-Text Matching"
17 / 17 papers shown
Title
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
484
0
0
21 Feb 2025
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
Hui Chen
Guiguang Ding
Xudong Liu
Zijia Lin
Ji Liu
Jungong Han
74
321
0
08 Mar 2020
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
76
211
0
11 Oct 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
70
304
0
12 Sep 2019
Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction
Jingwen Wang
Lin Ma
Wenhao Jiang
76
182
0
11 Sep 2019
Episode-based Prototype Generating Network for Zero-Shot Learning
YunLong Yu
Zhong Ji
Zhongfei Zhang
Jungong Han
VLM
46
145
0
08 Sep 2019
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Yale Song
M. Soleymani
57
244
0
11 Jun 2019
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
Wei Zhang
Bairui Wang
Lin Ma
Wei Liu
86
67
0
03 Jun 2019
Unsupervised Image Captioning
Yang Feng
Lin Ma
Wei Liu
Jiebo Luo
VLM
SSL
74
202
0
27 Nov 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
84
1,151
0
21 Mar 2018
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
121
4,216
0
25 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
701
131,652
0
12 Jun 2017
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
95
667
0
02 Nov 2016
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
202
5,478
0
03 May 2015
Multimodal Convolutional Neural Networks for Matching Image and Sentence
Lin Ma
Zhengdong Lu
Lifeng Shang
Hang Li
99
337
0
23 Apr 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
127
5,585
0
07 Dec 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Ryan Kiros
Ruslan Salakhutdinov
R. Zemel
VLM
125
1,399
0
10 Nov 2014
1