Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval

16 May 2021
K. Ueki
arXiv: 2105.07391

Papers citing "Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval"

26 / 26 papers shown
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
553
28,659
0
26 Feb 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
242
2,463
0
04 Jan 2021
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
398
41,106
0
28 May 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
58
1,927
0
13 Apr 2020
Graph Structured Network for Image-Text Matching
Chunxiao Liu
Zhendong Mao
Tianzhu Zhang
Hongtao Xie
Bin Wang
Yongdong Zhang
34
235
0
01 Apr 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
58
260
0
22 Jan 2020
Adversarial Representation Learning for Text-to-Image Matching
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
63
186
0
28 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
180
898
0
16 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
178
3,659
0
06 Aug 2019
Position Focused Attention Network for Image-Text Matching
Yaxiong Wang
Hao-Hsiang Yang
Xueming Qian
Lin Ma
Jing Lu
Biao Li
Xin Fan
13
171
0
23 Jul 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
819
93,936
0
11 Oct 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
54
1,148
0
21 Mar 2018
Learning Semantic Concepts and Order for Image and Sentence Matching
Yan Huang
Qi Wu
Liang Wang
VLM
27
303
0
06 Dec 2017
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Jiuxiang Gu
Jianfei Cai
Shafiq Joty
Li Niu
G. Wang
VLM
45
361
0
17 Nov 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
95
4,201
0
25 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
331
129,831
0
12 Jun 2017
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
65
666
0
02 Nov 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
154
5,706
0
23 Feb 2016
Learning Deep Structure-Preserving Image-Text Embeddings
Liwei Wang
Yin Li
Svetlana Lazebnik
60
782
0
19 Nov 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollár
C. L. Zitnick
140
2,461
0
01 Apr 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
104
1,237
0
20 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
49
5,569
0
07 Dec 2014
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
177
6,009
0
17 Nov 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Ryan Kiros
Ruslan Salakhutdinov
R. Zemel
VLM
70
1,395
0
10 Nov 2014
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
A. Karpathy
Armand Joulin
Li Fei-Fei
VLM
53
935
0
22 Jun 2014
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
212
43,290
0
01 May 2014