arXiv:1707.05612 (v4, latest)
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
18 July 2017
Fartash Faghri
David J. Fleet
J. Kiros
Sanja Fidler
VLM
Papers citing "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"
50 / 77 papers shown
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
Yuan Yuan
Yangfan Zhan
Zhitong Xiong
VLM
87
47
0
24 Aug 2023
HomE: Homography-Equivariant Video Representation Learning
Anirudh Sriram
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
L. Fei-Fei
Ehsan Adeli
SSL
AI4TS
73
2
0
02 Jun 2023
ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis
Hongchen Tan
Baocai Yin
Kun Wei
Xiuping Liu
Xin Li
56
18
0
13 Apr 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
Hexiang Hu
Yi Luan
Yang Chen
Urvashi Khandelwal
Mandar Joshi
Kenton Lee
Kristina Toutanova
Ming-Wei Chang
VLM
124
61
0
22 Feb 2023
Test-Time Distribution Normalization for Contrastively Learned Vision-language Models
Yi Zhou
Juntao Ren
Fengyu Li
Ramin Zabih
Ser-Nam Lim
VLM
98
15
0
22 Feb 2023
HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
Jie Guo
Meiting Wang
Yan Zhou
Bin Song
Yuhao Chi
Wei-liang Fan
Jianglong Chang
78
15
0
16 Dec 2022
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Haoran Wang
Dongliang He
Wenhao Wu
Boyang Xia
Min Yang
Fu Li
YunLong Yu
Zhong Ji
Errui Ding
Jingdong Wang
64
23
0
21 Aug 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
94
30
0
25 Apr 2022
Visually Grounded Concept Composition
Bowen Zhang
Hexiang Hu
Linlu Qiu
Peter Shaw
Fei Sha
CoGe
122
6
0
29 Sep 2021
Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov
Tobias Hinz
Federico Raue
Jörn Hees
Andreas Dengel
EGVM
80
178
0
25 Jan 2021
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
307
347
0
05 Jan 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD
VLM
340
158
0
02 Jan 2021
Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization
Li Ren
Keqin Li
Liqiang Wang
K. Hua
44
4
0
23 Oct 2020
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
Niluthpol Chowdhury Mithun
Karan Sikka
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
64
16
0
12 Sep 2020
Dual Encoding for Video Retrieval by Text
Jianfeng Dong
Xirong Li
Chaoxi Xu
Xun Yang
Gang Yang
Xun Wang
Meng Wang
82
2
0
10 Sep 2020
Weakly supervised cross-domain alignment with optimal transport
Siyang Yuan
Ke Bai
Liqun Chen
Yizhe Zhang
Chenyang Tao
Chunyuan Li
Guoyin Wang
Ricardo Henao
Lawrence Carin
OT
60
7
0
14 Aug 2020
Incomplete Descriptor Mining with Elastic Loss for Person Re-Identification
Hongchen Tan
Yuhao Bian
Huasheng Wang
Xiuping Liu
Baocai Yin
124
71
0
10 Aug 2020
Fine-Grained Image Captioning with Global-Local Discriminative Objective
Jie Wu
Tianshui Chen
Hefeng Wu
Zhi Yang
Guangchun Luo
Liang Lin
70
59
0
21 Jul 2020
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Christopher Thomas
Adriana Kovashka
128
41
0
16 Jul 2020
Graph Optimal Transport for Cross-Domain Alignment
Liqun Chen
Zhe Gan
Yu Cheng
Linjie Li
Lawrence Carin
Jingjing Liu
OT
115
152
0
26 Jun 2020
Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning
Yuqing Song
Shizhe Chen
Yida Zhao
Qin Jin
30
6
0
14 Jun 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
201
1,953
0
13 Apr 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
90
126
0
01 Apr 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
232
70
0
07 Mar 2020
Adversarial Ranking Attack and Defense
Mo Zhou
Zhenxing Niu
Le Wang
Qilin Zhang
G. Hua
150
39
0
26 Feb 2020
Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching
Tianlang Chen
Jiebo Luo
67
69
0
20 Feb 2020
Show, Recall, and Tell: Image Captioning with Recall Mechanism
Li Wang
Zechen Bai
Yonghua Zhang
Hongtao Lu
63
67
0
15 Jan 2020
MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding
Geondo Park
Chihye Han
Wonjun Yoon
Dae-Shik Kim
19
18
0
11 Jan 2020
Semi-supervised Visual Feature Integration for Pre-trained Language Models
Lisai Zhang
Qingcai Chen
Dongfang Li
Buzhou Tang
VLM
25
1
0
01 Dec 2019
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen
Bei Liu
Jianlong Fu
Ruihua Song
Qin Jin
Pingping Lin
Xiaoyu Qi
Chunting Wang
Jin Zhou
DiffM
75
33
0
24 Nov 2019
HUSE: Hierarchical Universal Semantic Embeddings
P. Narayana
Aniket Pednekar
A. Krishnamoorthy
Kazoo Sone
Sugato Basu
47
10
0
14 Nov 2019
Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning
Ákos Kádár
Grzegorz Chrupała
Afra Alishahi
Desmond Elliott
116
1
0
09 Nov 2019
Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019
Shizhe Chen
Yida Zhao
Yuqing Song
Qin Jin
Qi Wu
25
0
0
15 Oct 2019
Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task
Alireza Mohammadshahi
R. Lebret
Karl Aberer
36
11
0
08 Oct 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
67
37
0
22 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
105
309
0
12 Sep 2019
Language learning using Speech to Image retrieval
Danny Merkx
S. Frank
M. Ernestus
57
43
0
09 Sep 2019
Adversarial Representation Learning for Text-to-Image Matching
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
117
188
0
28 Aug 2019
ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta
Alex Schwing
Derek Hoiem
65
25
0
22 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
221
907
0
16 Aug 2019
Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song
Shizhe Chen
Yida Zhao
Qin Jin
SSL
46
41
0
15 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
91
391
0
31 Jul 2019
ParNet: Position-aware Aggregated Relation Network for Image-Text matching
Yaxian Xia
Lun Huang
Wenmin Wang
Xiao-Yong Wei
Jie Chen
121
1
0
17 Jun 2019
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Yale Song
M. Soleymani
72
247
0
11 Jun 2019
Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding
Arka Ujjal Dey
Suman K. Ghosh
Ernest Valveny
Gaurav Harit
51
23
0
25 May 2019
Deep Unified Multimodal Embeddings for Understanding both Content and Users in Social Media Networks
Karan Sikka
Lucas Van Bramer
Ajay Divakaran
85
2
0
17 May 2019
Saliency-Guided Attention Network for Image-Sentence Matching
Zhong Ji
Haoran Wang
Jiawei Han
Yanwei Pang
69
89
0
20 Apr 2019
SoDeep: a Sorting Deep net to learn ranking loss surrogates
Martin Engilberge
Louis Chevallier
P. Pérez
Matthieu Cord
58
65
0
08 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
140
194
0
05 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
102
104
0
27 Mar 2019