ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1707.05612
  4. Cited By
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
v1v2v3v4 (latest)

VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

18 July 2017
Fartash Faghri
David J. Fleet
J. Kiros
Sanja Fidler
    VLM
ArXiv (abs)PDFHTML

Papers citing "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"

50 / 77 papers shown
Title
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text
  Retrieval
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
Yuan. Yuan
Yangfan Zhan
Zhitong Xiong
VLM
87
47
0
24 Aug 2023
HomE: Homography-Equivariant Video Representation Learning
HomE: Homography-Equivariant Video Representation Learning
Anirudh Sriram
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
L. Fei-Fei
Ehsan Adeli
SSLAI4TS
73
2
0
02 Jun 2023
ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis
ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis
Hongchen Tan
Baocai Yin
Kun Wei
Xiuping Liu
Xin Li
56
18
0
13 Apr 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of
  Wikipedia Entities
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
Hexiang Hu
Yi Luan
Yang Chen
Urvashi Khandelwal
Mandar Joshi
Kenton Lee
Kristina Toutanova
Ming-Wei Chang
VLM
124
61
0
22 Feb 2023
Test-Time Distribution Normalization for Contrastively Learned
  Vision-language Models
Test-Time Distribution Normalization for Contrastively Learned Vision-language Models
Yi Zhou
Juntao Ren
Fengyu Li
Ramin Zabih
Ser-Nam Lim
VLM
98
15
0
22 Feb 2023
HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
Jie Guo
Meiting Wang
Yan Zhou
Bin Song
Yuhao Chi
Wei-liang Fan
Jianglong Chang
78
15
0
16 Dec 2022
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for
  Image-Text Retrieval
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Haoran Wang
Dongliang He
Wenhao Wu
Boyang Xia
Min Yang
Fu Li
YunLong Yu
Zhong Ji
Errui Ding
Jingdong Wang
64
23
0
21 Aug 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo
  and Text
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
94
30
0
25 Apr 2022
Visually Grounded Concept Composition
Visually Grounded Concept Composition
Bowen Zhang
Hexiang Hu
Linlu Qiu
Peter Shaw
Fei Sha
CoGe
122
6
0
29 Sep 2021
Adversarial Text-to-Image Synthesis: A Review
Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov
Tobias Hinz
Federico Raue
Jörn Hees
Andreas Dengel
EGVM
80
178
0
25 Jan 2021
Similarity Reasoning and Filtration for Image-Text Matching
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
307
347
0
05 Jan 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjDVLM
340
158
0
02 Jan 2021
Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with
  Adversarial Discriminative Domain Regularization
Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization
Li Ren
Keqin Li
Liqiang Wang
K. Hua
44
4
0
23 Oct 2020
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
Niluthpol Chowdhury Mithun
Karan Sikka
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
64
16
0
12 Sep 2020
Dual Encoding for Video Retrieval by Text
Dual Encoding for Video Retrieval by Text
Jianfeng Dong
Xirong Li
Chaoxi Xu
Xun Yang
Gang Yang
Xun Wang
Meng Wang
82
2
0
10 Sep 2020
Weakly supervised cross-domain alignment with optimal transport
Weakly supervised cross-domain alignment with optimal transport
Siyang Yuan
Ke Bai
Liqun Chen
Yizhe Zhang
Chenyang Tao
Chunyuan Li
Guoyin Wang
Ricardo Henao
Lawrence Carin
OT
60
7
0
14 Aug 2020
Incomplete Descriptor Mining with Elastic Loss for Person
  Re-Identification
Incomplete Descriptor Mining with Elastic Loss for Person Re-Identification
Hongchen Tan
Yuhao Bian
Huasheng Wang
Xiuping Liu
Baocai Yin
124
71
0
10 Aug 2020
Fine-Grained Image Captioning with Global-Local Discriminative Objective
Fine-Grained Image Captioning with Global-Local Discriminative Objective
Jie Wu
Tianshui Chen
Hefeng Wu
Zhi Yang
Guangchun Luo
Liang Lin
70
59
0
21 Jul 2020
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Christopher Thomas
Adriana Kovashka
128
41
0
16 Jul 2020
Graph Optimal Transport for Cross-Domain Alignment
Graph Optimal Transport for Cross-Domain Alignment
Liqun Chen
Zhe Gan
Yu Cheng
Linjie Li
Lawrence Carin
Jingjing Liu
OT
115
152
0
26 Jun 2020
Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring
  Sequential Events Detection for Dense Video Captioning
Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning
Yuqing Song
Shizhe Chen
Yida Zhao
Qin Jin
30
6
0
14 Jun 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
201
1,953
0
13 Apr 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
More Grounded Image Captioning by Distilling Image-Text Matching Model
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
90
126
0
01 Apr 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
232
70
0
07 Mar 2020
Adversarial Ranking Attack and Defense
Adversarial Ranking Attack and Defense
Mo Zhou
Zhenxing Niu
Le Wang
Qilin Zhang
G. Hua
150
39
0
26 Feb 2020
Expressing Objects just like Words: Recurrent Visual Embedding for
  Image-Text Matching
Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching
Tianlang Chen
Jiebo Luo
67
69
0
20 Feb 2020
Show, Recall, and Tell: Image Captioning with Recall Mechanism
Show, Recall, and Tell: Image Captioning with Recall Mechanism
Li Wang
Zechen Bai
Yonghua Zhang
Hongtao Lu
63
67
0
15 Jan 2020
MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding
MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding
Geondo Park
Chihye Han
Wonjun Yoon
Dae-Shik Kim
19
18
0
11 Jan 2020
Semi-supervised Visual Feature Integration for Pre-trained Language
  Models
Semi-supervised Visual Feature Integration for Pre-trained Language Models
Lisai Zhang
Qingcai Chen
Dongfang Li
Buzhou Tang
VLM
25
1
0
01 Dec 2019
Neural Storyboard Artist: Visualizing Stories with Coherent Image
  Sequences
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen
Bei Liu
Jianlong Fu
Ruihua Song
Qin Jin
Pingping Lin
Xiaoyu Qi
Chunting Wang
Jin Zhou
DiffM
75
33
0
24 Nov 2019
HUSE: Hierarchical Universal Semantic Embeddings
HUSE: Hierarchical Universal Semantic Embeddings
P. Narayana
Aniket Pednekar
A. Krishnamoorthy
Kazoo Sone
Sugato Basu
47
10
0
14 Nov 2019
Bootstrapping Disjoint Datasets for Multilingual Multimodal
  Representation Learning
Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning
Ákos Kádár
Grzegorz Chrupała
Afra Alishahi
Desmond Elliott
116
1
0
09 Nov 2019
Integrating Temporal and Spatial Attentions for VATEX Video Captioning
  Challenge 2019
Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019
Shizhe Chen
Yida Zhao
Yuqing Song
Qin Jin
Qi Wu
25
0
0
15 Oct 2019
Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task
Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task
Alireza Mohammadshahi
R. Lebret
Karl Aberer
36
11
0
08 Oct 2019
Learning Visual Relation Priors for Image-Text Matching and Image
  Captioning with Neural Scene Graph Generators
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
67
37
0
22 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
105
309
0
12 Sep 2019
Language learning using Speech to Image retrieval
Language learning using Speech to Image retrieval
Danny Merkx
S. Frank
M. Ernestus
57
43
0
09 Sep 2019
Adversarial Representation Learning for Text-to-Image Matching
Adversarial Representation Learning for Text-to-Image Matching
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
117
188
0
28 Aug 2019
ViCo: Word Embeddings from Visual Co-occurrences
ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta
Alex Schwing
Derek Hoiem
65
25
0
22 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal
  Pre-training
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSLVLMMLLM
221
907
0
16 Aug 2019
Unpaired Cross-lingual Image Caption Generation with Self-Supervised
  Rewards
Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song
Shizhe Chen
Yida Zhao
Qin Jin
SSL
46
41
0
15 Aug 2019
Use What You Have: Video Retrieval Using Representations From
  Collaborative Experts
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
91
391
0
31 Jul 2019
ParNet: Position-aware Aggregated Relation Network for Image-Text
  matching
ParNet: Position-aware Aggregated Relation Network for Image-Text matching
Yaxian Xia
Lun Huang
Wenmin Wang
Xiao-Yong Wei
Jie Chen
121
1
0
17 Jun 2019
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Yale Song
M. Soleymani
72
247
0
11 Jun 2019
Beyond Visual Semantics: Exploring the Role of Scene Text in Image
  Understanding
Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding
Arka Ujjal Dey
Suman K. Ghosh
Ernest Valveny
Gaurav Harit
51
23
0
25 May 2019
Deep Unified Multimodal Embeddings for Understanding both Content and
  Users in Social Media Networks
Deep Unified Multimodal Embeddings for Understanding both Content and Users in Social Media Networks
Karan Sikka
Lucas Van Bramer
Ajay Divakaran
85
2
0
17 May 2019
Saliency-Guided Attention Network for Image-Sentence Matching
Saliency-Guided Attention Network for Image-Sentence Matching
Zhong Ji
Haoran Wang
Jiawei Han
Yanwei Pang
69
89
0
20 Apr 2019
SoDeep: a Sorting Deep net to learn ranking loss surrogates
SoDeep: a Sorting Deep net to learn ranking loss surrogates
Martin Engilberge
Louis Chevallier
P. Pérez
Matthieu Cord
58
65
0
08 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
140
194
0
05 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption
  Alignment
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
102
104
0
27 Mar 2019
12
Next