ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2302.06350
  4. Cited By
VITR: Augmenting Vision Transformers with Relation-Focused Learning for
  Cross-Modal Information Retrieval

VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval

13 February 2023
Yansong Gong
Georgina Cosma
Axel Finke
    ViT
ArXivPDFHTML

Papers citing "VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval"

23 / 23 papers shown
Title
Universal Multimodal Representation for Language Understanding
Universal Multimodal Representation for Language Understanding
Zhuosheng Zhang
Kehai Chen
Rui Wang
Masao Utiyama
Eiichiro Sumita
Z. Li
Hai Zhao
SSL
55
22
0
09 Jan 2023
e-CLIP: Large-Scale Vision-Language Representation Learning in
  E-commerce
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
77
17
0
01 Jul 2022
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for
  Cross-Modal Retrieval
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Haoyu Lu
Nanyi Fei
Yuqi Huo
Yizhao Gao
Zhiwu Lu
Jiaxin Wen
CLIP
VLM
63
55
0
15 Apr 2022
Two-stream Hierarchical Similarity Reasoning for Image-text Matching
Two-stream Hierarchical Similarity Reasoning for Image-text Matching
Ran Chen
Hanli Wang
Lei Wang
Sam Kwong
33
9
0
10 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
490
4,324
0
28 Jan 2022
RegionCLIP: Region-based Language-Image Pretraining
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
126
574
0
16 Dec 2021
CRIS: CLIP-Driven Referring Image Segmentation
CRIS: CLIP-Driven Referring Image Segmentation
Zhaoqing Wang
Yu Lu
Qiang Li
Xunqiang Tao
Yan Guo
Ming Gong
Tongliang Liu
VLM
98
369
0
30 Nov 2021
Multi-Level Visual Similarity Based Personalized Tourist Attraction
  Recommendation Using Geo-Tagged Photos
Multi-Level Visual Similarity Based Personalized Tourist Attraction Recommendation Using Geo-Tagged Photos
Ling Chen
Dandan Lyu
Shanshan Yu
Gencai Chen
48
10
0
17 Sep 2021
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
Rinon Gal
Or Patashnik
Haggai Maron
Gal Chechik
Daniel Cohen-Or
CLIP
VLM
72
226
0
02 Aug 2021
Align before Fuse: Vision and Language Representation Learning with
  Momentum Distillation
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
167
1,943
0
16 Jul 2021
Dual-stream Network for Visual Recognition
Dual-stream Network for Visual Recognition
Mingyuan Mao
Renrui Zhang
Honghui Zheng
Peng Gao
Teli Ma
Yan Peng
Errui Ding
Baochang Zhang
Shumin Han
ViT
53
66
0
31 May 2021
Similarity Reasoning and Filtration for Image-Text Matching
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
275
335
0
05 Jan 2021
Image-to-Image Retrieval by Learning Similarity between Scene Graphs
Image-to-Image Retrieval by Learning Similarity between Scene Graphs
Sangwoong Yoon
Woo-Young Kang
Sungwook Jeon
SeongEun Lee
C. Han
Jonghun Park
Eun-Sol Kim
3DH
66
43
0
29 Dec 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through
  Scene Graph
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
61
377
0
30 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation
  Learning
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
56
494
0
11 Jun 2020
Graphical Contrastive Losses for Scene Graph Parsing
Graphical Contrastive Losses for Scene Graph Parsing
Ji Zhang
Kevin J. Shih
Ahmed Elgammal
Andrew Tao
Bryan Catanzaro
54
231
0
07 Mar 2019
Attention Models in Graphs: A Survey
Attention Models in Graphs: A Survey
J. B. Lee
Ryan A. Rossi
Sungchul Kim
Nesreen K. Ahmed
Eunyee Koh
GNN
55
162
0
20 Jul 2018
Stacked Cross Attention for Image-Text Matching
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
74
1,151
0
21 Mar 2018
Bottom-Up and Top-Down Attention for Image Captioning and Visual
  Question Answering
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
111
4,208
0
25 Jul 2017
A simple neural network module for relational reasoning
A simple neural network module for relational reasoning
Adam Santoro
David Raposo
David Barrett
Mateusz Malinowski
Razvan Pascanu
Peter W. Battaglia
Timothy Lillicrap
GNN
NAI
150
1,612
0
05 Jun 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary
  Visual Reasoning
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
285
2,367
0
20 Dec 2016
Deep Visual-Semantic Alignments for Generating Image Descriptions
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
89
5,578
0
07 Dec 2014
Learning Phrase Representations using RNN Encoder-Decoder for
  Statistical Machine Translation
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
825
23,310
0
03 Jun 2014
1