ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1909.11740
  4. Cited By
UNITER: UNiversal Image-TExt Representation Learning

UNITER: UNiversal Image-TExt Representation Learning

25 September 2019
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
    VLM
    OT
ArXivPDFHTML

Papers citing "UNITER: UNiversal Image-TExt Representation Learning"

50 / 124 papers shown
Title
Subject Information Extraction for Novelty Detection with Domain Shifts
Subject Information Extraction for Novelty Detection with Domain Shifts
Yangyang Qu
Dazhi Fu
Jicong Fan
OOD
59
0
0
30 Apr 2025
Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning
Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning
Yassir Benhammou
Alessandro Tiberio
Gabriel Trautmann
Suman Kalyan
MLLM
VLM
46
0
0
21 Apr 2025
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for
  Referring Image Segmentation
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
Zunnan Xu
Zhihong Chen
Yong Zhang
Yibing Song
Xiang Wan
Guanbin Li
VLM
35
47
0
21 Jul 2023
Vision Language Transformers: A Survey
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
28
5
0
06 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
30
3
0
03 Jul 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image
  Regions
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
33
8
0
24 May 2023
Combo of Thinking and Observing for Outside-Knowledge VQA
Combo of Thinking and Observing for Outside-Knowledge VQA
Q. Si
Yuchen Mo
Zheng Lin
Huishan Ji
Weiping Wang
46
13
0
10 May 2023
CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval
Yang Yang
Zhongtian Fu
Xiangyu Wu
Wenjie Li
VLM
21
1
0
15 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and
  Language
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIP
VLM
16
1
0
10 Apr 2023
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
49
12
0
23 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
27
47
0
22 Mar 2023
TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection
TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection
Linhao Zhang
Li Jin
Xian Sun
Guangluan Xu
Zequn Zhang
Xiaoyu Li
Nayu Liu
Qing Liu
Shiyao Yan
41
7
0
27 Feb 2023
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image
  Retrieval
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
Kuniaki Saito
Kihyuk Sohn
Xiang Zhang
Chun-Liang Li
Chen-Yu Lee
Kate Saenko
Tomas Pfister
30
108
0
06 Feb 2023
Effective End-to-End Vision Language Pretraining with Semantic Visual
  Loss
Effective End-to-End Vision Language Pretraining with Semantic Visual Loss
Xiaofeng Yang
Fayao Liu
Guosheng Lin
VLM
26
7
0
18 Jan 2023
Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A
  Reproducibility Study
Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
Mariya Hendriksen
Svitlana Vakulenko
E. Kuiper
Maarten de Rijke
31
5
0
12 Jan 2023
Position-guided Text Prompt for Vision-Language Pre-training
Position-guided Text Prompt for Vision-Language Pre-training
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
24
37
0
19 Dec 2022
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for
  Referring Image Segmentation
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
Zicheng Zhang
Yi Zhu
Jian-zhuo Liu
Xiaodan Liang
Wei Ke
33
29
0
04 Dec 2022
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and
  Grounding
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
50
25
0
28 Nov 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative
  Latent Attention
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
24
9
0
21 Nov 2022
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual
  Question Answering
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering
Yao Zhang
Haokun Chen
A. Frikha
Yezi Yang
Denis Krompass
Gengyuan Zhang
Jindong Gu
Volker Tresp
VLM
LRM
16
7
0
19 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
51
101
0
15 Nov 2022
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space
  Alignment
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment
Junyan Wang
Yi Zhang
Ming Yan
Ji Zhang
Jitao Sang
VLM
31
9
0
14 Nov 2022
MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual
  Question Answering
MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering
Shanshan Song
Jiangyun Li
Junchang Wang
Yuan Cai
Wenkai Dong
11
0
0
11 Nov 2022
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary
  Object Detection
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection
Yanxin Long
Jianhua Han
Runhu Huang
Xu Hang
Yi Zhu
Chunjing Xu
Xiaodan Liang
VLM
ObjD
32
18
0
02 Nov 2022
MetaFormer Baselines for Vision
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
40
156
0
24 Oct 2022
VTC: Improving Video-Text Retrieval with User Comments
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
29
7
0
19 Oct 2022
Contrastive Language-Image Pre-Training with Knowledge Graphs
Contrastive Language-Image Pre-Training with Knowledge Graphs
Xuran Pan
Tianzhu Ye
Dongchen Han
S. Song
Gao Huang
VLM
CLIP
27
43
0
17 Oct 2022
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual
  Representation Learning
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Fuying Wang
Yuyin Zhou
Shujun Wang
V. Vardhanabhuti
Lequan Yu
29
137
0
12 Oct 2022
TVLT: Textless Vision-Language Transformer
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
51
28
0
28 Sep 2022
LOViS: Learning Orientation and Visual Signals for Vision and Language
  Navigation
LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation
Yue Zhang
Parisa Kordjamshidi
33
11
0
26 Sep 2022
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of
  Vision-Language Models
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
Felix Vogel
Nina Shvetsova
Leonid Karlinsky
Hilde Kuehne
VLM
63
7
0
12 Sep 2022
Disentangle and Remerge: Interventional Knowledge Distillation for
  Few-Shot Object Detection from A Conditional Causal Perspective
Disentangle and Remerge: Interventional Knowledge Distillation for Few-Shot Object Detection from A Conditional Causal Perspective
Jiangmeng Li
Yanan Zhang
Wenwen Qiang
Hui Xiong
Chengbo Jiao
Xiaohui Hu
Changwen Zheng
Gang Hua
CML
34
28
0
26 Aug 2022
MuMUR : Multilingual Multimodal Universal Retrieval
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
44
3
0
24 Aug 2022
Vision-Language Matching for Text-to-Image Synthesis via Generative
  Adversarial Networks
Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks
Qingrong Cheng
Keyu Wen
X. Gu
VLM
EGVM
32
16
0
20 Aug 2022
Learning Visual Representation from Modality-Shared Contrastive
  Language-Image Pre-training
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
24
48
0
26 Jul 2022
Don't Stop Learning: Towards Continual Learning for the CLIP Model
Don't Stop Learning: Towards Continual Learning for the CLIP Model
Yuxuan Ding
Lingqiao Liu
Chunna Tian
Jingyuan Yang
Haoxuan Ding
CLL
VLM
KELM
24
51
0
19 Jul 2022
Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge
  Transfer
Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Bo Ren
Shutao Xia
VLM
75
49
0
05 Jul 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
34
43
0
17 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
24
110
0
07 Jun 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally
  Spreading Out Disinformation
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
21
12
0
25 May 2022
On Advances in Text Generation from Images Beyond Captioning: A Case
  Study in Self-Rationalization
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Shruti Palaskar
Akshita Bhagia
Yonatan Bisk
Florian Metze
A. Black
Ana Marasović
18
4
0
24 May 2022
All You May Need for VQA are Image Captions
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
32
70
0
04 May 2022
Detection of Propaganda Techniques in Visuo-Lingual Metaphor in Memes
Detection of Propaganda Techniques in Visuo-Lingual Metaphor in Memes
Sunil Gundapu
R. Mamidi
20
2
0
03 May 2022
Leaner and Faster: Two-Stage Model Compression for Lightweight
  Text-Image Retrieval
Leaner and Faster: Two-Stage Model Compression for Lightweight Text-Image Retrieval
Siyu Ren
Kenny Q. Zhu
VLM
27
7
0
29 Apr 2022
Training and challenging models for text-guided fashion image retrieval
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
18
8
0
23 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
30
0
0
17 Apr 2022
Local-Global Context Aware Transformer for Language-Guided Video
  Segmentation
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
29
74
0
18 Mar 2022
Language Matters: A Weakly Supervised Vision-Language Pre-training
  Approach for Scene Text Detection and Spotting
Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
Chuhui Xue
Wenqing Zhang
Yu Hao
Shijian Lu
Philip Torr
Song Bai
VLM
40
31
0
08 Mar 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViT
VLM
192
499
0
22 Feb 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
28
154
0
11 Feb 2022
123
Next