Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2108.07073
Cited By
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
16 August 2021
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration"
30 / 30 papers shown
Title
Anatomy-Aware Conditional Image-Text Retrieval
Meng Zheng
Jiajin Zhang
Benjamin Planche
Zhongpai Gao
Terrence Chen
Ziyan Wu
MedIm
54
0
0
10 Mar 2025
Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Zhenqi Wang
47
0
0
24 Dec 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features
Xin Wei
Yaling Tao
Changde Du
Gangming Zhao
Yizhou Yu
Jinpeng Li
33
0
0
24 Sep 2024
Large Model Based Referring Camouflaged Object Detection
Shupeng Cheng
Ge-Peng Ji
Pengda Qin
Deng-Ping Fan
Bowen Zhou
Peng-Tao Xu
ObjD
29
8
0
28 Nov 2023
Semantic Composition in Visually Grounded Language Models
Rohan Pandey
CoGe
20
1
0
15 May 2023
ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos
Zhou Yu
Lixiang Zheng
Zhou Zhao
A. Fedoseev
Jianping Fan
Kui Ren
Jun Yu
CoGe
40
13
0
04 May 2023
Click-Feedback Retrieval
Zeyu Wang
Yuehua Wu
27
0
0
28 Apr 2023
Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining
Bingqian Lin
Zicong Chen
Mingjie Li
Haokun Lin
Hang Xu
...
Ling-Hao Chen
Xiaojun Chang
Yi Yang
L. Xing
Xiaodan Liang
LM&MA
MedIm
AI4CE
40
14
0
26 Apr 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Zhao Cao
VLM
25
1
0
09 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
94
11
0
03 Mar 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
31
202
0
20 Feb 2023
MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology
Chaoyi Wu
Xiaoman Zhang
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
LM&MA
VLM
30
109
0
05 Jan 2023
Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment
Rohan Pandey
Rulin Shao
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
26
12
0
20 Dec 2022
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
38
13
0
19 Nov 2022
FairCLIP: Social Bias Elimination based on Attribute Prototype Learning and Representation Neutralization
Junyan Wang
Yi Zhang
Jitao Sang
FaML
VLM
34
22
0
26 Oct 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
Zheng Ma
Shi Zong
Mianzhi Pan
Jianbing Zhang
Shujian Huang
Xinyu Dai
Jiajun Chen
22
4
0
18 Oct 2022
Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
Zhihong Chen
Guanbin Li
Xiang Wan
124
65
0
15 Sep 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
24
27
0
29 Aug 2022
Multi-Attention Network for Compressed Video Referring Object Segmentation
Weidong Chen
Dexiang Hong
Yuankai Qi
Zhenjun Han
Shuhui Wang
Laiyun Qing
Qingming Huang
Guorong Li
VOS
20
35
0
26 Jul 2022
Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models
Yi Zhang
Junyan Wang
Jitao Sang
19
27
0
03 Jul 2022
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Xiao Xu
Chenfei Wu
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
43
64
0
17 Jun 2022
A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training
Zhihao Fan
Zhongyu Wei
Jingjing Chen
Siyuan Wang
Zejun Li
Jiarong Xu
Xuanjing Huang
CLL
9
6
0
11 Jun 2022
Image-text Retrieval: A Survey on Recent Research and Development
Min Cao
Shiping Li
Juntao Li
Liqiang Nie
Min Zhang
21
82
0
28 Mar 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
33
4
0
24 Mar 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
28
154
0
11 Feb 2022
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning
Zejun Li
Zhihao Fan
Huaixiao Tou
Jingjing Chen
Zhongyu Wei
Xuanjing Huang
20
16
0
29 Jan 2022
Achieving Human Parity on Visual Question Answering
Ming Yan
Haiyang Xu
Chenliang Li
Junfeng Tian
Bin Bi
...
Ji Zhang
Songfang Huang
Fei Huang
Luo Si
Rong Jin
24
12
0
17 Nov 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Wang
Zhi Wang
Wenwu Zhu
30
47
0
16 Sep 2021
K-BERT: Enabling Language Representation with Knowledge Graph
Weijie Liu
Peng Zhou
Zhe Zhao
Zhiruo Wang
Qi Ju
Haotang Deng
Ping Wang
231
778
0
17 Sep 2019
1