Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2006.16934
Cited By
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
30 June 2020
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph"
50 / 208 papers shown
Title
Vision-and-Language Pretraining
Thong Nguyen
Cong-Duy Nguyen
Xiaobao Wu
See-Kiong Ng
A. Luu
VLM
CLIP
24
2
0
05 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
30
2
0
02 Jul 2022
Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval
Keyu Wen
Zhenshan Tan
Qingrong Cheng
Cheng Chen
X. Gu
VLM
24
0
0
02 Jul 2022
Endowing Language Models with Multimodal Knowledge Graph Representations
Ningyuan Huang
Y. Deshpande
Yibo Liu
Houda Alberts
Kyunghyun Cho
Clara Vania
Iacer Calixto
VLM
30
14
0
27 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
36
228
0
16 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
30
124
0
15 Jun 2022
A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training
Zhihao Fan
Zhongyu Wei
Jingjing Chen
Siyuan Wang
Zejun Li
Jiarong Xu
Xuanjing Huang
CLL
9
6
0
11 Jun 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLM
MLLM
28
212
0
24 May 2022
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
Yanan Wang
Michihiro Yasunaga
Hongyu Ren
Shinya Wada
J. Leskovec
29
17
0
23 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
26
38
0
23 May 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
MLLM
VLM
170
137
0
22 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
31
22
0
22 Apr 2022
Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension
Peihan Miao
Wei Su
Gaoang Wang
Xuewei Li
Xi Li
ObjD
20
9
0
21 Apr 2022
K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen
Chunyuan Li
Xiaowei Hu
Jianwei Yang
Yujia Xie
...
Ce Liu
Kurt Keutzer
Trevor Darrell
Anna Rohrbach
Jianfeng Gao
CLIP
VLM
36
83
0
20 Apr 2022
Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis
Yan Ling
Jianfei Yu
Rui Xia
17
72
0
17 Apr 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
Gen Luo
Yiyi Zhou
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
21
10
0
17 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
30
0
0
17 Apr 2022
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Haoyu Lu
Nanyi Fei
Yuqi Huo
Yizhao Gao
Zhiwu Lu
Jiaxin Wen
CLIP
VLM
27
54
0
15 Apr 2022
Vision-and-Language Pretrained Models: A Survey
Siqu Long
Feiqi Cao
S. Han
Haiqing Yang
VLM
33
63
0
15 Apr 2022
XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation
Wei Liu
Fangyue Liu
Fei Din
Qian He
Zili Yi
VLM
18
36
0
11 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
16
17
0
31 Mar 2022
ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
Mengjun Cheng
Yipeng Sun
Long Wang
Xiongwei Zhu
Kun Yao
...
Guoli Song
Junyu Han
Jingtuo Liu
Errui Ding
Jingdong Wang
24
60
0
31 Mar 2022
Image-text Retrieval: A Survey on Recent Research and Development
Min Cao
Shiping Li
Juntao Li
Liqiang Nie
Min Zhang
26
82
0
28 Mar 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
33
4
0
24 Mar 2022
Finding Structural Knowledge in Multimodal-BERT
Victor Milewski
Miryam de Lhoneux
Marie-Francine Moens
19
9
0
17 Mar 2022
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua-Hong Wu
Haifeng Wang
MLLM
13
21
0
17 Mar 2022
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
Wenliang Dai
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Pascale Fung
VLM
22
90
0
12 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
S. Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS
VLM
24
36
0
03 Mar 2022
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
VLM
25
31
0
01 Mar 2022
Joint Answering and Explanation for Visual Commonsense Reasoning
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Yin-wei Wei
Liqiang Nie
Mohan S. Kankanhalli
19
16
0
25 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
82
213
0
18 Feb 2022
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval
Licheng Yu
Jun Chen
Animesh Sinha
Mengjiao MJ Wang
Hugo Chen
Tamara L. Berg
Ning Zhang
VLM
31
39
0
15 Feb 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
28
154
0
11 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
53
850
0
07 Feb 2022
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning
Zejun Li
Zhihao Fan
Huaixiao Tou
Jingjing Chen
Zhongyu Wei
Xuanjing Huang
20
16
0
29 Jan 2022
A Survey on Visual Transfer Learning using Knowledge Graphs
Sebastian Monka
Lavdim Halilaj
Achim Rettinger
19
23
0
27 Jan 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP
VLM
22
39
0
15 Jan 2022
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
VLM
CLIP
21
123
0
13 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
31
207
0
07 Jan 2022
Self-Training Vision Language BERTs with a Unified Conditional Model
Xiaofeng Yang
Fengmao Lv
Fayao Liu
Guosheng Lin
SSL
VLM
54
13
0
06 Jan 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
27
58
0
31 Dec 2021
Multi-Image Visual Question Answering
Harsh Raj
Janhavi Dadhania
Akhilesh Bhardwaj
Prabuchandran KJ
6
2
0
27 Dec 2021
Hateful Memes Challenge: An Enhanced Multimodal Framework
Aijing Gao
Bingjun Wang
Jiaqi Yin
Yating Tian
12
2
0
20 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
40
555
0
16 Dec 2021
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang
Haoxuan You
Liunian Harold Li
Alireza Zareian
Suji Park
Yiqing Liang
Kai-Wei Chang
Shih-Fu Chang
ReLM
LRM
15
30
0
16 Dec 2021
Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection
Diego Garcia-Olano
Yasumasa Onoe
Joydeep Ghosh
21
18
0
13 Dec 2021
ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition
Xinyu Wang
Min Gui
Yong-jia Jiang
Zixia Jia
Nguyen Bach
Tao Wang
Zhongqiang Huang
Fei Huang
Kewei Tu
39
52
0
13 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
40
687
0
08 Dec 2021
General Facial Representation Learning in a Visual-Linguistic Manner
Yinglin Zheng
Hao Yang
Ting Zhang
Jianmin Bao
Dongdong Chen
Yangyu Huang
Lu Yuan
Dong Chen
Ming Zeng
Fang Wen
CVBM
146
163
0
06 Dec 2021
Previous
1
2
3
4
5
Next