1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,119 papers shown
Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study
Ziyuan Qin
Huahui Yi
Qicheng Lao
Kang Li
VLM
105
71
0
30 Sep 2022
Domain-Unified Prompt Representations for Source-Free Domain Generalization
Hongjing Niu
Hanting Li
Feng Zhao
Bin Li
VLM
117
19
0
29 Sep 2022
Domain-aware Self-supervised Pre-training for Label-Efficient Meme Analysis
Shivam Sharma
Mohd Khizir Siddiqui
Md. Shad Akhtar
Tanmoy Chakraborty
SSL
45
5
0
29 Sep 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
137
31
0
28 Sep 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
105
28
0
28 Sep 2022
A Dataset of Alt Texts from HCI Publications: Analyses and Uses Towards Producing More Descriptive Alt Texts of Data Visualizations in Scientific Papers
S. Chintalapati
Jonathan Bragg
Lucy Lu Wang
77
23
0
27 Sep 2022
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval
Che-Hsien Lin
Ancong Wu
Junwei Liang
Jun Zhang
Wenhang Ge
Wei Zheng
Chunhua Shen
147
26
0
27 Sep 2022
Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
Yang Jin
Yongzhi Li
Zehuan Yuan
Yadong Mu
83
34
0
27 Sep 2022
RepsNet: Combining Vision with Language for Automated Medical Reports
A. Tanwani
Joelle Barral
Daniel Freedman
MedIm
93
23
0
27 Sep 2022
Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned
Ahmed Sabir
137
0
0
26 Sep 2022
LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation
Yue Zhang
Parisa Kordjamshidi
82
10
0
26 Sep 2022
Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline
Lichen Zhao
Daigang Cai
Jing Zhang
Lu Sheng
Dong Xu
Ruizhi Zheng
Yinjie Zhao
Lipeng Wang
Xibo Fan
71
27
0
24 Sep 2022
Visual representations in the human brain are aligned with large language models
Adrien Doerig
Tim C Kietzmann
Emily J. Allen
Yihan Wu
Thomas Naselaris
Kendrick Norris Kay
I. Charest
100
24
0
23 Sep 2022
PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training
Rogerio Bonatti
Sai H. Vemprala
Shuang Ma
Felipe Vieira Frujeri
Shuhang Chen
Ashish Kapoor
94
23
0
22 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
156
22
0
21 Sep 2022
I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification
Muhammad Ferjad Naeem
Yongqin Xian
Luc Van Gool
F. Tombari
VLM
95
38
0
21 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
304
1,303
0
20 Sep 2022
The Ability of Image-Language Explainable Models to Resemble Domain Expertise
P. Werner
Anna Zapaishchykova
Ujjwal Ratan
96
2
0
19 Sep 2022
How to Adapt Pre-trained Vision-and-Language Models to a Text-only Input?
Lovisa Hagström
Richard Johansson
VLM
70
4
0
19 Sep 2022
Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising
Tan Yu
Jie Liu
Yi Yang
Yi Li
Hongliang Fei
Ping Li
67
1
0
19 Sep 2022
LAVIS: A Library for Language-Vision Intelligence
Dongxu Li
Junnan Li
Hung Le
Guangsen Wang
Silvio Savarese
Guosheng Lin
VLM
195
56
0
15 Sep 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
152
153
0
15 Sep 2022
Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
Zhihong Chen
Guanbin Li
Xiang Wan
178
73
0
15 Sep 2022
VIPHY: Probing "Visible" Physical Commonsense Knowledge
Shikhar Singh
Ehsan Qasemi
Muhao Chen
97
7
0
15 Sep 2022
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Nanning Zheng
114
8
0
14 Sep 2022
Computational Sarcasm Analysis on Social Media: A Systematic Review
Faria Binte Kader
Nafisa Hossain Nujat
Tasmia Binte Sogir
Mohsinul Kabir
H. Mahmud
Md. Kamrul Hasan
60
5
0
13 Sep 2022
PreSTU: Pre-Training for Scene-Text Understanding
Jihyung Kil
Soravit Changpinyo
Xi Chen
Hexiang Hu
Sebastian Goodman
Wei-Lun Chao
Radu Soricut
VLM
193
29
0
12 Sep 2022
Instruction-driven history-aware policies for robotic manipulations
Pierre-Louis Guhur
Shizhe Chen
Ricardo Garcia Pinel
Makarand Tapaswi
Ivan Laptev
Cordelia Schmid
LM&Ro
198
109
0
11 Sep 2022
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network
Tiancheng Zhao
Peng Liu
Kyusong Lee
VLM
MLLM
ObjD
42
5
0
10 Sep 2022
Pre-training image-language transformers for open-vocabulary tasks
A. Piergiovanni
Weicheng Kuo
A. Angelova
VLM
ViT
119
10
0
09 Sep 2022
Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering
Jiong Wang
Zhou Zhao
Weike Jin
75
0
0
08 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
114
90
0
07 Sep 2022
Multi-Modal Experience Inspired AI Creation
Qian Cao
Xu Chen
Ruihua Song
Hao Jiang
Guangyan Yang
Bo Zhao
74
3
0
02 Sep 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
100
27
0
29 Aug 2022
Prompt Tuning with Soft Context Sharing for Vision-Language Models
Kun Ding
Ying Wang
Pengzhang Liu
Qiang Yu
Hao Zhang
Shiming Xiang
Chunhong Pan
VPVLM
VLM
79
15
0
29 Aug 2022
Disentangle and Remerge: Interventional Knowledge Distillation for Few-Shot Object Detection from A Conditional Causal Perspective
Jiangmeng Li
Yanan Zhang
Jingyao Wang
Hui Xiong
Chengbo Jiao
Xiaohui Hu
Changwen Zheng
Gang Hua
CML
116
30
0
26 Aug 2022
AiM: Taking Answers in Mind to Correct Chinese Cloze Tests in Educational Applications
Yusen Zhang
Zhongli Li
Qingyu Zhou
Ziyi Liu
Chao Li
Mina W. Ma
Yunbo Cao
Hongzhi Liu
93
1
0
26 Aug 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
123
167
0
25 Aug 2022
Contrastive Audio-Language Learning for Music
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
97
46
0
25 Aug 2022
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
Stan Weixian Lei
Difei Gao
Jay Zhangjie Wu
Yuxuan Wang
Wei Liu
Meng Zhang
Mike Zheng Shou
81
38
0
24 Aug 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
Yanbei Chen
Massimiliano Mancini
Xiatian Zhu
Zeynep Akata
157
121
0
24 Aug 2022
FashionVQA: A Domain-Specific Visual Question Answering System
Min Wang
A. Mahjoubfar
Anupama Joshi
106
4
0
24 Aug 2022
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Tianwei Chen
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Hajime Nagahara
VLM
56
0
0
23 Aug 2022
Semantic-Enhanced Image Clustering
Shao-Qian Cai
Li-qing Qiu
Xiaojun Chen
Qin Zhang
Long Chen
VLM
62
15
0
21 Aug 2022
SPOT: Knowledge-Enhanced Language Representations for Information Extraction
Jiacheng Li
Yannis Katsis
Tyler Baldwin
Ho-Cheol Kim
Andrew Bartko
Julian McAuley
Chun-Nan Hsu
85
17
0
20 Aug 2022
Target-oriented Sentiment Classification with Sequential Cross-modal Semantic Graph
Yufen Huang
Zhuo Chen
Jiaoyan Chen
Jeff Z. Pan
Zhen Yao
Wen Zhang
60
7
0
19 Aug 2022
VLMAE: Vision-Language Masked Autoencoder
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Chen Wu
Xiujun Shu
Bohan Ren
VLM
92
11
0
19 Aug 2022
VAuLT: Augmenting the Vision-and-Language Transformer for Sentiment Classification on Social Media
Georgios Chochlakis
Tejas Srinivasan
Jesse Thomason
Shrikanth Narayanan
VLM
89
4
0
18 Aug 2022
The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs
C. Rockwell
Justin Johnson
David Fouhey
ViT
94
43
0
18 Aug 2022
See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval
Xiujun Shu
Wei Wen
Haoqian Wu
Keyun Chen
Yi-Zhe Song
Ruizhi Qiao
Bohan Ren
Xiao Wang
105
100
0
18 Aug 2022