Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.03557
Cited By
VisualBERT: A Simple and Performant Baseline for Vision and Language
9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VisualBERT: A Simple and Performant Baseline for Vision and Language"
50 / 1,200 papers shown
Title
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
Yimeng Zhang
Xin Chen
Jinghan Jia
Sijia Liu
Ke Ding
96
27
0
09 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
105
555
0
07 Mar 2023
PaLM-E: An Embodied Multimodal Language Model
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
154
1,678
0
06 Mar 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou
Maria Lymperaiou
Giorgos Stamou
AAML
80
1
0
05 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD
VLM
68
35
0
04 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
78
45
0
04 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
91
49
0
01 Mar 2023
Rethinking Efficient Tuning Methods from a Unified Perspective
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Yiliang Lv
Deli Zhao
Jingren Zhou
85
11
0
01 Mar 2023
Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
Xiaoman Zhang
Chaoyi Wu
Ya Zhang
Yanfeng Wang
Weidi Xie
MedIm
108
137
0
27 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
110
37
0
27 Feb 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
45
12
0
27 Feb 2023
TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection
Linhao Zhang
Li Jin
Xian Sun
Guangluan Xu
Zequn Zhang
Xiaoyu Li
Nayu Liu
Qing Liu
Shiyao Yan
81
8
0
27 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
151
214
0
20 Feb 2023
Neural Attention Memory
Hyoungwook Nam
S. Seo
HAI
54
1
0
18 Feb 2023
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
Zhi Zhang
H. Yannakoudakis
Xiantong Zhen
Ekaterina Shutova
53
2
0
17 Feb 2023
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
Zhihong Chen
Shizhe Diao
Benyou Wang
Guanbin Li
Xiang Wan
MedIm
127
33
0
17 Feb 2023
Multimodal Subtask Graph Generation from Instructional Videos
Y. Jang
Sungryull Sohn
Lajanugen Logeswaran
Tiange Luo
Moontae Lee
Ho Hin Lee
72
10
0
17 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
80
29
0
16 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
141
46
0
14 Feb 2023
Large Scale Multi-Lingual Multi-Modal Summarization Dataset
Yash Verma
Anubhav Jangra
Raghvendra Kumar
S. Saha
30
14
0
13 Feb 2023
Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data
Ryumei Nakada
Halil Ibrahim Gulluk
Zhun Deng
Wenlong Ji
James Zou
Linjun Zhang
SSL
VLM
106
41
0
13 Feb 2023
Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation
Bingqian Lin
Yi Zhu
Xiaodan Liang
Liang Lin
Jian-zhuo Liu
CoGe
LM&Ro
101
3
0
13 Feb 2023
HateProof: Are Hateful Meme Detection Systems really Robust?
Piush Aggarwal
Pranit Chawla
Mithun Das
Punyajoy Saha
Binny Mathew
Torsten Zesch
Animesh Mukherjee
AAML
63
9
0
11 Feb 2023
On Achieving Privacy-Preserving State-of-the-Art Edge Intelligence
Daphnee Chabal
Dolly Sapra
Z. Mann
56
5
0
10 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
150
34
0
10 Feb 2023
Is Multimodal Vision Supervision Beneficial to Language?
Avinash Madasu
Vasudev Lal
66
4
0
10 Feb 2023
Learning by Asking for Embodied Visual Navigation and Task Completion
Ying Shen
Ismini Lourentzou
79
1
0
09 Feb 2023
Prompting for Multimodal Hateful Meme Classification
Rui Cao
Roy Ka-wei Lee
Wen-Haw Chong
Jing Jiang
VLM
83
83
0
08 Feb 2023
SwinCross: Cross-modal Swin Transformer for Head-and-Neck Tumor Segmentation in PET/CT Images
Gary Y. Li
Junyu Chen
Se-In Jang
Kuang Gong
Quanzheng Li
ViT
MedIm
87
14
0
08 Feb 2023
Self-Supervised Relation Alignment for Scene Graph Generation
Bicheng Xu
Renjie Liao
Leonid Sigal
69
0
0
02 Feb 2023
Multimodal Chain-of-Thought Reasoning in Language Models
Zhuosheng Zhang
Aston Zhang
Mu Li
Hai Zhao
George Karypis
Alexander J. Smola
LRM
140
466
0
02 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLM
VLM
MoE
116
171
0
01 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
101
32
0
01 Feb 2023
Efficient Scopeformer: Towards Scalable and Rich Feature Extraction for Intracranial Hemorrhage Detection
Yassine Barhoumi
N. Bouaynaya
Ghulam Rasool
MedIm
41
5
0
01 Feb 2023
Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning
Jian Zhu
Hanli Wang
Miaojing Shi
LRM
57
4
0
30 Jan 2023
Debiased Fine-Tuning for Vision-language Models by Prompt Regularization
Beier Zhu
Yulei Niu
Saeil Lee
Minhoe Hur
Hanwang Zhang
VLM
VPVLM
124
24
0
29 Jan 2023
Down the Rabbit Hole: Detecting Online Extremism, Radicalisation, and Politicised Hate Speech
Jarod Govers
Philip G. Feldman
Aaron Dant
Panos Patros
55
28
0
27 Jan 2023
Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim?
Shivam Sharma
Atharva Kulkarni
Tharun Suresh
Himanshi Mathur
Preslav Nakov
Md. Shad Akhtar
Tanmoy Chakraborty
98
17
0
26 Jan 2023
Lexi: Self-Supervised Learning of the UI Language
Pratyay Banerjee
Shweti Mahajan
Kushal Arora
Chitta Baral
Oriana Riva
63
17
0
23 Jan 2023
CLIPTER: Looking at the Bigger Picture in Scene Text Recognition
Aviad Aberdam
David Bensaid
Alona Golts
Roy Ganz
Oren Nuriel
Royee Tichauer
Shai Mazor
Ron Litman
VLM
CLIP
90
13
0
18 Jan 2023
Curriculum Script Distillation for Multilingual Visual Question Answering
Khyathi Chandu
A. Geramifard
71
0
0
17 Jan 2023
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Azin Asgarian
Rohit Saha
Daniel Jakubovitz
Julia Peyre
62
2
0
15 Jan 2023
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Paul Lerner
O. Ferret
C. Guinaudeau
84
9
0
11 Jan 2023
Logically at Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval techniques and Transformer Encoder Architecture
P. Verschuuren
Jie Gao
A. V. Eeden
Stylianos Oikonomou
Anil Bandhakavi
66
2
0
09 Jan 2023
MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology
Chaoyi Wu
Xiaoman Zhang
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA
VLM
118
120
0
05 Jan 2023
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Da Yin
Feng Gao
Govind Thattai
Michael F. Johnston
Kai-Wei Chang
VLM
94
15
0
05 Jan 2023
Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples
Jiaming Zhang
Xingjun Ma
Qiaomin Yi
Jitao Sang
Yugang Jiang
Yaowei Wang
Changsheng Xu
93
26
0
31 Dec 2022
BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Haowen Hou
Xiaopeng Yan
Yigeng Zhang
Fengzong Lian
Zhanhui Kang
BDL
34
0
0
29 Dec 2022
Generalized Decoding for Pixel, Image, and Language
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM
MLLM
ObjD
124
259
0
21 Dec 2022
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation
Matthieu Futeral
Cordelia Schmid
Ivan Laptev
Benoît Sagot
Rachel Bawden
106
31
0
20 Dec 2022
Previous
1
2
3
...
11
12
13
...
22
23
24
Next