Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,119 papers shown
Title
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
Jong Hak Moon
HyunGyung Lee
W. Shin
Young-Hak Kim
Edward Choi
MedIm
113
161
0
24 May 2021
Human-centric Relation Segmentation: Dataset and Solution
Si Liu
Zitian Wang
Yulu Gao
Lejian Ren
Yue Liao
Guanghui Ren
Bo Li
Shuicheng Yan
45
12
0
24 May 2021
Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning
Kun Yan
Zied Bouraoui
Ping Wang
Shoaib Jameel
Steven Schockaert
59
24
0
21 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
85
133
0
20 May 2021
Pathdreamer: A World Model for Indoor Navigation
Jing Yu Koh
Honglak Lee
Yinfei Yang
Jason Baldridge
Peter Anderson
96
87
0
18 May 2021
Parallel Attention Network with Sequence Matching for Video Grounding
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
109
41
0
18 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
187
507
0
18 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
138
142
0
17 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
52
4
0
16 May 2021
Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich
Cordelia Schmid
Chen Sun
LM&Ro
116
197
0
13 May 2021
Video Corpus Moment Retrieval with Contrastive Learning
Hao Zhang
Aixin Sun
Wei Jing
Guoshun Nan
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
108
88
0
13 May 2021
Connecting What to Say With Where to Look by Modeling Human Attention Traces
Zihang Meng
Licheng Yu
Ning Zhang
Tamara L. Berg
Babak Damavandi
Vikas Singh
Amy Bearman
157
25
0
12 May 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
86
4
0
12 May 2021
Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal
C. Kennington
LM&Ro
58
0
0
10 May 2021
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
Mathew Monfort
SouYoung Jin
Alexander H. Liu
David Harwath
Rogerio Feris
James Glass
Aude Oliva
56
60
0
10 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Min Zhang
225
280
0
10 May 2021
A survey on VQA_Datasets and Approaches
Yeyun Zou
Qiyu Xie
81
18
0
02 May 2021
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
Chenyu Gao
Qi Zhu
Peng Wang
Qi Wu
28
2
0
30 Apr 2021
Comparing Visual Reasoning in Humans and AI
Shravan Murlidaran
Wenjie Wang
Miguel P. Eckstein
63
1
0
29 Apr 2021
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao
Xingjian Zhen
K. Hovsepian
Mingwei Shen
97
19
0
29 Apr 2021
Multimodal Contrastive Training for Visual Representation Learning
Xin Yuan
Zhe Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
89
157
0
26 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
308
898
0
26 Apr 2021
SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images
Dimitar Dimitrov
Bishr Bin Ali
Shaden Shaar
Firoj Alam
Fabrizio Silvestri
Hamed Firooz
Preslav Nakov
Giovanni Da San Martino
70
106
0
25 Apr 2021
MusCaps: Generating Captions for Music Audio
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
116
37
0
24 Apr 2021
M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
Tianrui Guan
Jun Wang
Shiyi Lan
Rohan Chandra
Zuxuan Wu
Larry S. Davis
Tianyi Zhou
ViT
3DPC
94
123
0
24 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
154
56
0
23 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
143
1,274
0
22 Apr 2021
Comprehensive Multi-Modal Interactions for Referring Image Segmentation
Kanishk Jain
Vineet Gandhi
78
19
0
21 Apr 2021
Understanding Synonymous Referring Expressions via Contrastive Features
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
ObjD
76
4
0
20 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
92
24
0
20 Apr 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
Chunyan Miao
Houqiang Li
60
42
0
19 Apr 2021
BM-NAS: Bilevel Multimodal Neural Architecture Search
Yihang Yin
Siyu Huang
Xiang Zhang
84
27
0
19 Apr 2021
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Yiheng Xu
Tengchao Lv
Lei Cui
Guoxin Wang
Yijuan Lu
D. Florêncio
Cha Zhang
Furu Wei
MLLM
VLM
121
130
0
18 Apr 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
269
1,597
0
18 Apr 2021
Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales
Jacob Andreas
Gašper Beguš
M. Bronstein
R. Diamant
Denley Delaney
...
D. Tchernov
P. Tønnesen
Antonio Torralba
Daniel M. Vogt
Robert J. Wood
60
10
0
17 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
111
348
0
17 Apr 2021
LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding
Te-Lin Wu
Cheng-rong Li
Mingyang Zhang
Tao Chen
Spurthi Amba Hombaiah
Michael Bendersky
79
14
0
16 Apr 2021
AMMU : A Survey of Transformer-based Biomedical Pretrained Language Models
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
LM&MA
MedIm
117
170
0
16 Apr 2021
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Shir Gur
Natalia Neverova
C. Stauffer
Ser-Nam Lim
Douwe Kiela
A. Reiter
147
30
0
16 Apr 2021
Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
Taichi Iki
Akiko Aizawa
VLM
67
20
0
16 Apr 2021
Exploring Visual Engagement Signals for Representation Learning
Menglin Jia
Zuxuan Wu
A. Reiter
Claire Cardie
Serge Belongie
Ser-Nam Lim
82
13
0
15 Apr 2021
Learning Zero-Shot Multifaceted Visually Grounded Word Embeddings via Multi-Task Training
Hassan Shahmohammadi
Hendrik P. A. Lensch
R. Baayen
58
19
0
15 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
102
163
0
13 Apr 2021
Disentangled Motif-aware Graph Learning for Phrase Grounding
Zongshen Mu
Siliang Tang
Jie Tan
Qiang Yu
Yueting Zhuang
GNN
105
35
0
13 Apr 2021
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani
Steven Walton
Nikhil Shah
Abulikemu Abuduweili
Jiachen Li
Humphrey Shi
156
465
0
12 Apr 2021
FreSaDa: A French Satire Data Set for Cross-Domain Satire Detection
Radu Tudor Ionescu
Adrian-Gabriel Chifu
50
11
0
10 Apr 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
Yuankai Qi
Zizheng Pan
Yicong Hong
Ming-Hsuan Yang
Anton Van Den Hengel
Qi Wu
LM&Ro
84
69
0
09 Apr 2021
Exploiting Natural Language for Efficient Risk-Aware Multi-robot SaR Planning
Vikram Shree
B. Asfora
Rachel Zheng
Samantha Hong
Jacopo Banfi
M. Campbell
49
10
0
08 Apr 2021
Video Question Answering with Phrases via Semantic Roles
Arka Sadhu
Kan Chen
Ram Nevatia
51
16
0
08 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
LRM
63
28
0
08 Apr 2021
Previous
1
2
3
...
35
36
37
...
41
42
43
Next