Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,118 papers shown
Title
Contrast and Classify: Training Robust VQA Models
Yash Kant
A. Moudgil
Dhruv Batra
Devi Parikh
Harsh Agrawal
55
5
0
13 Oct 2020
Webly Supervised Image Classification with Metadata: Automatic Noisy Label Correction via Visual-Semantic Graph
Jingkang Yang
Weirong Chen
Xue Jiang
Xiaopeng Yan
Huabin Zheng
Wayne Zhang
NoLa
77
13
0
12 Oct 2020
Beyond Language: Learning Commonsense from Images for Reasoning
Wanqing Cui
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
71
5
0
10 Oct 2020
comp-syn: Perceptually Grounded Word Embeddings with Color
Bhargav Srinivasa Desikan
Tasker Hull
E. Nadler
Douglas Guilbeault
Aabir Abubaker Kar
Mark Chu
Donald Ruggiero Lo Sardo
34
7
0
08 Oct 2020
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Mohit Shridhar
Xingdi Yuan
Marc-Alexandre Côté
Yonatan Bisk
Adam Trischler
Matthew J. Hausknecht
LM&Ro
LLMAG
165
450
0
08 Oct 2020
Multi-label classification of promotions in digital leaflets using textual and visual information
R. Arroyo
David Jiménez-Cabello
Javier Martínez-Cebrián
59
3
0
07 Oct 2020
ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization
Tzuf Paz-Argaman
Yuval Atzmon
Gal Chechik
Reut Tsarfaty
VLM
64
32
0
07 Oct 2020
Learning to Represent Image and Text with Denotation Graph
Bowen Zhang
Hexiang Hu
Vihan Jain
Eugene Ie
Fei Sha
78
22
0
06 Oct 2020
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
112
249
0
06 Oct 2020
Pathological Visual Question Answering
Xuehai He
Zhuo Cai
Wenlan Wei
Yichen Zhang
Luntian Mou
Eric Xing
P. Xie
140
24
0
06 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
43
2
0
05 Oct 2020
Multi-Modal Open-Domain Dialogue
Kurt Shuster
Eric Michael Smith
Da Ju
Jason Weston
AI4CE
141
44
0
02 Oct 2020
Contrastive Learning of Medical Visual Representations from Paired Images and Text
Yuhao Zhang
Hang Jiang
Yasuhide Miura
Christopher D. Manning
C. Langlotz
MedIm
234
774
0
02 Oct 2020
Learning Object Detection from Captions via Textual Scene Attributes
Achiya Jerbi
Roei Herzig
Jonathan Berant
Gal Chechik
Amir Globerson
79
21
0
30 Sep 2020
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
Tao Yu
Chien-Sheng Wu
Xi Lin
Bailin Wang
Y. Tan
Xinyi Yang
Dragomir R. Radev
R. Socher
Caiming Xiong
LMTD
109
256
0
29 Sep 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu
Xi Yin
Kevin Qinghong Lin
Lijuan Wang
Lefei Zhang
Jianfeng Gao
Zicheng Liu
VLM
110
57
0
28 Sep 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho
Jiasen Lu
Dustin Schwenk
Hannaneh Hajishirzi
Aniruddha Kembhavi
VLM
MLLM
95
102
0
23 Sep 2020
Preserving Integrity in Online Social Networks
A. Halevy
Cristian Canton Ferrer
Hao Ma
Umut Ozertem
Patrick Pantel
Marzieh Saeidi
Fabrizio Silvestri
Ves Stoyanov
88
59
0
22 Sep 2020
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
OOD
62
142
0
18 Sep 2020
A Multimodal Memes Classification: A Survey and Open Research Issues
Tariq Habib Afridi
A. Alam
Muhammad Numan Khan
Jawad Khan
Young-Koo Lee
55
41
0
17 Sep 2020
Multi-modal Summarization for Video-containing Documents
Xiyan Fu
Jun Wang
Zhenglu Yang
64
24
0
17 Sep 2020
Machine Learning for Temporal Data in Finance: Challenges and Opportunities
J. Wittenbach
Learning McLean
Virginia Brian
AI4TS
28
1
0
11 Sep 2020
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models
Khyathi Chandu
Piyush Sharma
Soravit Changpinyo
Ashish V. Thapliyal
Radu Soricut
DiffM
VLM
88
3
0
10 Sep 2020
Investigating Gender Bias in BERT
Rishabh Bhardwaj
Navonil Majumder
Soujanya Poria
85
108
0
10 Sep 2020
Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations
Meng-Jiun Chiou
Roger Zimmermann
Jiashi Feng
111
1
0
10 Sep 2020
Video Moment Retrieval via Natural Language Queries
Xinli Yu
Mohsen Malmir
C. He
Yue Liu
Rex Wu
27
1
0
04 Sep 2020
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li
Hanyin Wang
Yuan Luo
70
67
0
03 Sep 2020
Practical Cross-modal Manifold Alignment for Grounded Language
A. Nguyen
Luke E. Richards
Gaoussou Youssouf Kebe
Edward Raff
Kasra Darvish
Frank Ferraro
Cynthia Matuszek
22
4
0
01 Sep 2020
Active Contrastive Learning of Audio-Visual Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
VLM
SSL
60
8
0
31 Aug 2020
A Survey of Visual Analytics Techniques for Machine Learning
Jun Yuan
Changjian Chen
Weikai Yang
Mengchen Liu
Jiazhi Xia
Shixia Liu
95
223
0
21 Aug 2020
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
K. Gouthaman
Athira M. Nambiar
K. Srinivas
Anurag Mittal
VLM
63
13
0
18 Aug 2020
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
Shengyu Zhang
Tan Jiang
Tan Wang
Kun Kuang
Zhou Zhao
Jianke Zhu
Jin Yu
Hongxia Yang
Leilei Gan
OOD
81
88
0
16 Aug 2020
Poet: Product-oriented Video Captioner for E-commerce
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Jie Liu
Jingren Zhou
Hongxia Yang
Leilei Gan
71
36
0
16 Aug 2020
Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition
Shamane Siriwardhana
Andrew Reis
Rivindu Weerasekera
Suranga Nanayakkara
90
112
0
15 Aug 2020
Weakly supervised cross-domain alignment with optimal transport
Siyang Yuan
Ke Bai
Liqun Chen
Yizhe Zhang
Chenyang Tao
Chunyuan Li
Guoyin Wang
Ricardo Henao
Lawrence Carin
OT
60
7
0
14 Aug 2020
A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning
Mathieu Seurin
Florian Strub
Philippe Preux
Olivier Pietquin
49
5
0
07 Aug 2020
Polysemy Deciphering Network for Robust Human-Object Interaction Detection
Xubin Zhong
Changxing Ding
X. Qu
Dacheng Tao
124
59
0
07 Aug 2020
ConvBERT: Improving BERT with Span-based Dynamic Convolution
Zihang Jiang
Weihao Yu
Daquan Zhou
Yunpeng Chen
Jiashi Feng
Shuicheng Yan
137
163
0
06 Aug 2020
Word meaning in minds and machines
Brenden M. Lake
G. Murphy
NAI
112
118
0
04 Aug 2020
Learning Visual Representations with Caption Annotations
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
VLM
SSL
122
162
0
04 Aug 2020
HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm
Md. Mofijul Islam
Tariq Iqbal
56
81
0
03 Aug 2020
SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space
Liu Yang
VLM
60
5
0
02 Aug 2020
Neural Language Generation: Formulation, Methods, and Evaluation
Cristina Garbacea
Qiaozhu Mei
160
30
0
31 Jul 2020
Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval
Aneeshan Sain
A. Bhunia
Yongxin Yang
Tao Xiang
Yi-Zhe Song
77
50
0
29 Jul 2020
Pre-training for Video Captioning Challenge 2020 Summary
Yingwei Pan
Jun Xu
Yehao Li
Ting Yao
Tao Mei
20
1
0
27 Jul 2020
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Peng Gao
Zuohui Fu
Gerard de Melo
Sen Su
VLM
SSL
CLIP
105
29
0
26 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant
Dhruv Batra
Peter Anderson
Alex Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
100
86
0
23 Jul 2020
Analogical Reasoning for Visually Grounded Language Acquisition
Bo Wu
Haoyu Qin
Alireza Zareian
Carl Vondrick
Shih-Fu Chang
46
9
0
22 Jul 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
126
99
0
19 Jul 2020
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Christopher Thomas
Adriana Kovashka
128
41
0
16 Jul 2020
Previous
1
2
3
...
39
40
41
42
43
Next