Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.07490
Cited By
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
20 August 2019
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LXMERT: Learning Cross-Modality Encoder Representations from Transformers"
50 / 1,512 papers shown
Title
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec
Christian Wolf
G. Antipov
M. Baccouche
Madiha Nadri Wolf
32
10
0
10 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
30
274
0
09 Jun 2021
PAM: Understanding Product Images in Cross Product Category Attribute Extraction
Rongmei Lin
Xiang He
J. Feng
Nasser Zalmout
Yan Liang
Li Xiong
Xin Luna Dong
36
35
0
08 Jun 2021
Check It Again: Progressive Visual Question Answering via Visual Entailment
Q. Si
Zheng Lin
Mingyu Zheng
Peng Fu
Weiping Wang
25
48
0
08 Jun 2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Tianlong Chen
Yu Cheng
Zhe Gan
Lu Yuan
Lei Zhang
Zhangyang Wang
ViT
24
216
0
08 Jun 2021
Counterfactual Maximum Likelihood Estimation for Training Deep Networks
Xinyi Wang
Wenhu Chen
Michael Stephen Saxon
Wenjie Wang
OOD
CML
BDL
23
8
0
07 Jun 2021
BERTGEN: Multi-task Generation through BERT
Faidon Mitzalis
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
VLM
27
7
0
07 Jun 2021
SelfDoc: Self-Supervised Document Representation Learning
Peizhao Li
Jiuxiang Gu
Jason Kuen
Vlad I. Morariu
Handong Zhao
R. Jain
Varun Manjunatha
Hongfu Liu
ViT
SSL
28
160
0
07 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
13
189
0
06 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
36
374
0
04 Jun 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
31
118
0
03 Jun 2021
TVDIM: Enhancing Image Self-Supervised Pretraining via Noisy Text Data
Pengda Qin
Yuhong Li
Kefeng Deng
Qiang Wu
21
1
0
03 Jun 2021
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
A. M. Hafiz
S. A. Parah
R. A. Bhat
26
45
0
03 Jun 2021
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li
Jie Lei
Zhe Gan
Jingjing Liu
AAML
VLM
28
70
0
01 Jun 2021
Volta at SemEval-2021 Task 6: Towards Detecting Persuasive Texts and Images using Textual and Multimodal Ensemble
Kshitij Gupta
Devansh Gautam
R. Mamidi
27
15
0
01 Jun 2021
Dual-stream Network for Visual Recognition
Mingyuan Mao
Renrui Zhang
Honghui Zheng
Peng Gao
Teli Ma
Yan Peng
Errui Ding
Baochang Zhang
Shumin Han
ViT
28
63
0
31 May 2021
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering
Zujie Liang
Haifeng Hu
Jiaying Zhu
45
38
0
29 May 2021
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis via Non-Autoregressive Generative Transformers
Zhu Zhang
Jianxin Ma
Chang Zhou
Rui Men
Zhikang Li
Ming Ding
Jie Tang
Jingren Zhou
Hongxia Yang
27
46
0
29 May 2021
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Shuhuai Ren
Junyang Lin
Guangxiang Zhao
Rui Men
An Yang
Jingren Zhou
Xu Sun
Hongxia Yang
28
37
0
28 May 2021
Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
S. McCrae
Kehan Wang
A. Zakhor
36
15
0
26 May 2021
Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking
Heng-Da Xu
Zhongli Li
Qingyu Zhou
Chao Li
Zizhen Wang
Yunbo Cao
Heyan Huang
Xian-Ling Mao
46
94
0
26 May 2021
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation
Tao Tu
Q. Ping
Govind Thattai
Gokhan Tur
Premkumar Natarajan
28
18
0
24 May 2021
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
Jong Hak Moon
HyunGyung Lee
W. Shin
Young-Hak Kim
Edward Choi
MedIm
29
151
0
24 May 2021
Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning
Kun Yan
Zied Bouraoui
Ping Wang
Shoaib Jameel
Steven Schockaert
22
21
0
21 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
26
129
0
20 May 2021
Social Behaviour Understanding using Deep Neural Networks: Development of Social Intelligence Systems
Ethan Lim Ding Feng
Zhi-Wei Neo
Aaron William De Silva
Kellie Sim
Hong-Ray Tan
T. Nguyen
K. Koh
Wenru Wang
Hoang D. Nguyen
22
2
0
20 May 2021
Parallel Attention Network with Sequence Matching for Video Grounding
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
23
40
0
18 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
31
140
0
17 May 2021
Video Corpus Moment Retrieval with Contrastive Learning
Hao Zhang
Aixin Sun
Wei Jing
Guoshun Nan
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
44
81
0
13 May 2021
Connecting What to Say With Where to Look by Modeling Human Attention Traces
Zihang Meng
Licheng Yu
Ning Zhang
Tamara L. Berg
Babak Damavandi
Vikas Singh
Amy Bearman
40
25
0
12 May 2021
Cross-Modal Generative Augmentation for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
36
10
0
11 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Min Zhang
63
270
0
10 May 2021
T-EMDE: Sketching-based global similarity for cross-modal retrieval
Barbara Rychalska
Mikolaj Wieczorek
Jacek Dąbrowski
33
0
0
10 May 2021
Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries
Qingyi Dong
Zhuowen Tu
Haofu Liao
Yuting Zhang
Vijay Mahadevan
Stefano Soatto
ViT
21
38
0
05 May 2021
Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention
Wei Suo
Mengyang Sun
Peng Wang
Qi Wu
ObjD
30
13
0
05 May 2021
ISTR: End-to-End Instance Segmentation with Transformers
Jie Hu
Liujuan Cao
Yao Lu
Shengchuan Zhang
Yan Wang
Ke Li
Feiyue Huang
Ling Shao
Rongrong Ji
ISeg
31
93
0
03 May 2021
A survey on VQA_Datasets and Approaches
Yeyun Zou
Qiyu Xie
47
18
0
02 May 2021
CAT: Cross-Attention Transformer for One-Shot Object Detection
Weidong Lin
Yuyang Deng
Yang Gao
Ning Wang
Jinghao Zhou
Lingqiao Liu
Lei Zhang
Peng Wang
ViT
27
9
0
30 Apr 2021
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
Chenyu Gao
Qi Zhu
Peng Wang
Qi Wu
18
2
0
30 Apr 2021
Multimodal Contrastive Training for Visual Representation Learning
Xin Yuan
Zhe Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
36
153
0
26 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
93
864
0
26 Apr 2021
M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
Tianrui Guan
Jun Wang
Shiyi Lan
Rohan Chandra
Zuxuan Wu
Larry S. Davis
Tianyi Zhou
ViT
3DPC
37
119
0
24 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
111
54
0
23 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
35
24
0
20 Apr 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
Chunyan Miao
Houqiang Li
30
41
0
19 Apr 2021
BM-NAS: Bilevel Multimodal Neural Architecture Search
Yihang Yin
Siyu Huang
Xiang Zhang
34
27
0
19 Apr 2021
A recipe for annotating grounded clarifications
Luciana Benotti
P. Blackburn
39
17
0
18 Apr 2021
Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models
Tejas Srinivasan
Yonatan Bisk
VLM
32
56
0
18 Apr 2021
AMMU : A Survey of Transformer-based Biomedical Pretrained Language Models
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
LM&MA
MedIm
31
164
0
16 Apr 2021
Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
Taichi Iki
Akiko Aizawa
VLM
33
20
0
16 Apr 2021
Previous
1
2
3
...
24
25
26
...
29
30
31
Next