ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.02265
  4. Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSL
    VLM
ArXivPDFHTML

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,094 papers shown
Title
SelfDoc: Self-Supervised Document Representation Learning
SelfDoc: Self-Supervised Document Representation Learning
Peizhao Li
Jiuxiang Gu
Jason Kuen
Vlad I. Morariu
Handong Zhao
R. Jain
Varun Manjunatha
Hongfu Liu
ViT
SSL
28
160
0
07 Jun 2021
Oriented Object Detection with Transformer
Oriented Object Detection with Transformer
Teli Ma
Mingyuan Mao
Honghui Zheng
Peng Gao
Xiaodi Wang
Shumin Han
Errui Ding
Baochang Zhang
David Doermann
ViT
27
40
0
06 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual
  Grounding
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
13
189
0
06 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
36
374
0
04 Jun 2021
Human-Adversarial Visual Question Answering
Human-Adversarial Visual Question Answering
Sasha Sheng
Amanpreet Singh
Vedanuj Goswami
Jose Alberto Lopez Magana
Wojciech Galuba
Devi Parikh
Douwe Kiela
OOD
EgoV
AAML
26
60
0
04 Jun 2021
Scalable Transformers for Neural Machine Translation
Scalable Transformers for Neural Machine Translation
Peng Gao
Shijie Geng
Ping Luo
Xiaogang Wang
Jifeng Dai
Hongsheng Li
31
13
0
04 Jun 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual
  Learning
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
31
118
0
03 Jun 2021
TVDIM: Enhancing Image Self-Supervised Pretraining via Noisy Text Data
TVDIM: Enhancing Image Self-Supervised Pretraining via Noisy Text Data
Pengda Qin
Yuhong Li
Kefeng Deng
Qiang Wu
21
1
0
03 Jun 2021
Attention mechanisms and deep learning for machine vision: A survey of
  the state of the art
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
A. M. Hafiz
S. A. Parah
R. A. Bhat
26
45
0
03 Jun 2021
More Identifiable yet Equally Performant Transformers for Text
  Classification
More Identifiable yet Equally Performant Transformers for Text Classification
Rishabh Bhardwaj
Navonil Majumder
Soujanya Poria
Eduard H. Hovy
16
6
0
02 Jun 2021
Towards Efficient Cross-Modal Visual Textual Retrieval using
  Transformer-Encoder Deep Features
Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features
Nicola Messina
Giuseppe Amato
Fabrizio Falchi
Claudio Gennaro
Stéphane Marchand-Maillet
22
7
0
01 Jun 2021
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA
  Models
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li
Jie Lei
Zhe Gan
Jingjing Liu
AAML
VLM
28
70
0
01 Jun 2021
M6-T: Exploring Sparse Expert Models and Beyond
M6-T: Exploring Sparse Expert Models and Beyond
An Yang
Junyang Lin
Rui Men
Chang Zhou
Le Jiang
...
Dingyang Zhang
Wei Lin
Lin Qu
Jingren Zhou
Hongxia Yang
MoE
39
24
0
31 May 2021
Dual-stream Network for Visual Recognition
Dual-stream Network for Visual Recognition
Mingyuan Mao
Renrui Zhang
Honghui Zheng
Peng Gao
Teli Ma
Yan Peng
Errui Ding
Baochang Zhang
Shumin Han
ViT
28
63
0
31 May 2021
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal
  Numerical Reasoning
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning
Jiaqi Chen
Jianheng Tang
Jinghui Qin
Xiaodan Liang
Lingbo Liu
Eric Xing
Liang Lin
AIMat
22
160
0
30 May 2021
Modeling Text-visual Mutual Dependency for Multi-modal Dialog Generation
Modeling Text-visual Mutual Dependency for Multi-modal Dialog Generation
Shuhe Wang
Yuxian Meng
Xiaofei Sun
Fei Wu
Rongbin Ouyang
Rui Yan
Tianwei Zhang
Jiwei Li
23
15
0
30 May 2021
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis
  via Non-Autoregressive Generative Transformers
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis via Non-Autoregressive Generative Transformers
Zhu Zhang
Jianxin Ma
Chang Zhou
Rui Men
Zhikang Li
Ming Ding
Jie Tang
Jingren Zhou
Hongxia Yang
27
46
0
29 May 2021
Maintaining Common Ground in Dynamic Environments
Maintaining Common Ground in Dynamic Environments
Takuma Udagawa
Akiko Aizawa
27
11
0
29 May 2021
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Shuhuai Ren
Junyang Lin
Guangxiang Zhao
Rui Men
An Yang
Jingren Zhou
Xu Sun
Hongxia Yang
28
37
0
28 May 2021
Maria: A Visual Experience Powered Conversational Agent
Maria: A Visual Experience Powered Conversational Agent
Zujie Liang
Huang Hu
Can Xu
Chongyang Tao
Xiubo Geng
Yining Chen
Fan Liang
Daxin Jiang
25
29
0
27 May 2021
Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
S. McCrae
Kehan Wang
A. Zakhor
36
15
0
26 May 2021
Understanding Mobile GUI: from Pixel-Words to Screen-Sentences
Understanding Mobile GUI: from Pixel-Words to Screen-Sentences
Jingwen Fu
Xiaoyi Zhang
Yuwang Wang
Wenjun Zeng
Sam Yang
Grayson Hilliard
34
14
0
25 May 2021
Enhance Multimodal Model Performance with Data Augmentation: Facebook
  Hateful Meme Challenge Solution
Enhance Multimodal Model Performance with Data Augmentation: Facebook Hateful Meme Challenge Solution
Yang Li
Zi-xin Zhang
Hutchin Huang
27
1
0
25 May 2021
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic
  Representation
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation
Tao Tu
Q. Ping
Govind Thattai
Gokhan Tur
Premkumar Natarajan
28
18
0
24 May 2021
Multi-modal Understanding and Generation for Medical Images and Text via
  Vision-Language Pre-Training
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
Jong Hak Moon
HyunGyung Lee
W. Shin
Young-Hak Kim
Edward Choi
MedIm
29
151
0
24 May 2021
Human-centric Relation Segmentation: Dataset and Solution
Human-centric Relation Segmentation: Dataset and Solution
Si Liu
Zitian Wang
Yulu Gao
Lejian Ren
Yue Liao
Guanghui Ren
Bo Li
Shuicheng Yan
13
10
0
24 May 2021
Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning
Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning
Kun Yan
Zied Bouraoui
Ping Wang
Shoaib Jameel
Steven Schockaert
22
21
0
21 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video
  Understanding
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
26
129
0
20 May 2021
Pathdreamer: A World Model for Indoor Navigation
Pathdreamer: A World Model for Indoor Navigation
Jing Yu Koh
Honglak Lee
Yinfei Yang
Jason Baldridge
Peter Anderson
31
81
0
18 May 2021
Parallel Attention Network with Sequence Matching for Video Grounding
Parallel Attention Network with Sequence Matching for Video Grounding
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
23
40
0
18 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
45
448
0
18 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
34
140
0
17 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image
  Retrieval
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
23
3
0
16 May 2021
Episodic Transformer for Vision-and-Language Navigation
Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich
Cordelia Schmid
Chen Sun
LM&Ro
45
194
0
13 May 2021
Video Corpus Moment Retrieval with Contrastive Learning
Video Corpus Moment Retrieval with Contrastive Learning
Hao Zhang
Aixin Sun
Wei Jing
Guoshun Nan
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
44
81
0
13 May 2021
Connecting What to Say With Where to Look by Modeling Human Attention
  Traces
Connecting What to Say With Where to Look by Modeling Human Attention Traces
Zihang Meng
Licheng Yu
Ning Zhang
Tamara L. Berg
Babak Damavandi
Vikas Singh
Amy Bearman
40
25
0
12 May 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language
  Matching
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
29
4
0
12 May 2021
Language Acquisition is Embodied, Interactive, Emotive: a Research
  Proposal
Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal
C. Kennington
LM&Ro
46
0
0
10 May 2021
Spoken Moments: Learning Joint Audio-Visual Representations from Video
  Descriptions
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
Mathew Monfort
SouYoung Jin
Alexander H. Liu
David Harwath
Rogerio Feris
James Glass
Aude Oliva
22
59
0
10 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic
  Survey
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Min Zhang
63
270
0
10 May 2021
A survey on VQA_Datasets and Approaches
A survey on VQA_Datasets and Approaches
Yeyun Zou
Qiyu Xie
50
18
0
02 May 2021
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
Chenyu Gao
Qi Zhu
Peng Wang
Qi Wu
18
2
0
30 Apr 2021
Comparing Visual Reasoning in Humans and AI
Comparing Visual Reasoning in Humans and AI
Shravan Murlidaran
Wenjie Wang
Miguel P. Eckstein
29
1
0
29 Apr 2021
A First Look: Towards Explainable TextVQA Models via Visual and Textual
  Explanations
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao
Xingjian Zhen
K. Hovsepian
Mingwei Shen
37
18
0
29 Apr 2021
Multimodal Contrastive Training for Visual Representation Learning
Multimodal Contrastive Training for Visual Representation Learning
Xin Yuan
Zhe Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
36
153
0
26 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
93
864
0
26 Apr 2021
SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and
  Images
SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images
Dimitar Dimitrov
Bishr Bin Ali
Shaden Shaar
Firoj Alam
Fabrizio Silvestri
Hamed Firooz
Preslav Nakov
Giovanni Da San Martino
23
104
0
25 Apr 2021
MusCaps: Generating Captions for Music Audio
MusCaps: Generating Captions for Music Audio
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
35
36
0
24 Apr 2021
M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object
  Detection with Transformers
M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
Tianrui Guan
Jun Wang
Shiyi Lan
Rohan Chandra
Zuxuan Wu
Larry S. Davis
Tianyi Zhou
ViT
3DPC
37
119
0
24 Apr 2021
Playing Lottery Tickets with Vision and Language
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
111
54
0
23 Apr 2021
Previous
123...343536...404142
Next