ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.02265
  4. Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSLVLM
ArXiv (abs)PDFHTML

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,119 papers shown
Title
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLMVPVLMVLM
304
224
0
24 Sep 2021
Dense Contrastive Visual-Linguistic Pretraining
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLMSSL
129
11
0
24 Sep 2021
Transferring Knowledge from Vision to Language: How to Achieve it and
  how to Measure it?
Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?
Tobias Norlund
Lovisa Hagström
Richard Johansson
72
25
0
23 Sep 2021
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and
  Benchmark
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark
Xun Gao
Yin Zhao
Jie Zhang
Longjun Cai
58
6
0
23 Sep 2021
Cross-Modal Coherence for Text-to-Image Retrieval
Cross-Modal Coherence for Text-to-Image Retrieval
Malihe Alikhani
Fangda Han
Hareesh Ravi
Mubbasir Kapadia
Vladimir Pavlovic
Matthew Stone
65
9
0
22 Sep 2021
Caption Enriched Samples for Improving Hateful Memes Detection
Caption Enriched Samples for Improving Hateful Memes Detection
Efrat Blaier
Itzik Malkiel
Lior Wolf
VLM
96
24
0
22 Sep 2021
COVR: A test-bed for Visually Grounded Compositional Generalization with
  real images
COVR: A test-bed for Visually Grounded Compositional Generalization with real images
Ben Bogin
Shivanshu Gupta
Matt Gardner
Jonathan Berant
CoGe
105
29
0
22 Sep 2021
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object
  Knowledge Distillation
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Yongfei Liu
Chenfei Wu
Shao-Yen Tseng
Vasudev Lal
Xuming He
Nan Duan
CLIPVLM
110
29
0
22 Sep 2021
Survey: Transformer based Video-Language Pre-training
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLMViT
125
45
0
21 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
224
373
0
17 Sep 2021
An End-to-End Transformer Model for 3D Object Detection
An End-to-End Transformer Model for 3D Object Detection
Ishan Misra
Rohit Girdhar
Armand Joulin
3DPCViT
153
489
0
16 Sep 2021
A Survey on Temporal Sentence Grounding in Videos
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
123
47
0
16 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based
  Visual Question Answering
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre
103
61
0
15 Sep 2021
What Vision-Language Models `See' when they See Scenes
What Vision-Language Models `See' when they See Scenes
Michele Cafagna
Kees van Deemter
Albert Gatt
VLM
99
13
0
15 Sep 2021
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Da Yin
Liunian Harold Li
Ziniu Hu
Nanyun Peng
Kai-Wei Chang
159
56
0
14 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the
  Dataset into Explicit Training Examples for Visual Question Answering
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
116
20
0
13 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
117
62
0
13 Sep 2021
TEASEL: A Transformer-Based Speech-Prefixed Language Model
TEASEL: A Transformer-Based Speech-Prefixed Language Model
Mehdi Arjmand
M. Dousti
H. Moradi
69
18
0
12 Sep 2021
COSMic: A Coherence-Aware Generation Metric for Image Descriptions
COSMic: A Coherence-Aware Generation Metric for Image Descriptions
Mert Inan
P. Sharma
Baber Khalid
Radu Soricut
Matthew Stone
Malihe Alikhani
EGVM
50
13
0
11 Sep 2021
A Survey on Multi-modal Summarization
A Survey on Multi-modal Summarization
Anubhav Jangra
Sourajit Mukherjee
Adam Jatowt
S. Saha
M. Hasanuzzaman
73
63
0
11 Sep 2021
MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their
  Targets
MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets
Shraman Pramanick
Shivam Sharma
Dimitar Dimitrov
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
77
131
0
11 Sep 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
299
423
0
10 Sep 2021
Panoptic Narrative Grounding
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
146
23
0
10 Sep 2021
We went to look for meaning and all we got were these lousy
  representations: aspects of meaning representation for computational
  semantics
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics
Simon Dobnik
R. Cooper
Adam Ek
Bill Noble
Staffan Larsson
N. Ilinykh
Vladislav Maraev
Vidya Somashekarappa
66
0
0
10 Sep 2021
Towards Developing a Multilingual and Code-Mixed Visual Question
  Answering System by Knowledge Distillation
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
H. Khan
D. Gupta
Asif Ekbal
57
14
0
10 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in
  Multimodal Transformers
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank
Emanuele Bugliarello
Desmond Elliott
86
82
0
09 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
33
2
0
09 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial
  Multi-modal Pretraining
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
118
38
0
09 Sep 2021
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense
  in Text Generation Models
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
Steven Y. Feng
Kevin Lu
Zhuofu Tao
Malihe Alikhani
Teruko Mitamura
Eduard H. Hovy
Varun Gangal
LRM
87
13
0
08 Sep 2021
Self-supervised Contrastive Cross-Modality Representation Learning for
  Spoken Question Answering
Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
Chenyu You
Nuo Chen
Yuexian Zou
SSL
102
64
0
08 Sep 2021
Learning grounded word meaning representations on similarity graphs
Learning grounded word meaning representations on similarity graphs
Mariella Dimiccoli
H. Wendt
Pau Batlle
41
1
0
07 Sep 2021
CTRL-C: Camera calibration TRansformer with Line-Classification
CTRL-C: Camera calibration TRansformer with Line-Classification
Jinwoo Lee
Hyun-Young Go
Hyunjoon Lee
Sunghyun Cho
Minhyuk Sung
Junho Kim
ViT
87
36
0
06 Sep 2021
Learning to Generate Scene Graph from Natural Language Supervision
Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong
Jing Shi
Jianwei Yang
Chenliang Xu
Yin Li
SSL
98
79
0
06 Sep 2021
Data Efficient Masked Language Modeling for Vision and Language
Data Efficient Masked Language Modeling for Vision and Language
Yonatan Bitton
Gabriel Stanovsky
Michael Elhadad
Roy Schwartz
VLM
82
20
0
05 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by
  Image and Caption Generation
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
Mohammad Abuzar Shaikh
Zhanghexuan Ji
Dana Moukheiber
Yan Shen
S. Srihari
Mingchen Gao
VLM
56
1
0
04 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question
  Answering
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
85
19
0
04 Sep 2021
Supervised Contrastive Learning for Multimodal Unreliable News Detection
  in COVID-19 Pandemic
Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic
Wenjia Zhang
Lin Gui
Yulan He
71
34
0
04 Sep 2021
Multimodal Conditionality for Natural Language Generation
Multimodal Conditionality for Natural Language Generation
Michael Sollami
Aashish Jain
73
10
0
02 Sep 2021
Point-of-Interest Type Prediction using Text and Images
Point-of-Interest Type Prediction using Text and Images
Danae Sánchez Villegas
Nikolaos Aletras
118
14
0
01 Sep 2021
WebQA: Multihop and Multimodal QA
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
91
87
0
01 Sep 2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language
  Representations
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations
Hang Li
Yunxing Kang
Tianqiao Liu
Wenbiao Ding
Zitao Liu
73
19
0
01 Sep 2021
On the Significance of Question Encoder Sequence Model in the
  Out-of-Distribution Performance in Visual Question Answering
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
77
0
0
28 Aug 2021
Product-oriented Machine Translation with Cross-modal Cross-lingual
  Pre-training
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song
Shizhe Chen
Qin Jin
Wei Luo
Jun Xie
Fei Huang
101
20
0
25 Aug 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
111
66
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLMMLLM
183
801
0
24 Aug 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
120
140
0
23 Aug 2021
From Two to One: A New Scene Text Recognizer with Visual Language
  Modeling Network
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang
Hongtao Xie
Shancheng Fang
Jing Wang
Shenggao Zhu
Yongdong Zhang
VLM
94
154
0
22 Aug 2021
Multimodal Breast Lesion Classification Using Cross-Attention Deep
  Networks
Multimodal Breast Lesion Classification Using Cross-Attention Deep Networks
Hung Q. Vo
Pengyu Yuan
T. He
Stephen T. C. Wong
H. Nguyen
56
3
0
21 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
62
10
0
21 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
59
144
0
20 Aug 2021
Previous
123...323334...414243
Next