ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.02265
  4. Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSLVLM
ArXiv (abs)PDFHTML

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,119 papers shown
Title
Counterfactually Measuring and Eliminating Social Bias in
  Vision-Language Pre-training Models
Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models
Yi Zhang
Junyan Wang
Jitao Sang
99
28
0
03 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context
  Augmented Dialogue System: A Review
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
79
2
0
02 Jul 2022
Contrastive Cross-Modal Knowledge Sharing Pre-training for
  Vision-Language Representation Learning and Retrieval
Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval
Keyu Wen
Zhenshan Tan
Qingrong Cheng
Cheng Chen
X. Gu
VLM
80
0
0
02 Jul 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with
  Objects, Attributes and Relations
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Tiancheng Zhao
Tianqi Zhang
Mingwei Zhu
Haozhan Shen
Kyusong Lee
Xiaopeng Lu
Jianwei Yin
VLMCoGeMLLM
130
99
0
01 Jul 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in
  E-commerce
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
127
17
0
01 Jul 2022
Improving Visual Grounding by Encouraging Consistent Gradient-based
  Explanations
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
Ziyan Yang
Kushal Kafle
Franck Dernoncourt
Vicente Ordónez Román
VLM
104
25
0
30 Jun 2022
A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA
A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA
Yangyang Guo
Liqiang Nie
Yongkang Wong
Yebin Liu
Zhiyong Cheng
Mohan S. Kankanhalli
123
40
0
30 Jun 2022
ZoDIAC: Zoneout Dropout Injection Attention Calculation
ZoDIAC: Zoneout Dropout Injection Attention Calculation
Zanyar Zohourianshahzadi
Jugal Kalita
103
0
0
28 Jun 2022
Endowing Language Models with Multimodal Knowledge Graph Representations
Endowing Language Models with Multimodal Knowledge Graph Representations
Ningyuan Huang
Y. Deshpande
Yibo Liu
Houda Alberts
Kyunghyun Cho
Clara Vania
Iacer Calixto
VLM
72
16
0
27 Jun 2022
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
Chuwei Luo
Guozhi Tang
Qi Zheng
Cong Yao
Lianwen Jin
Chenliang Li
Yang Xue
Luo Si
91
18
0
27 Jun 2022
Automatic Generation of Product-Image Sequence in E-commerce
Automatic Generation of Product-Image Sequence in E-commerce
Xiaochuan Fan
Chi Zhang
Yong-Jie Yang
Yue Shang
Xueying Zhang
Zhen He
Yun Xiao
Bo Long
Lingfei Wu
70
4
0
26 Jun 2022
Towards Unsupervised Content Disentanglement in Sentence Representations
  via Syntactic Roles
Towards Unsupervised Content Disentanglement in Sentence Representations via Syntactic Roles
G. Felhi
Joseph Le Roux
Djamé Seddah
DRL
49
5
0
22 Jun 2022
ProtoCLIP: Prototypical Contrastive Language Image Pretraining
ProtoCLIP: Prototypical Contrastive Language Image Pretraining
Delong Chen
Zhao Wu
Fan Liu
Zaiquan Yang
Huaxi Huang
Ying Tan
Erjin Zhou
VLMCLIP
105
28
0
22 Jun 2022
Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia
  Image-Caption Matching
Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching
Nicola Messina
D. Coccomini
Andrea Esuli
Fabrizio Falchi
39
6
0
21 Jun 2022
Probing Visual-Audio Representation for Video Highlight Detection via
  Hard-Pairs Guided Contrastive Learning
Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning
Shuaicheng Li
Feng Zhang
Kunlin Yang
Lin-Na Liu
Shinan Liu
Jun Hou
Shuai Yi
100
9
0
21 Jun 2022
Self-Supervised Learning for Videos: A Survey
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
141
136
0
18 Jun 2022
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Tejas Srinivasan
Ting-Yun Chang
Leticia Pinto-Alva
Georgios Chochlakis
Mohammad Rostami
Jesse Thomason
VLMCLL
107
76
0
18 Jun 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
91
44
0
17 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjDVLMMLLM
173
412
0
17 Jun 2022
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product
  Retrieval
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval
Xiao Dong
Xunlin Zhan
Yunchao Wei
Xiaoyong Wei
Yaowei Wang
Minlong Lu
Xiaochun Cao
Xiaodan Liang
79
11
0
17 Jun 2022
BridgeTower: Building Bridges Between Encoders in Vision-Language
  Representation Learning
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Xiao Xu
Chenfei Wu
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
103
69
0
17 Jun 2022
Local Slot Attention for Vision-and-Language Navigation
Local Slot Attention for Vision-and-Language Navigation
Yifeng Zhuang
Qiang Sun
Yanwei Fu
Lifeng Chen
Xiangyang Xue
120
2
0
17 Jun 2022
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
Kai Zheng
Xiaotong Chen
Odest Chadwicke Jenkins
Xinze Wang
LM&RoCoGe
95
63
0
17 Jun 2022
MixGen: A New Multi-Modal Data Augmentation
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
127
90
0
16 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language
  Models
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
166
240
0
16 Jun 2022
Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment
  Analysis in Videos
Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos
Lianyang Ma
Yu Yao
Tao Liang
Tongliang Liu
62
4
0
16 Jun 2022
Multimodal Dialogue State Tracking
Multimodal Dialogue State Tracking
Hung Le
Nancy F. Chen
Guosheng Lin
75
9
0
16 Jun 2022
Write and Paint: Generative Vision-Language Models are Unified Modal
  Learners
Write and Paint: Generative Vision-Language Models are Unified Modal Learners
Shizhe Diao
Wangchunshu Zhou
Xinsong Zhang
Jiawei Wang
MLLMAI4CE
99
17
0
15 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLMObjD
119
130
0
15 Jun 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language
  Modeling
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLMVLM
92
84
0
14 Jun 2022
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision
  Transformer
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer
Jiajun Deng
Zhengyuan Yang
Daqing Liu
Tianlang Chen
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
Wanli Ouyang
ViT
113
57
0
14 Jun 2022
LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning
  Tasks
LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks
Tuan Dinh
Yuchen Zeng
Ruisu Zhang
Ziqian Lin
Michael Gira
Shashank Rajput
Jy-yong Sohn
Dimitris Papailiopoulos
Kangwook Lee
LMTD
181
140
0
14 Jun 2022
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer
  Learning
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
111
246
0
13 Jun 2022
Multimodal Learning with Transformers: A Survey
Multimodal Learning with Transformers: A Survey
Peng Xu
Xiatian Zhu
David Clifton
ViT
241
579
0
13 Jun 2022
Language Models are General-Purpose Interfaces
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
80
102
0
13 Jun 2022
INDIGO: Intrinsic Multimodality for Domain Generalization
INDIGO: Intrinsic Multimodality for Domain Generalization
Puneet Mangla
Shivam Chandhok
Milan Aggarwal
V. Balasubramanian
Balaji Krishnamurthy
VLM
80
2
0
13 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjDVLM
106
304
0
12 Jun 2022
Bootstrapping Multi-view Representations for Fake News Detection
Bootstrapping Multi-view Representations for Fake News Detection
Qichao Ying
Xiaoxiao Hu
Yangming Zhou
Zhenxing Qian
Dan Zeng
Shiming Ge
81
51
0
12 Jun 2022
A Unified Continuous Learning Framework for Multi-modal Knowledge
  Discovery and Pre-training
A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training
Zhihao Fan
Zhongyu Wei
Jingjing Chen
Siyuan Wang
Zejun Li
Jiarong Xu
Xuanjing Huang
CLL
59
6
0
11 Jun 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional
  MoEs
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu
Xizhou Zhu
Wenhai Wang
Xiaohua Wang
Hongsheng Li
Xiaogang Wang
Jifeng Dai
MoMeMoE
107
70
0
09 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
96
115
0
07 Jun 2022
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge
  Distillation
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation
Kshitij Gupta
Devansh Gautam
R. Mamidi
VLM
81
4
0
07 Jun 2022
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture
  of Experts
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Basil Mustafa
C. Riquelme
J. Puigcerver
Rodolphe Jenatton
N. Houlsby
VLMMoE
172
205
0
06 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning
  on Multimodal and Temporal Data
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
Shohreh Deldari
Hao Xue
Aaqib Saeed
Jiayuan He
Daniel V. Smith
Flora D. Salim
AI4TS
80
37
0
06 Jun 2022
ContraCLIP: Interpretable GAN generation driven by pairs of contrasting
  sentences
ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences
Christos Tzelepis
James Oldfield
Georgios Tzimiropoulos
Ioannis Patras
53
16
0
05 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
88
556
0
03 Jun 2022
Style-Content Disentanglement in Language-Image Pretraining
  Representations for Zero-Shot Sketch-to-Image Synthesis
Style-Content Disentanglement in Language-Image Pretraining Representations for Zero-Shot Sketch-to-Image Synthesis
Jan Zuiderveld
DRL
69
1
0
03 Jun 2022
VL-BEiT: Generative Vision-Language Pretraining
VL-BEiT: Generative Vision-Language Pretraining
Hangbo Bao
Wenhui Wang
Li Dong
Furu Wei
VLM
86
45
0
02 Jun 2022
CLIP4IDC: CLIP for Image Difference Captioning
CLIP4IDC: CLIP for Image Difference Captioning
Zixin Guo
Tong Wang
Jorma T. Laaksonen
VLM
76
30
0
01 Jun 2022
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal
  Pre-training
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Yan Zeng
Wangchunshu Zhou
Ao Luo
Ziming Cheng
Xinsong Zhang
VLM
107
32
0
01 Jun 2022
Previous
123...242526...414243
Next