ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSL, VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,119 papers shown
Title
Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
Hantao Yao
Rui Zhang
Changsheng Xu
VLM, VPVLM
206
227
0
23 Mar 2023
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
Ding Jiang
Mang Ye
111
157
0
22 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
123
51
0
22 Mar 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
60
12
0
21 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
93
60
0
21 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm, ViT
65
1
0
21 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
111
33
0
21 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM, VLM
74
31
0
20 Mar 2023
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
Deepti Hegde
Jeya Maria Jose Valanarasu
Vishal M. Patel
CLIP
122
68
0
20 Mar 2023
MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds
Xiaoting Wang
Yanxiang Zhang
Yongzhen Zhang
91
6
0
20 Mar 2023
Audio-Text Models Do Not Yet Leverage Natural Language
Ho-Hsiang Wu
Oriol Nieto
J. P. Bello
Justin Salamon
VLM
81
33
0
19 Mar 2023
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
Shi Chen
Qi Zhao
101
6
0
18 Mar 2023
MultiModal Bias: Introducing a Framework for Stereotypical Bias Assessment beyond Gender and Race in Vision Language Models
Sepehr Janghorbani
Gerard de Melo
VLM
117
12
0
16 Mar 2023
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
105
28
0
16 Mar 2023
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
Kunyang Han
Yong-Jin Liu
Jun Hao Liew
Henghui Ding
Yunchao Wei
...
Yitong Wang
Yansong Tang
Yujiu Yang
Jiashi Feng
Yao-Min Zhao
VLM
103
40
0
16 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
164
15
0
14 Mar 2023
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Bo He
Jun Wang
Jielin Qiu
Trung Bui
Abhinav Shrivastava
Zhaowen Wang
93
71
0
13 Mar 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen
Z. Yao
Chunyuan Li
Trevor Darrell
Kurt Keutzer
Yuxiong He
VLM, MoE
81
68
0
13 Mar 2023
Evaluating Visual Number Discrimination in Deep Neural Networks
Ivana Kajić
Aida Nematzadeh
26
0
0
13 Mar 2023
DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
Yueming Lyu
Tianwei Lin
Fu Li
Dongliang He
Jing Dong
Tien-Ping Tan
93
41
0
11 Mar 2023
Single-branch Network for Multimodal Training
M. S. Saeed
Shah Nawaz
M. H. Khan
M. Zaheer
Karthik Nandakumar
Muhammad Haroon Yousaf
Arif Mahmood
42
13
0
10 Mar 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Bo Zhao
VLM
61
1
0
09 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
89
2
0
09 Mar 2023
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
Yimeng Zhang
Xin Chen
Jinghan Jia
Sijia Liu
Ke Ding
99
27
0
09 Mar 2023
Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization
Sumanta Bhattacharyya
R. Manuvinakurike
Sahisnu Mazumder
Saurav Sahay
VLM
63
0
0
08 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
120
554
0
07 Mar 2023
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
Minyoung Hwang
Jaeyeon Jeong
Minsoo Kim
Yoonseon Oh
Songhwai Oh
93
21
0
07 Mar 2023
PaLM-E: An Embodied Multimodal Language Model
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
166
1,679
0
06 Mar 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIP, VLM
72
46
0
06 Mar 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou
Maria Lymperaiou
Giorgos Stamou
AAML
80
1
0
05 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM, MLLM
145
25
0
04 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
87
45
0
04 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
104
4
0
04 Mar 2023
Structure Pretraining and Prompt Tuning for Knowledge Graph Transfer
Wen Zhang
Yushan Zhu
Yin Hua
Yuxia Geng
Yufen Huang
Yajing Xu
Wenting Song
Hua-zeng Chen
85
27
0
03 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
203
11
0
03 Mar 2023
The style transformer with common knowledge optimization for image-text retrieval
Wenrui Li
Zhengyu Ma
Jinqiao Shi
Xiaopeng Fan
ViT
59
5
0
01 Mar 2023
Coarse-to-Fine Covid-19 Segmentation via Vision-Language Alignment
Dandan Shan
Zihan Li
Wentao Chen
Qingde Li
Jie Tian
Qingqi Hong
96
9
0
01 Mar 2023
Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning
Ivona Najdenkoska
Xiantong Zhen
Marcel Worring
VLM
141
20
0
28 Feb 2023
VQA with Cascade of Self- and Co-Attention Blocks
Aakansha Mishra
Ashish Anand
Prithwijit Guha
44
1
0
28 Feb 2023
TextIR: A Simple Framework for Text-based Editable Image Restoration
Yun-Hao Bai
Cairong Wang
Shuzhao Xie
Chao Dong
Chun Yuan
Zhi Wang
DiffM
120
15
0
28 Feb 2023
Multi-Layer Attention-Based Explainability via Transformers for Tabular Data
Andrea Trevino Gavito
Diego Klabjan
J. Utke
LMTD
58
3
0
28 Feb 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS, VLM
185
242
0
27 Feb 2023
Aligning Bag of Regions for Open-Vocabulary Object Detection
Size Wu
Wenwei Zhang
Sheng Jin
Wentao Liu
Chen Change Loy
VLM, ObjD
97
116
0
27 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
118
37
0
27 Feb 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
52
12
0
27 Feb 2023
Understanding Social Media Cross-Modality Discourse in Linguistic Space
Chunpu Xu
Hanzhuo Tan
Jing Li
Piji Li
86
8
0
26 Feb 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Percy Liang
LM&Ro, SSL
138
156
0
24 Feb 2023
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Mengde Xu
Zheng Zhang
Fangyun Wei
Han Hu
Xiang Bai
VLM
89
273
0
23 Feb 2023
X-TRA: Improving Chest X-ray Tasks with Cross-Modal Retrieval Augmentation
Tom van Sonsbeek
M. Worring
58
14
0
22 Feb 2023
Connecting Vision and Language with Video Localized Narratives
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
143
23
0
22 Feb 2023