ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL, VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,118 papers shown
Title
Driving Referring Video Object Segmentation with Vision-Language Pre-trained Models
Zikun Zhou
Wentao Xiong
Li Zhou
Xin Li
Zhenyu He
Yaowei Wang
VOS, VLM
66
0
0
17 May 2024
Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis
A. Englebert
Anne-Sophie Collin
O. Cornu
Christophe De Vleeschouwer
78
1
0
14 May 2024
Alignment Helps Make the Most of Multimodal Data
Christian Arnold
Andreas Küpfer
129
2
0
14 May 2024
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering
Yuanyuan Jiang
Jianqin Yin
90
1
0
13 May 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
79
2
0
12 May 2024
Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media
Zhizhen Zhang
Ning Wang
Haojie Li
Zhihui Wang
68
0
0
09 May 2024
Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
Qing Yu
Mikihiro Tanaka
Kent Fujiwara
ViT
66
6
0
08 May 2024
POV Learning: Individual Alignment of Multimodal Models using Human Perception
Simon Werner
Katharina Christ
Laura Bernardy
Marion G. Müller
Achim Rettinger
33
0
0
07 May 2024
Language-Image Models with 3D Understanding
Jang Hyun Cho
Boris Ivanovic
Yulong Cao
Edward Schmerling
Yue Wang
...
Boyi Li
Yurong You
Philipp Krahenbuhl
Yan Wang
Marco Pavone
LRM
72
19
0
06 May 2024
Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving
Zhenjiang Mao
Dong-You Jhong
Ao Wang
Ivan Ruchkin
OODD
90
2
0
02 May 2024
Transitive Vision-Language Prompt Learning for Domain Generalization
Liyuan Wang
Yan Jin
Zhen Chen
Jinlin Wu
Mengke Li
Yang Lu
Hanzi Wang
VLM, VPVLM, LRM
119
0
0
29 Apr 2024
Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models
Hongyi Zhu
Jia-Hong Huang
Stevan Rudinac
Evangelos Kanoulas
73
9
0
29 Apr 2024
ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images
Huy Quang Pham
Thang Kien-Bao Nguyen
Quan Van Nguyen
Dan Quang Tran
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
97
4
0
29 Apr 2024
Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition
Tianlin Li
Qian Zhu
Jiandong Jin
Jun Zhu
Futian Wang
Bowei Jiang
Yaowei Wang
Yonghong Tian
ViT
89
4
0
27 Apr 2024
Medical Vision-Language Pre-Training for Brain Abnormalities
Masoud Monajatipoor
Zi-Yi Dou
Aichi Chien
Nanyun Peng
Kai-Wei Chang
VLM
108
0
0
27 Apr 2024
A review of deep learning-based information fusion techniques for multimodal medical image classification
Yi-Hsuan Li
Mostafa EL HABIB DAHO
Pierre-Henri Conze
Rachid Zeghlache
Hugo Le Boité
R. Tadayoni
B. Cochener
M. Lamard
G. Quellec
67
49
0
23 Apr 2024
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
Dongze Hao
Qunbo Wang
Longteng Guo
Jie Jiang
Jing Liu
65
1
0
22 Apr 2024
EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning
Mingjie Ma
Zhihuan Yu
Yichao Ma
Guohui Li
LRM
75
1
0
22 Apr 2024
General Item Representation Learning for Cold-start Content Recommendations
Jooeun Kim
Jinri Kim
Kwangeun Yeo
Eungi Kim
Kyoung-Woon On
Jonghwan Mun
Joonseok Lee
VLM
57
1
0
22 Apr 2024
Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models
Konstantinos Vilouras
Pedro Sanchez
Alison Q. O'Neil
Sotirios A. Tsaftaris
MedIm
187
3
0
19 Apr 2024
Pre-trained Vision-Language Models Learn Discoverable Visual Concepts
Yuan Zang
Tian Yun
Hao Tan
Trung Bui
Chen Sun
VLM, CoGe
111
10
0
19 Apr 2024
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Jie Ma
Min Hu
Pinghui Wang
Wangchun Sun
Lingyun Song
Hongbin Pei
Jun Liu
Youtian Du
161
7
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
161
20
0
18 Apr 2024
Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
Wei Chen
Zhiyuan Li
LLMAG
46
5
0
17 Apr 2024
Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
Zhangchi Feng
Richong Zhang
Zhijie Nie
148
10
0
17 Apr 2024
Spatial Context-based Self-Supervised Learning for Handwritten Text Recognition
Carlos Peñarrubia
Carlos Garrido-Munoz
J. J. Valero-Mas
Jorge Calvo-Zaragoza
207
2
0
17 Apr 2024
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
Zaid Khan
Yun Fu
AAML
83
10
0
16 Apr 2024
From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search
Jintao Sun
Zhedong Zheng
Gangyi Ding
127
9
0
16 Apr 2024
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Quan Van Nguyen
Dan Quang Tran
Huy Quang Pham
Thang Kien-Bao Nguyen
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
CoGe
177
5
0
16 Apr 2024
AIGeN: An Adversarial Approach for Instruction Generation in VLN
Niyati Rawal
Roberto Bigazzi
Lorenzo Baraldi
Rita Cucchiara
GAN
90
4
0
15 Apr 2024
Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier
Utkarsh Mall
Carl Vondrick
VLM
99
11
0
15 Apr 2024
Bridging Vision and Language Spaces with Assignment Prediction
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
VLM
99
7
0
15 Apr 2024
Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles
Abhijnan Nath
Huma Jamil
Shafiuddin Rehan Ahmed
George Baker
Rahul Ghosh
James H. Martin
Nathaniel Blanchard
Nikhil Krishnaswamy
65
2
0
13 Apr 2024
Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation
Yichen Yan
Xingjian He
Sihan Chen
Jing Liu
ObjD
59
1
0
12 Apr 2024
FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning
Duy Phuong Nguyen
J. P. Muñoz
Ali Jannesari
VLM
77
9
0
12 Apr 2024
Connecting NeRFs, Images, and Text
Francesco Ballerini
Pierluigi Zama Ramirez
Roberto Mirabella
Samuele Salti
Luigi Di Stefano
116
5
0
11 Apr 2024
MedRG: Medical Report Grounding with Multi-modal Large Language Model
K. Zou
Yang Bai
Zhihao Chen
Yang Zhou
Yidi Chen
Kai Ren
Meng Wang
Xuedong Yuan
Xiaojing Shen
Huazhu Fu
MedIm
101
4
0
10 Apr 2024
Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning
Yupei Zhang
Li Pan
Qiushi Yang
Tan Li
Zhen Chen
91
1
0
09 Apr 2024
Contextual Chart Generation for Cyber Deception
David D. Nguyen
David Liebowitz
Surya Nepal
S. Kanhere
Sharif Abuadbba
99
0
0
07 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
187
10
0
07 Apr 2024
Vision Transformers in Domain Adaptation and Generalization: A Study of Robustness
Shadi Alijani
Jamil Fayyad
Homayoun Najjaran
OOD
118
1
0
05 Apr 2024
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Dongzhi Jiang
Guanglu Song
Xiaoshi Wu
Renrui Zhang
Dazhong Shen
Zhuofan Zong
Yu Liu
Hongsheng Li
VLM
132
28
0
04 Apr 2024
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
Haozhe Luo
Ziyu Zhou
Corentin Royer
Anjany Sekuboyina
Bjoern Menze
VLM, ViT, MedIm
104
7
0
04 Apr 2024
Is CLIP the main roadblock for fine-grained open-world perception?
Lorenzo Bianchi
F. Carrara
Nicola Messina
Fabrizio Falchi
VLM
81
4
0
04 Apr 2024
Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification
Rui Wang
Chuanfu Shen
M. Marín-Jiménez
George Q. Huang
Shiqi Yu
CVBM
100
6
0
04 Apr 2024
Diverse and Tailored Image Generation for Zero-shot Multi-label Classification
Kai Zhang
Zhixiang Yuan
Tao Huang
VLM
82
4
0
04 Apr 2024
3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization
Seung-bum Chung
Joohyun Park
Hyewon Kan
Hyeongyeop Kang
CLIP
77
1
0
03 Apr 2024
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
Mengfei Du
Binhao Wu
Jiwen Zhang
Zhihao Fan
Zejun Li
Ruipu Luo
Xuanjing Huang
Zhongyu Wei
69
3
0
02 Apr 2024
Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community
Casey Kennington
Malihe Alikhani
Heather Pon-Barry
Katherine Atwell
Yonatan Bisk
...
Jivko Sinapov
Angela Stewart
Matthew Stone
Stefanie Tellex
Tom Williams
104
0
0
01 Apr 2024
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Chull Hwan Song
Taebaek Hwang
Jooyoung Yoon
Shunghyun Choi
Yeong Hyeon Gu
50
5
0
01 Apr 2024