ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.08530
  4. Cited By
VL-BERT: Pre-training of Generic Visual-Linguistic Representations

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

22 August 2019
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
    VLM
    MLLM
    SSL
ArXivPDFHTML

Papers citing "VL-BERT: Pre-training of Generic Visual-Linguistic Representations"

50 / 1,012 papers shown
Title
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
27
9
0
25 Oct 2023
$\mathbb{VD}$-$\mathbb{GR}$: Boosting $\mathbb{V}$isual
  $\mathbb{D}$ialog with Cascaded Spatial-Temporal Multi-Modal
  $\mathbb{GR}$aphs
VD\mathbb{VD}VD-GR\mathbb{GR}GR: Boosting V\mathbb{V}Visual D\mathbb{D}Dialog with Cascaded Spatial-Temporal Multi-Modal GR\mathbb{GR}GRaphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
32
3
0
25 Oct 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A
  Robust Approach for Information Extraction in Visually-Rich Documents
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Tofik Ali
Partha Pratim Roy
16
0
0
25 Oct 2023
Video Referring Expression Comprehension via Transformer with
  Content-conditioned Query
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Jiang Ji
Meng Cao
Tengtao Song
Long Chen
Yi Wang
Yuexian Zou
24
6
0
25 Oct 2023
GD-COMET: A Geo-Diverse Commonsense Inference Model
GD-COMET: A Geo-Diverse Commonsense Inference Model
Mehar Bhatia
Vered Shwartz
21
6
0
23 Oct 2023
The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained
  Multimodal Models
The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
Xinyi Chen
Raquel Fernández
Sandro Pezzelle
VLM
18
9
0
23 Oct 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and
  Beyond
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noal Codella
Kai-Wei Chang
Shih-Fu Chang
27
3
0
23 Oct 2023
GeoLM: Empowering Language Models for Geospatially Grounded Language
  Understanding
GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding
Zekun Li
Wenxuan Zhou
Yao-Yi Chiang
Muhao Chen
SyDa
34
26
0
23 Oct 2023
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
Baohao Liao
Michael Kozielski
Sanjika Hewavitharana
Jiangbo Yuan
Shahram Khadivi
Tomer Lancewicki
SSL
18
0
0
22 Oct 2023
Semi-supervised multimodal coreference resolution in image narrations
Semi-supervised multimodal coreference resolution in image narrations
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
38
3
0
20 Oct 2023
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question
  Answering
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
Yuduo Wang
Pedram Ghamisi
30
4
0
19 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot
  Interactions
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
22
1
0
18 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of
  Multi-modal Large Models
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
30
5
0
17 Oct 2023
HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending
HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending
Tianyi Wei
Dongdong Chen
Wenbo Zhou
Jing Liao
Weiming Zhang
Gang Hua
Neng H. Yu
34
12
0
16 Oct 2023
A Recent Survey of Heterogeneous Transfer Learning
A Recent Survey of Heterogeneous Transfer Learning
Runxue Bao
Yiming Sun
Yuhe Gao
Jindong Wang
Qiang Yang
Zhi-Hong Mao
Ye Ye
30
4
0
12 Oct 2023
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided
  Image Editing
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu
Kang Zhao
Bo Peng
Yue Jiang
Yingya Zhang
Jing Dong
31
2
0
12 Oct 2023
Multimodal Graph Learning for Generative Tasks
Multimodal Graph Learning for Generative Tasks
Minji Yoon
Jing Yu Koh
Bryan Hooi
Ruslan Salakhutdinov
33
7
0
11 Oct 2023
InstructDET: Diversifying Referring Object Detection with Generalized
  Instructions
InstructDET: Diversifying Referring Object Detection with Generalized Instructions
Ronghao Dang
Jiangyan Feng
Haodong Zhang
Chongjian Ge
Lin Song
...
Chengju Liu
Qi Chen
Feng Zhu
Rui Zhao
Yibing Song
ObjD
27
11
0
08 Oct 2023
Video-Teller: Enhancing Cross-Modal Generation with Fusion and
  Decoupling
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu
Qihang Fan
Tingkai Liu
Linjie Yang
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
VGen
29
12
0
08 Oct 2023
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova
Anna Kukleva
Xudong Hong
Christian Rupprecht
Bernt Schiele
Hilde Kuehne
45
25
0
07 Oct 2023
GRID: A Platform for General Robot Intelligence Development
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
25
10
0
02 Oct 2023
Social Media Fashion Knowledge Extraction as Captioning
Social Media Fashion Knowledge Extraction as Captioning
Yifei Yuan
Wenxuan Zhang
Yang Deng
Wai Lam
19
1
0
28 Sep 2023
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens
Yangyang Guo
Haoyu Zhang
Yongkang Wong
Liqiang Nie
Mohan Kankanhalli
VLM
27
3
0
28 Sep 2023
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal
  Sponsored Search
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search
Yuanmin Tang
Daling Wang
Keke Gai
Wenfang Wu
Yifei Zhang
Gang Xiong
Qi Wu
29
4
0
28 Sep 2023
Context-I2W: Mapping Images to Context-dependent Words for Accurate
  Zero-Shot Composed Image Retrieval
Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang
Jiahao Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Yue Hu
Qi Wu
33
33
0
28 Sep 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via
  Multi-Modal Causal Attention
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
Z. Yao
Xiaoxia Wu
Conglong Li
Minjia Zhang
Heyang Qi
Olatunji Ruwase
A. A. Awan
Samyam Rajbhandari
Yuxiong He
36
11
0
25 Sep 2023
VidChapters-7M: Video Chapters at Scale
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
23
26
0
25 Sep 2023
Survey of Social Bias in Vision-Language Models
Survey of Social Bias in Vision-Language Models
Nayeon Lee
Yejin Bang
Holy Lovenia
Samuel Cahyawijaya
Wenliang Dai
Pascale Fung
VLM
47
16
0
24 Sep 2023
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Xin Li
Dongze Lian
Zhihe Lu
Jiawang Bai
Zhibo Chen
Xinchao Wang
VLM
43
61
0
24 Sep 2023
A Survey on Image-text Multimodal Models
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
31
5
0
23 Sep 2023
Multi-modal Domain Adaptation for REG via Relation Transfer
Multi-modal Domain Adaptation for REG via Relation Transfer
Yifan Ding
Liqiang Wang
Boqing Gong
25
0
0
23 Sep 2023
DimCL: Dimensional Contrastive Learning For Improving Self-Supervised
  Learning
DimCL: Dimensional Contrastive Learning For Improving Self-Supervised Learning
Thanh Nguyen
T. Pham
Chaoning Zhang
Tung M. Luu
Thang Vu
Chang D. Yoo
27
9
0
21 Sep 2023
StructChart: Perception, Structuring, Reasoning for Visual Chart
  Understanding
StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding
Renqiu Xia
Bo-Wen Zhang
Hao Peng
Hancheng Ye
Xiangchao Yan
Peng Ye
Botian Shi
Yu Qiao
Junchi Yan
14
0
0
20 Sep 2023
R2GenGPT: Radiology Report Generation with Frozen LLMs
R2GenGPT: Radiology Report Generation with Frozen LLMs
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
LM&MA
VLM
20
64
0
18 Sep 2023
Code quality assessment using transformers
Code quality assessment using transformers
Mosleh Mahamud
Isak Samsten
ViT
13
0
0
17 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for
  Text-Video Retrieval
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
33
3
0
16 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng-Wei Zhang
Han Hu
Dongdong Chen
Baining Guo
DiffM
VLM
51
93
0
07 Sep 2023
Interpretable Visual Question Answering via Reasoning Supervision
Interpretable Visual Question Answering via Reasoning Supervision
Maria Parelli
Dimitrios Mallis
Markos Diomataris
Vassilis Pitsikalis
LRM
30
2
0
07 Sep 2023
Parameter and Computation Efficient Transfer Learning for
  Vision-Language Pre-trained Models
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Qiong Wu
Wei Yu
Yiyi Zhou
Shubin Huang
Xiaoshuai Sun
Rongrong Ji
VLM
26
7
0
04 Sep 2023
Unified Pre-training with Pseudo Texts for Text-To-Image Person
  Re-identification
Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification
Zhiyin Shao
Xinyu Zhang
Changxing Ding
Jian Wang
Jingdong Wang
31
17
0
04 Sep 2023
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for
  Vision-Language Models
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models
Cheng Shi
Sibei Yang
VLM
21
21
0
03 Sep 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual
  Augmentation
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang
Zhengyuan Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
28
8
0
31 Aug 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object
  Detection
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
32
5
0
30 Aug 2023
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for
  Multimodal Machine Translation
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
Devaansh Gupta
Siddhant Kharbanda
Jiawei Zhou
Wanhua Li
Hanspeter Pfister
D. Wei
VLM
39
9
0
29 Aug 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
CoVR: Learning Composed Video Retrieval from Web Video Captions
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
22
21
0
28 Aug 2023
A Multi-Task Semantic Decomposition Framework with Task-specific
  Pre-training for Few-Shot NER
A Multi-Task Semantic Decomposition Framework with Task-specific Pre-training for Few-Shot NER
Guanting Dong
Zechen Wang
Jinxu Zhao
Gang Zhao
Daichi Guo
...
Keqing He
Xuefeng Li
Liwen Wang
Xinyue Cui
Weiran Xu
37
19
0
28 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding,
  Localization, Text Reading, and Beyond
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
50
796
0
24 Aug 2023
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt
  interaction tasks
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks
Zichao Dong
Weikun Zhang
Xufeng Huang
Hang Ji
Xin Zhan
Junbo Chen
VLM
19
4
0
24 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and
  Modality-Aware MoE
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
60
9
0
23 Aug 2023
Multi-event Video-Text Retrieval
Multi-event Video-Text Retrieval
Gengyuan Zhang
Jisen Ren
Jindong Gu
Volker Tresp
19
13
0
22 Aug 2023
Previous
12345...192021
Next