ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.03557
  4. Cited By
VisualBERT: A Simple and Performant Baseline for Vision and Language

VisualBERT: A Simple and Performant Baseline for Vision and Language

9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
    VLM
ArXivPDFHTML

Papers citing "VisualBERT: A Simple and Performant Baseline for Vision and Language"

50 / 1,180 papers shown
Title
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
34
3
0
03 Jul 2023
S-Omninet: Structured Data Enhanced Universal Multimodal Learning
  Architecture
S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture
Ye Xue
Diego Klabjan
J. Utke
18
0
0
01 Jul 2023
MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling
MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling
Zhenyu Zhang
Wenhao Chai
Zhongyu Jiang
Tianbo Ye
Xiuming Zhang
Lei Li
Gaoang Wang
3DH
31
4
0
29 Jun 2023
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual
  Question Answering
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
A. S. Penamakuri
Manish Gupta
Mithun Das Gupta
Anand Mishra
45
7
0
29 Jun 2023
Seeing in Words: Learning to Classify through Language Bottlenecks
Seeing in Words: Learning to Classify through Language Bottlenecks
Khalid Saifullah
Yuxin Wen
Jonas Geiping
Micah Goldblum
Tom Goldstein
VLM
21
2
0
29 Jun 2023
Reconstructing the Hemodynamic Response Function via a Bimodal
  Transformer
Reconstructing the Hemodynamic Response Function via a Bimodal Transformer
Yoni Choukroun
Lior Golgher
P. Blinder
L. Wolf
MedIm
21
0
0
28 Jun 2023
Towards Open Vocabulary Learning: A Survey
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD
VLM
45
136
0
28 Jun 2023
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Qiong Wu
Shubin Huang
Yiyi Zhou
Pingyang Dai
Annan Shu
Guannan Jiang
Rongrong Ji
VLM
VPVLM
27
2
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
50
702
0
26 Jun 2023
Mitigating Hallucination in Large Multi-Modal Models via Robust
  Instruction Tuning
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu
Kevin Qinghong Lin
Linjie Li
Jianfeng Wang
Yaser Yacoob
Lijuan Wang
VLM
MLLM
40
244
0
26 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching
  Attention and Input
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
25
4
0
25 Jun 2023
Exploring the Role of Audio in Video Captioning
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
31
2
0
21 Jun 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large
  Vision-Language Model for Remote Sensing
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Jianwei Yin
DiffM
VLM
32
54
0
20 Jun 2023
Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction
Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction
Qian Li
Shu Guo
Cheng Ji
Xutan Peng
Shiyao Cui
Jianxin Li
34
13
0
19 Jun 2023
M3PT: A Multi-Modal Model for POI Tagging
M3PT: A Multi-Modal Model for POI Tagging
Jingsong Yang
Guanzhou Han
Deqing Yang
Jingping Liu
Yanghua Xiao
Xiang Xu
Baohua Wu
Shenghua Ni
50
3
0
16 Jun 2023
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal
  Contrastive Training
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training
Chong Liu
Yuqi Zhang
Hongsong Wang
Weihua Chen
F. Wang
Yan Huang
Yixing Shen
Liang Wang
19
25
0
15 Jun 2023
Improving Selective Visual Question Answering by Learning from Your
  Peers
Improving Selective Visual Question Answering by Learning from Your Peers
Corentin Dancette
Spencer Whitehead
Rishabh Maheshwary
Ramakrishna Vedantam
Stefan Scherer
Xinlei Chen
Matthieu Cord
Marcus Rohrbach
AAML
OOD
40
16
0
14 Jun 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast
  Mapping in Vision-Language Models
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Ziqiao Ma
Jiayi Pan
J. Chai
ObjD
VLM
29
9
0
14 Jun 2023
Multi-modal Pre-training for Medical Vision-language Understanding and
  Generation: An Empirical Study with A New Benchmark
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark
Li Xu
Bo Liu
Ameer Hamza Khan
Lu Fan
Xiao-Ming Wu
LM&MA
35
9
0
10 Jun 2023
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Fuxiao Liu
Hao Tan
Chris Tensmeyer
CLIP
VLM
38
18
0
09 Jun 2023
Read, look and detect: Bounding box annotation from image-caption pairs
Read, look and detect: Bounding box annotation from image-caption pairs
E. Sanchez
ObjD
38
0
0
09 Jun 2023
Modular Visual Question Answering via Code Generation
Modular Visual Question Answering via Code Generation
Sanjay Subramanian
Medhini Narasimhan
Kushal Khangaonkar
Kevin Kaichuang Yang
Arsha Nagrani
Cordelia Schmid
Andy Zeng
Trevor Darrell
Dan Klein
29
47
0
08 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Dealing with Semantic Underspecification in Multimodal NLP
Sandro Pezzelle
19
9
0
08 Jun 2023
Object Detection with Transformers: A Review
Object Detection with Transformers: A Review
Tahira Shehzadi
K. Hashmi
D. Stricker
Muhammad Zeshan Afzal
ViT
MU
26
28
0
07 Jun 2023
On the Generalization of Multi-modal Contrastive Learning
On the Generalization of Multi-modal Contrastive Learning
Qi Zhang
Yifei Wang
Yisen Wang
22
24
0
07 Jun 2023
Diversifying Joint Vision-Language Tokenization Learning
Diversifying Joint Vision-Language Tokenization Learning
Vardaan Pahuja
A. Piergiovanni
A. Angelova
27
0
0
06 Jun 2023
Leverage Points in Modality Shifts: Comparing Language-only and
  Multimodal Word Representations
Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations
Aleksey Tikhonov
Lisa Bylinina
Denis Paperno
11
2
0
04 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order
  Learning
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
46
1
0
04 Jun 2023
Benchmarking Robustness of Adaptation Methods on Pre-trained
  Vision-Language Models
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
Shuo Chen
Jindong Gu
Zhen Han
Yunpu Ma
Philip Torr
Volker Tresp
VPVLM
VLM
45
17
0
03 Jun 2023
LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in
  Remote Sensing
LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing
Leonard Hackel
Kai Norman Clasen
Mahdyar Ravanbakhsh
Begum Demir Technische Universitat Berlin
17
10
0
01 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via
  Dynamic Visual Prompting
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Shubin Huang
Qiong Wu
Yiyi Zhou
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLM
VPVLM
LRM
16
0
0
01 Jun 2023
Joint Adaptive Representations for Image-Language Learning
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
34
0
0
31 May 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL
  Models
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Roei Herzig
Donghyun Kim
...
Yikang Shen
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
62
53
0
31 May 2023
Scalable Performance Analysis for Vision-Language Models
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
35
1
0
30 May 2023
TotalDefMeme: A Multi-Attribute Meme dataset on Total Defence in
  Singapore
TotalDefMeme: A Multi-Attribute Meme dataset on Total Defence in Singapore
Nirmalendu Prakash
Ming Shan Hee
Roy Ka-Wei Lee
24
13
0
29 May 2023
Deeply Coupled Cross-Modal Prompt Learning
Deeply Coupled Cross-Modal Prompt Learning
Xuejing Liu
Wei Tang
Jinghui Lu
Rui Zhao
Zhaojun Guo
Fei Tan
VLM
28
17
0
29 May 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image
  Captions
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
26
27
0
28 May 2023
MemeGraphs: Linking Memes to Knowledge Graphs
MemeGraphs: Linking Memes to Knowledge Graphs
Vasiliki Kougia
Simon Fetzel
Thomas Kirchmair
Erion cCano
Sina Moayed Baharlou
Sahand Sharifzadeh
Benjamin Roth
32
10
0
28 May 2023
Decoding the Underlying Meaning of Multimodal Hateful Memes
Decoding the Underlying Meaning of Multimodal Hateful Memes
Ming Shan Hee
Wen-Haw Chong
Roy Ka-Wei Lee
38
33
0
28 May 2023
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge
  Interaction Graph for Lightweight Text-Image Retrieval
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval
Jiapeng Wang
Chengyu Wang
Xiaodan Wang
Jun Huang
Lianwen Jin
VLM
37
4
0
28 May 2023
Modularized Zero-shot VQA with Pre-trained Models
Modularized Zero-shot VQA with Pre-trained Models
Rui Cao
Jing Jiang
LRM
33
2
0
27 May 2023
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in
  Language Models
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models
Yao Yao
Z. Li
Hai Zhao
ReLM
LRM
13
16
0
26 May 2023
LANISTR: Multimodal Learning from Structured and Unstructured Data
LANISTR: Multimodal Learning from Structured and Unstructured Data
Sayna Ebrahimi
Sercan Ö. Arik
Yihe Dong
Tomas Pfister
20
4
0
26 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
37
21
0
25 May 2023
Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis
Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis
Xuming Hu
Zhijiang Guo
Zhiyang Teng
I. King
Philip S. Yu
33
17
0
25 May 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language
  Catalyst
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
40
52
0
25 May 2023
MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched
  Contextualization
MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization
Shivam Sharma
S Ramaneswaran
Udit Arora
Md. Shad Akhtar
Tanmoy Chakraborty
38
9
0
25 May 2023
Weakly Supervised Vision-and-Language Pre-training with Relative
  Representations
Weakly Supervised Vision-and-Language Pre-training with Relative Representations
Chi Chen
Peng Li
Maosong Sun
Yang Liu
30
1
0
24 May 2023
UniChart: A Universal Vision-language Pretrained Model for Chart
  Comprehension and Reasoning
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Chenyu You
LRM
27
100
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image
  Regions
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
33
8
0
24 May 2023
Previous
123...8910...222324
Next