ResearchTrend.AI
VisualBERT: A Simple and Performant Baseline for Vision and Language

9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
    VLM

Papers citing "VisualBERT: A Simple and Performant Baseline for Vision and Language"

50 / 1,177 papers shown
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
22
0
0
16 May 2025
Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
Pengfei Wang
Guohai Xu
Weinong Wang
Junjie Yang
Jie Lou
Yunhua Xue
24
0
0
15 May 2025
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Yangyi Chen
Hao Peng
Tong Zhang
Heng Ji
VLM
28
0
0
13 May 2025
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models
Aishwarya Venkataramanan
P. Bodesheim
Joachim Denzler
BDL
VLM
64
0
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
88
0
0
30 Apr 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
89
0
0
29 Apr 2025
ShapeSpeak: Body Shape-Aware Textual Alignment for Visible-Infrared Person Re-Identification
Shuanglin Yan
Neng Dong
Shuang Li
Rui Yan
Hao Tang
Jing Qin
139
0
0
25 Apr 2025
Multimodal graph representation learning for website generation based on visual sketch
Tung D. Vu
Chung Hoang
Truong-Son Hy
3DV
56
0
0
25 Apr 2025
A Genealogy of Multi-Sensor Foundation Models in Remote Sensing
Kevin Lane
Morteza Karimzadeh
41
0
0
24 Apr 2025
FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing
Hariseetharam Gunduboina
Muhammad Haris Khan
Biplab Banerjee
VLM
47
0
0
23 Apr 2025
Detecting and Understanding Hateful Contents in Memes Through Captioning and Visual Question-Answering
Ali Anaissi
Junaid Akram
Kunal Chaturvedi
Ali Braytee
33
0
0
23 Apr 2025
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding
Songtao Jiang
Yuan Wang
Sibo Song
Wenjie Qu
Zijie Meng
Bohan Lei
Jian Wu
Jimeng Sun
Zuozhu Liu
MedIm
VLM
42
0
0
20 Apr 2025
TSAL: Few-shot Text Segmentation Based on Attribute Learning
Chenming Li
Chengxu Liu
Yuanting Fan
Xiao Jin
Xingsong Hou
Xueming Qian
VLM
45
0
0
15 Apr 2025
Audio and Multiscale Visual Cues Driven Cross-modal Transformer for Idling Vehicle Detection
Xiwen Li
Ross T. Whitaker
Tolga Tasdizen
30
0
0
15 Apr 2025
Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging
Siyuan Dai
Kai Ye
Guodong Liu
Haoteng Tang
Liang Zhan
MedIm
38
0
0
09 Apr 2025
DiffusionCom: Structure-Aware Multimodal Diffusion Model for Multimodal Knowledge Graph Completion
Wei Huang
M. Liang
Peining Li
Xu Hou
Yawen Li
Junping Du
Zhe Xue
Zeli Guan
DiffM
38
0
0
09 Apr 2025
A Lightweight Large Vision-language Model for Multimodal Medical Images
Belal Alsinglawi
Chris McCarthy
Sara Webb
Christopher Fluke
Navid Toosy Saidy
LM&MA
47
0
0
08 Apr 2025
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Runnan Fang
Xiaobin Wang
Yuan Liang
Shuofei Qiao
Jialong Wu
...
N. Zhang
Yong Jiang
Pengjun Xie
Fei Huang
Hongyu Chen
LLMAG
71
0
0
04 Apr 2025
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Yuejiao Su
Yi Wang
Qiongyang Hu
Chuang Yang
Lap-Pui Chau
47
0
0
02 Apr 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao
Jiashu Qu
Jingyi Tang
Baolong Bi
Yijia Liu
Hongyu Chen
Li Liang
Li Su
Qingming Huang
MLLM
VLM
LRM
85
5
0
25 Mar 2025
FedMM-X: A Trustworthy and Interpretable Framework for Federated Multi-Modal Learning in Dynamic Environments
Sree Bhargavi Balija
FedML
41
0
0
25 Mar 2025
Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
Ziming Wei
Bingqian Lin
Yunshuang Nie
Jiaqi Chen
Shikui Ma
Hang Xu
Xiaodan Liang
56
0
0
23 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
53
1
0
21 Mar 2025
A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli
Pengyu Liu
Guohua Dong
D. Guo
Kun Li
Fengling Li
Xun Yang
Meng Wang
Xiaomin Ying
AI4CE
41
0
0
20 Mar 2025
FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification
Jiadong Wang
Weiwei Song
Hao Chen
J. Ren
Huimin Zhao
62
0
0
18 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
71
0
0
13 Mar 2025
Anatomy-Aware Conditional Image-Text Retrieval
Meng Zheng
Jiajin Zhang
Benjamin Planche
Zhongpai Gao
Terrence Chen
Ziyan Wu
MedIm
57
0
0
10 Mar 2025
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction
Zongzheng Zhang
Xinrun Li
Sizhe Zou
Guoxuan Chi
Siqi Li
...
Guoliang Wang
Guantian Zheng
Leichen Wang
Hang Zhao
Hao Zhao
62
0
0
10 Mar 2025
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations
Khoi Anh Nguyen
Linh Yen Vu
Thang Dinh Duong
Thuan Nguyen Duong
Huy Thanh Nguyen
V. Q. Dinh
35
3
0
05 Mar 2025
Vision-Language Model IP Protection via Prompt-based Learning
Lianyu Wang
Hao Wu
Huazhu Fu
Daoqiang Zhang
VLM
Presented at ResearchTrend Connect | VLM on 28 Mar 2025
135
0
0
04 Mar 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
80
1
0
25 Feb 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM
LM&MA
89
2
0
24 Feb 2025
ESANS: Effective and Semantic-Aware Negative Sampling for Large-Scale Retrieval Systems
Haibo Xing
Kanefumi Matsuyama
Hao Deng
Jinxin Hu
Yu Zhang
Xiaoyi Zeng
43
0
0
22 Feb 2025
Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding
Kimia Ramezan
Alireza Amiri Bavandpour
Yifei Yuan
Clemencia Siro
Mohammad Aliannejadi
31
0
0
17 Feb 2025
Learning Generalizable Prompt for CLIP with Class Similarity Knowledge
Sehun Jung
Hyang-won Lee
VLM
VPVLM
55
0
0
17 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
Ming Shan Hee
Roy Ka-Wei Lee
VLM
83
0
0
16 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
70
3
0
11 Feb 2025
A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions
Elisa Negrini
Yuxuan Liu
Liu Yang
Stanley Osher
Hayden Schaeffer
AI4CE
93
0
0
09 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
Dawei Yin
62
1
0
09 Feb 2025
Mitigating GenAI-powered Evidence Pollution for Out-of-Context Multimodal Misinformation Detection
Zehong Yan
Peng Qi
W. Hsu
M. Lee
47
0
0
24 Jan 2025
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
32
0
0
20 Jan 2025
Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification
Shijing Chen
Mohamed Reda Bouadjenek
Shoaib Jameel
Usman Naseem
Basem Suleiman
Flora Salim
Hakim Hacid
Imran Razzak
39
1
0
12 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
88
11
0
06 Jan 2025
SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
Palash Nandi
Shivam Sharma
Tanmoy Chakraborty
36
1
0
31 Dec 2024
MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data
V. Saxena
Benjamin Bashpole
Gijs van Dijck
Gerasimos Spanakis
72
0
0
18 Dec 2024
Bringing Multimodality to Amazon Visual Search System
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
74
3
0
17 Dec 2024
Does VLM Classification Benefit from LLM Description Semantics?
Pingchuan Ma
Lennart Rietdorf
Dmytro Kotovenko
Vincent Tao Hu
Bjorn Ommer
VLM
74
1
0
16 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
86
0
0
13 Dec 2024
Unified Framework for Open-World Compositional Zero-shot Learning
Hirunima Jayasekara
Khoi Pham
Nirat Saini
Abhinav Shrivastava
64
0
0
05 Dec 2024