ResearchTrend.AI


VisualBERT: A Simple and Performant Baseline for Vision and Language

9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
    VLM

Papers citing "VisualBERT: A Simple and Performant Baseline for Vision and Language"

50 / 1,177 papers shown
A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings
Tariq Adnan
Abdelrahman Abdelkader
Zipei Liu
Ekram Hossain
Sooyong Park
Md. Saiful Islam
Ehsan Hoque
35
2
0
21 May 2024
Resolving Word Vagueness with Scenario-guided Adapter for Natural Language Inference
Y. Liu
Mengyu Li
Di Liang
Ximing Li
Fausto Giunchiglia
Lan Huang
Xiaoyue Feng
Renchu Guan
39
3
0
21 May 2024
Enhancing Fine-Grained Image Classifications via Cascaded Vision Language Models
Canshi Wei
VLM
32
0
0
18 May 2024
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
Siddhant Agarwal
Shivam Sharma
Preslav Nakov
Tanmoy Chakraborty
24
4
0
18 May 2024
Review of Deep Representation Learning Techniques for Brain-Computer Interfaces and Recommendations
Pierre Guetschel
Sara Ahmadi
Michael Tangermann
35
0
0
17 May 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B Tenenbaum
Chuang Gan
38
177
0
15 May 2024
Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis
A. Englebert
Anne-Sophie Collin
O. Cornu
Christophe De Vleeschouwer
34
1
0
14 May 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
35
2
0
12 May 2024
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI
Gyeong-Geon Lee
Xiaoming Zhai
43
5
0
12 May 2024
Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media
Zhizhen Zhang
Ning Wang
Haojie Li
Zhihui Wang
34
0
0
09 May 2024
ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images
Huy Quang Pham
Thang Kien-Bao Nguyen
Quan Van Nguyen
Dan Quang Tran
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
38
3
0
29 Apr 2024
Medical Vision-Language Pre-Training for Brain Abnormalities
Masoud Monajatipoor
Zi-Yi Dou
Aichi Chien
Nanyun Peng
Kai-Wei Chang
VLM
37
0
0
27 Apr 2024
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Xiaohong Liu
Xiongkuo Min
Guangtao Zhai
Chunyi Li
Tengchuan Kou
...
Qi Yan
Youran Qu
Xiaohui Zeng
Lele Wang
Renjie Liao
58
29
0
25 Apr 2024
What Makes Multimodal In-Context Learning Work?
Folco Bertini Baldassini
Mustafa Shukor
Matthieu Cord
Laure Soulier
Benjamin Piwowarski
40
18
0
24 Apr 2024
Leveraging Speech for Gesture Detection in Multimodal Communication
E. Ghaleb
I. Burenko
Marlou Rasenberg
Wim Pouw
Ivan Toni
Peter Uhrig
Anna Wilson
Judith Holler
Asli Ozyurek
Raquel Fernández
SLR
30
4
0
23 Apr 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
43
1
0
19 Apr 2024
Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
Shouwei Ruan
Yinpeng Dong
Hanqing Liu
Yao Huang
Hang Su
Xingxing Wei
VLM
50
1
0
18 Apr 2024
Variational Multi-Modal Hypergraph Attention Network for Multi-Modal Relation Extraction
Qian Li
Cheng Ji
Shu Guo
Yong Zhao
Qianren Mao
Shangguang Wang
Yuntao Wei
Jianxin Li
21
1
0
18 Apr 2024
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Jie Ma
Min Hu
Pinghui Wang
Wangchun Sun
Lingyun Song
Hongbin Pei
Jun Liu
Youtian Du
35
4
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
30
16
0
18 Apr 2024
Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
Wei Chen
Zhiyuan Li
LLMAG
30
3
0
17 Apr 2024
From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search
Jintao Sun
Zhedong Zheng
Gangyi Ding
40
7
0
16 Apr 2024
Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier
Utkarsh Mall
Carl Vondrick
VLM
30
10
0
15 Apr 2024
Conditional Prototype Rectification Prompt Learning
Haoxing Chen
Yaohui Li
Zizheng Huang
Yan Hong
Zhuoer Xu
Zhangxuan Gu
Jun Lan
Huijia Zhu
Weiqiang Wang
VLM
50
3
0
15 Apr 2024
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
Lewei Yao
Renjie Pi
Jianhua Han
Xiaodan Liang
Hang Xu
Wei Zhang
Zhenguo Li
Dan Xu
VLM
ObjD
53
20
0
14 Apr 2024
AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
Jiannan Ge
Lingxi Xie
Hongtao Xie
Pandeng Li
Xiaopeng Zhang
Yongdong Zhang
Qi Tian
VLM
26
3
0
08 Apr 2024
Contextual Chart Generation for Cyber Deception
David D. Nguyen
David Liebowitz
Surya Nepal
S. Kanhere
Sharif Abuadbba
49
0
0
07 Apr 2024
Vision Transformers in Domain Adaptation and Generalization: A Study of Robustness
Shadi Alijani
Jamil Fayyad
H. Najjaran
OOD
35
1
0
05 Apr 2024
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
Haozhe Luo
Ziyu Zhou
Corentin Royer
Anjany Sekuboyina
Bjoern H. Menze
VLM
ViT
MedIm
48
7
0
04 Apr 2024
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Kirolos Ataallah
Xiaoqian Shen
Eslam Abdelrahman
Essam Sleiman
Deyao Zhu
Jian Ding
Mohamed Elhoseiny
VLM
47
67
0
04 Apr 2024
Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification
Rui Wang
Chuanfu Shen
M. Marín-Jiménez
George Q. Huang
Shiqi Yu
CVBM
53
4
0
04 Apr 2024
BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes
Amirhossein Abaskohi
AmirHossein Dabiri Aghdam
Lele Wang
Giuseppe Carenini
15
1
0
03 Apr 2024
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection
Mamadou Keita
W. Hamidouche
Hessen Bougueffa Eutamene
Abdenour Hadid
Abdelmalik Taleb-Ahmed
69
7
0
02 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
50
1
0
01 Apr 2024
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
47
6
0
01 Apr 2024
Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
Mainak Singha
Ankit Jha
Shirsha Bose
Ashwin Nair
Moloud Abdar
Biplab Banerjee
VLM
60
10
0
31 Mar 2024
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh
Ishaan Shrivastava
Mayank Vatsa
Richa Singh
Aparna Bharati
VLM
CoGe
34
14
0
29 Mar 2024
FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues
Shuang Li
Jiahua Wang
Lijie Wen
LRM
31
0
0
29 Mar 2024
Semantic Map-based Generation of Navigation Instructions
Chengzu Li
Chao Zhang
Simone Teufel
R. Doddipatla
Svetlana Stoyanchev
34
2
0
28 Mar 2024
Scaling Vision-and-Language Navigation With Offline RL
Valay Bundele
Mahesh Bhupati
Biplab Banerjee
Aditya Grover
OffRL
29
1
0
27 Mar 2024
Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement
Yuxuan Wang
Xiaoyuan Liu
VLM
57
0
0
24 Mar 2024
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Weihao Ye
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
MoE
49
1
0
22 Mar 2024
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
46
14
0
22 Mar 2024
Grounding Spatial Relations in Text-Only Language Models
Gorka Azkune
Ander Salaberria
Eneko Agirre
42
0
0
20 Mar 2024
As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
Anjun Hu
Jindong Gu
Francesco Pinto
Konstantinos Kamnitsas
Philip Torr
AAML
SILM
40
5
0
19 Mar 2024
Modality-Agnostic fMRI Decoding of Vision and Language
Mitja Nikolaus
Milad Mozafari
Nicholas Asher
Leila Reddy
Rufin VanRullen
35
3
0
18 Mar 2024
Prioritized Semantic Learning for Zero-shot Instance Navigation
Xander Sun
Louis Lau
Hoyard Zhi
Ronghe Qiu
Junwei Liang
40
8
0
18 Mar 2024
Deciphering Hate: Identifying Hateful Memes and Their Targets
E. Hossain
Omar Sharif
M. M. Hoque
S. Preum
52
4
0
16 Mar 2024
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang
Zhimao Peng
Zhengyuan Xie
Fei Yang
Xialei Liu
Ming-Ming Cheng
62
3
0
15 Mar 2024
PosSAM: Panoptic Open-vocabulary Segment Anything
VS Vibashan
Shubhankar Borse
Hyojin Park
Debasmit Das
Vishal M. Patel
Munawar Hayat
Fatih Porikli
VLM
MLLM
43
6
0
14 Mar 2024