ResearchTrend.AI
VisualBERT: A Simple and Performant Baseline for Vision and Language

9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
    VLM

Papers citing "VisualBERT: A Simple and Performant Baseline for Vision and Language"

50 / 1,177 papers shown
C$^2$RL: Content and Context Representation Learning for Gloss-free Sign Language Translation and Retrieval
Zhigang Chen
Benjia Zhou
Yiqing Huang
Jun Wan
Yibo Hu
Hailin Shi
Yanyan Liang
Zhen Lei
Du Zhang
VLM
SLR
40
1
0
19 Aug 2024
Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach
Muhammad Saad Saeed
Shah Nawaz
Muhammad Zaigham Zaheer
Muhammad Haris Khan
Karthik Nandakumar
Muhammad Haroon Yousaf
Hassan Sajjad
Tom De Schepper
Markus Schedl
30
0
0
14 Aug 2024
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery
Long Bai
Guankun Wang
Mobarakol Islam
Lalithkumar Seenivasan
An-Chi Wang
Hongliang Ren
54
13
0
09 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
47
6
0
02 Aug 2024
Towards Flexible Evaluation for Generative Visual Question Answering
Huishan Ji
Q. Si
Zheng Lin
Weiping Wang
28
1
0
01 Aug 2024
Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances
Mieko Ochi
Ziwei Gong
D. Komura
Pengyuan Shi
Kaan Donbekci
Julia Hirschberg
44
10
0
31 Jul 2024
PIXELMOD: Improving Soft Moderation of Visual Misleading Information on Twitter
Pujan Paudel
Chen Ling
Jeremy Blackburn
Gianluca Stringhini
33
0
0
30 Jul 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal
Seoyoung Ahn
Zhibo Yang
Niranjan Balasubramanian
Dimitris Samaras
G. Zelinsky
Minh Hoai
41
1
0
28 Jul 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu
Yutong Xie
Zeyu Zhang
Minh Hieu Phan
Qi Chen
Ling-Hao Chen
Qi Wu
LM&MA
41
0
0
28 Jul 2024
HAPFI: History-Aware Planning based on Fused Information
Sujin Jeon
Suyeon Shin
Byoung-Tak Zhang
39
0
0
23 Jul 2024
Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval
Xiaowan Hu
Yiyi Chen
Yan Li
Minquan Wang
Haoqian Wang
Quan Chen
Han Li
Peng Jiang
AI4TS
37
0
0
23 Jul 2024
Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities
Muhammad Irzam Liaqat
Shah Nawaz
Muhammad Zaigham Zaheer
M. S. Saeed
Hassan Sajjad
Tom De Schepper
Karthik Nandakumar
Muhammad Haris Khan
21
1
0
23 Jul 2024
MuTT: A Multimodal Trajectory Transformer for Robot Skills
Claudius Kienle
Benjamin Alt
Onur Celik
P. Becker
Darko Katic
Rainer Jäkel
Gerhard Neumann
38
2
0
22 Jul 2024
Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective
Mariya Hendriksen
Shuo Zhang
R. Reinanda
Mohamed Yahya
Edgar Meij
Maarten de Rijke
54
0
0
21 Jul 2024
I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction
Zaiqiao Meng
Hao Zhou
Yifang Chen
37
4
0
19 Jul 2024
Towards Zero-Shot Multimodal Machine Translation
Matthieu Futeral
Cordelia Schmid
Benoît Sagot
Rachel Bawden
35
3
0
18 Jul 2024
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Gengze Zhou
Yicong Hong
Zun Wang
Xin Eric Wang
Qi Wu
LM&Ro
45
19
0
17 Jul 2024
RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception
Chunliang Li
Wencheng Han
Junbo Yin
Sanyuan Zhao
Jianbing Shen
32
3
0
15 Jul 2024
How and where does CLIP process negation?
Vincent Quantmeyer
Pablo Mosteiro
Albert Gatt
CoGe
29
6
0
15 Jul 2024
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
Yuxin Chen
Zongyang Ma
Ziqi Zhang
Zhongang Qi
Chunfeng Yuan
Bing Li
Junfu Pu
Ying Shan
Xiaojuan Qi
Weiming Hu
41
2
0
10 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
42
15
0
08 Jul 2024
AI as a Tool for Fair Journalism: Case Studies from Malta
Dylan Seychell
Gabriel Hili
Jonathan Attard
Konstantinos Makantatis
21
3
0
08 Jul 2024
Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning
Mainak Singha
Ankit Jha
Divyam Gupta
Pranav Singla
Biplab Banerjee
VLM
32
0
0
05 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
41
10
0
03 Jul 2024
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
Ju-Seung Byun
Jiyun Chun
Jihyung Kil
Andrew Perrault
ReLM
LRM
39
1
0
25 Jun 2024
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
33
1
0
24 Jun 2024
Towards Natural Language-Driven Assembly Using Foundation Models
O. Joglekar
Tal Lancewicki
Shir Kozlovsky
Vladimir Tchuiev
Zohar Feldman
Dotan Di Castro
LM&Ro
37
0
0
23 Jun 2024
RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding
Linrui Xu
Ling Zhao
Wang Guo
Qiujun Li
Kewang Long
Kaiqi Zou
Yuhan Wang
Haifeng Li
AI4TS
33
7
0
18 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
51
26
0
16 Jun 2024
Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection
Shruti Palaskar
Oggi Rudovic
Sameer Dharur
Florian Pesce
G. Krishna
Aswin Sivaraman
Jack Berkowitz
Ahmed Hussen Abdelaziz
Saurabh N. Adya
Ahmed H. Tewfik
VLM
60
0
0
13 Jun 2024
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
Samar Fares
Klea Ziu
Toluwani Aremu
N. Durasov
Martin Takáč
Pascal Fua
Karthik Nandakumar
Ivan Laptev
VLM
AAML
40
4
0
13 Jun 2024
ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery
Kam Woh Ng
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
37
2
0
12 Jun 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang
Xizhou Zhu
Jinguo Zhu
Weijie Su
Junjie Wang
...
Lewei Lu
Bin Li
Jie Zhou
Yu Qiao
Jifeng Dai
VLM
CLIP
47
5
0
11 Jun 2024
Learning Domain-Invariant Features for Out-of-Context News Detection
Yimeng Gu
Mengqi Zhang
Ignacio Castro
Shu Wu
Gareth Tyson
45
2
0
11 Jun 2024
Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification
Yunhe Gao
Difei Gu
Mu Zhou
Dimitris N. Metaxas
41
4
0
08 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
40
13
0
08 Jun 2024
Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization
Huanhuan Ma
Jinghao Zhang
Qiang Liu
Shu Wu
Liang Wang
51
1
0
07 Jun 2024
ArMeme: Propagandistic Content in Arabic Memes
Firoj Alam
A. Hasnat
Fatema Ahmed
Md. Arid Hasan
Maram Hasanain
56
7
0
06 Jun 2024
Multimodal Reasoning with Multimodal Knowledge Graph
Junlin Lee
Yequan Wang
Jing Li
Min Zhang
44
15
0
04 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
56
0
0
04 Jun 2024
Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
Yi Yang
Qingwen Zhang
Kei Ikemura
Nazre Batool
John Folkesson
VLM
33
1
0
31 May 2024
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal
Nakul Agarwal
Shao-Yuan Lo
Kwonjoon Lee
44
14
0
30 May 2024
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Yihe Deng
Pan Lu
Fan Yin
Ziniu Hu
Sheng Shen
James Zou
Kai-Wei Chang
Wei Wang
SyDa
VLM
LRM
44
37
0
30 May 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
65
5
0
29 May 2024
FinEmbedDiff: A Cost-Effective Approach of Classifying Financial Documents with Vector Sampling using Multi-modal Embedding Models
Anjanava Biswas
Wrick Talukdar
21
1
0
28 May 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang
Shichao Dong
Yapeng Zhu
Kelu Yao
Weidong Zhao
Chao Li
Ping Luo
CoGe
LRM
48
2
0
27 May 2024
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning
Neha Kalibhat
Priyatham Kattakinda
Arman Zarei
Nikita Seleznev
Sam Sharpe
Samuel Sharpe
Senthil Kumar
S. Feizi
ViT
34
0
0
26 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Dinesh Manocha
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
67
33
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
43
0
23 May 2024
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery
Runlong He
Mengya Xu
Adrito Das
Danyal Z. Khan
Sophia Bano
Hani J. Marcus
Danail Stoyanov
Matthew J. Clarkson
Mobarakol Islam
53
7
0
22 May 2024