VisualBERT: A Simple and Performant Baseline for Vision and Language

9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
    VLM
arXiv: 1908.03557

Papers citing "VisualBERT: A Simple and Performant Baseline for Vision and Language"

50 / 1,180 papers shown
Title
A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation
Qi-jun Zhao
Ce Zheng
Mengyuan Liu
Cheng Chen
41
14
0
06 Nov 2023
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models
Jingru Yi
Burak Uzkent
Oana Ignat
Zili Li
Amanmeet Garg
Xiang Yu
Linda Liu
VLM
41
1
0
05 Nov 2023
Emotion Detection for Misinformation: A Review
Zhiwei Liu
Tianlin Zhang
Kailai Yang
Paul Thompson
Zeping Yu
Sophia Ananiadou
29
28
0
01 Nov 2023
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements
Peter Zachares
Vahan Hovhannisyan
Alan Mosca
Yarin Gal
29
1
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
46
36
0
01 Nov 2023
Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data
Antonis Antoniades
Yiyi Yu
Joseph Canzano
William Wang
Spencer L. Smith
AI4CE
40
11
0
31 Oct 2023
Partial Tensorized Transformers for Natural Language Processing
Subhadra Vadlamannati
Ryan Solgi
32
0
0
30 Oct 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
35
15
0
30 Oct 2023
ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages
Mohammad Akbari
Saeed Ranjbar Alvar
Behnam Kamranian
Amin Banitalebi-Dehkordi
Yong Zhang
AI4CE
31
0
0
26 Oct 2023
A Survey on Transferability of Adversarial Examples across Deep Neural Networks
Jindong Gu
Xiaojun Jia
Pau de Jorge
Wenqain Yu
Xinwei Liu
...
Anjun Hu
Ashkan Khakzar
Zhijiang Li
Xiaochun Cao
Philip Torr
AAML
31
27
0
26 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
33
9
0
25 Oct 2023
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
32
3
0
25 Oct 2023
Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer
Namkyeong Lee
Heewoong Noh
Sungwon Kim
Dongmin Hyun
Gyoung S. Na
Chanyoung Park
29
5
0
24 Oct 2023
Multimodal Representations for Teacher-Guided Compositional Visual Reasoning
Wafa Aissa
Marin Ferecatu
M. Crucianu
LRM
26
0
0
24 Oct 2023
Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning
Qing Miao
Xiaohe Wu
Chao Xu
Yanli Ji
Wangmeng Zuo
Yiwen Guo
Zhaopeng Meng
NoLa
40
3
0
24 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
41
50
0
23 Oct 2023
The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
Xinyi Chen
Raquel Fernández
Sandro Pezzelle
VLM
21
9
0
23 Oct 2023
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang
Wenquan Feng
Xiangtai Li
Guangliang Cheng
Shuchang Lyu
Binghao Liu
Lijiang Chen
Qi Zhao
ObjD
VLM
26
9
0
22 Oct 2023
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
Baohao Liao
Michael Kozielski
Sanjika Hewavitharana
Jiangbo Yuan
Shahram Khadivi
Tomer Lancewicki
SSL
23
0
0
22 Oct 2023
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
Yuduo Wang
Pedram Ghamisi
30
4
0
19 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Keli Zhang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
35
118
0
16 Oct 2023
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo
Guangzhi Wang
Mohan S. Kankanhalli
21
3
0
16 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
30
1
0
15 Oct 2023
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
Jiayi Ji
Haowei Wang
Changli Wu
Yiwei Ma
Xiaoshuai Sun
Rongrong Ji
49
1
0
14 Oct 2023
Mapping Memes to Words for Multimodal Hateful Meme Classification
Giovanni Burbi
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
A. Bimbo
27
12
0
12 Oct 2023
Open-Set Knowledge-Based Visual Question Answering with Inference Paths
Jingru Gan
Xinzhe Han
Shuhui Wang
Qingming Huang
36
0
0
12 Oct 2023
Jaeger: A Concatenation-Based Multi-Transformer VQA Model
Jieting Long
Zewei Shi
Penghao Jiang
Yidong Gan
36
0
0
11 Oct 2023
MemSum-DQA: Adapting An Efficient Long Document Extractive Summarizer for Document Question Answering
Nianlong Gu
Yingqiang Gao
Richard H. R. Hahnloser
RALM
41
0
0
10 Oct 2023
I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal Information Extraction
Yusheng Huang
Zhouhan Lin
47
5
0
10 Oct 2023
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu
Qihang Fan
Tingkai Liu
Linjie Yang
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
VGen
29
12
0
08 Oct 2023
Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift
Yihao Xue
Siddharth Joshi
Dang Nguyen
Baharan Mirzasoleiman
VLM
31
4
0
08 Oct 2023
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Avinash Madasu
Anahita Bhiwandiwalla
Vasudev Lal
VLM
37
0
0
07 Oct 2023
SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
Bruno Souza
Marius Aasan
Hélio Pedrini
Adín Ramirez Rivera
SSL
37
2
0
03 Oct 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
29
18
0
28 Sep 2023
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens
Yangyang Guo
Haoyu Zhang
Yongkang Wong
Liqiang Nie
Mohan Kankanhalli
VLM
30
3
0
28 Sep 2023
Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback
Teresa Yeo
Oğuzhan Fatih Kar
Zahra Sodagar
Amir Zamir
TTA
OOD
31
3
0
27 Sep 2023
Survey of Social Bias in Vision-Language Models
Nayeon Lee
Yejin Bang
Holy Lovenia
Samuel Cahyawijaya
Wenliang Dai
Pascale Fung
VLM
47
16
0
24 Sep 2023
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Xin Li
Dongze Lian
Zhihe Lu
Jiawang Bai
Zhibo Chen
Xinchao Wang
VLM
43
61
0
24 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
31
5
0
23 Sep 2023
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks
Danae Sánchez Villegas
Daniel Preoţiuc-Pietro
Nikolaos Aletras
36
2
0
14 Sep 2023
PRE: Vision-Language Prompt Learning with Reparameterization Encoder
Anh Pham Thi Minh
An Duc Nguyen
Georgios Tzimiropoulos
VPVLM
VLM
25
3
0
14 Sep 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Palaash Agrawal
Haidi Azaman
Cheston Tan
51
3
0
13 Sep 2023
Overview of Memotion 3: Sentiment and Emotion Analysis of Codemixed Hinglish Memes
Shreyash Mishra
S. Suryavardan
Megha Chakraborty
Parth Patwa
Anku Rani
...
Amitava Das
A. Sheth
Manoj Kumar Chinnakotla
Asif Ekbal
Srijan Kumar
22
5
0
12 Sep 2023
Multi-modal Extreme Classification
Anshul Mittal
Kunal Dahiya
Shreya Malani
Janani Ramaswamy
Seba Kuruvilla
Jitendra Ajmera
Keng-hao Chang
Sumeet Agarwal
Purushottam Kar
Manik Varma
29
8
0
10 Sep 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
36
25
0
08 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
31
2
0
06 Sep 2023
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Qiong Wu
Wei Yu
Yiyi Zhou
Shubin Huang
Xiaoshuai Sun
Rongrong Ji
VLM
26
7
0
04 Sep 2023
Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification
Zhiyin Shao
Xinyu Zhang
Changxing Ding
Jian Wang
Jingdong Wang
31
17
0
04 Sep 2023
Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
Irfan Essa
SSL
23
3
0
03 Sep 2023
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Fengxiang Bie
Yibo Yang
Zhongzhu Zhou
Adam Ghanem
Minjia Zhang
...
Pareesa Ameneh Golnari
David A. Clifton
Yuxiong He
Dacheng Tao
Shuaiwen Leon Song
EGVM
33
19
0
02 Sep 2023