Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,119 papers shown
Title
Vision-Language Navigation with Embodied Intelligence: A Survey
Peng Gao
Peng Wang
Feng Gao
Fei Wang
Ruyue Yuan
LM&Ro
101
3
0
22 Feb 2024
SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
Wonjoong Kim
S. Park
Yeonjun In
Seokwon Han
Chanyoung Park
LRM
ReLM
88
4
0
22 Feb 2024
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
82
10
0
21 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
139
64
0
19 Feb 2024
Cobra Effect in Reference-Free Image Captioning Metrics
Zheng Ma
Changxin Wang
Yawen Ouyang
Fei Zhao
Jianbing Zhang
Shujian Huang
Jiajun Chen
92
2
0
18 Feb 2024
Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-modal matching ability
Yejun Yoon
Seunghyun Yoon
Kunwoo Park
108
1
0
17 Feb 2024
Strong hallucinations from negation and how to fix them
Nicholas Asher
Swarnadeep Bhar
ReLM
LRM
54
5
0
16 Feb 2024
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
116
16
0
15 Feb 2024
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
Tiantian Feng
Daniel Yang
Digbalay Bose
Shrikanth Narayanan
100
6
0
14 Feb 2024
The Future of Cognitive Strategy-enhanced Persuasive Dialogue Agents: New Perspectives and Trends
Mengqi Chen
Bin Guo
Hao Wang
Haoyu Li
Qian Zhao
Jingqi Liu
Yasan Ding
Yan Pan
Zhiwen Yu
LLMAG
84
2
0
07 Feb 2024
RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation
Xiaohan Yu
Li Zhang
Xin Zhao
Yue Wang
Zhongrui Ma
76
11
0
07 Feb 2024
A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation
Zhengbo Wang
Jian Liang
Lijun Sheng
Ran He
Zilei Wang
Tieniu Tan
VLM
105
23
0
06 Feb 2024
Multimodal Rationales for Explainable Visual Question Answering
Kun Li
G. Vosselman
Michael Ying Yang
132
2
0
06 Feb 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
79
3
0
04 Feb 2024
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval
Xingning Dong
Zipeng Feng
Chunluan Zhou
Xuzheng Yu
Ming Yang
Qingpei Guo
VLM
80
3
0
31 Jan 2024
Towards Unified Interactive Visual Grounding in The Wild
Jie Xu
Hanbo Zhang
Qingyi Si
Yifeng Li
Xuguang Lan
Tao Kong
LM&Ro
66
5
0
30 Jan 2024
Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking
Ivana Beňová
Jana Kosecka
Michal Gregor
Martin Tamajka
Marcel Veselý
Marian Simko
61
1
0
29 Jan 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang
Xiaohan Ding
Kaixiong Gong
Yixiao Ge
Ying Shan
Xiangyu Yue
ViT
139
7
0
25 Jan 2024
LanDA: Language-Guided Multi-Source Domain Adaptation
Zhenbin Wang
Lei Zhang
Lituan Wang
Minjuan Zhu
91
10
0
25 Jan 2024
Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models
Hongzhan Lin
Ziyang Luo
Wei Gao
Jing Ma
Bo Wang
Ruichao Yang
66
16
0
24 Jan 2024
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation
Ci-Siang Lin
Chien-Yi Wang
Yu-Chiang Frank Wang
Min-Hung Chen
VLM
252
0
0
22 Jan 2024
Collaborative Position Reasoning Network for Referring Image Segmentation
Jianjian Cao
Beiya Dai
Yulin Li
Xiameng Qin
Jingdong Wang
110
0
0
22 Jan 2024
MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks via Text Prompts
Haoqiang Guo
Sendong Zhao
Hao Wang
Yanrui Du
Bing Qin
AI4CE
81
8
0
21 Jan 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
108
4
0
19 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
183
41
0
16 Jan 2024
Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering
Qing Li
Lei Li
Yu Li
LM&MA
AI4MH
91
7
0
15 Jan 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang
Bohan Zhuang
Qi Wu
66
12
0
12 Jan 2024
APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning
Guiming Cao
Kaize Shi
Hong Fu
Huaiwen Zhang
Guandong Xu
VLM
75
2
0
12 Jan 2024
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Wei Ye
Chaoya Jiang
Haiyang Xu
Chenhao Ye
Chenliang Li
Mingshi Yan
Shikun Zhang
Songhang Huang
Fei Huang
VLM
84
0
0
11 Jan 2024
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding
Yatong Bai
Utsav Garg
Apaar Shanker
Haoming Zhang
Samyak Parajuli
...
Eugenia D Fomitcheva
E. Branson
Aerin Kim
Somayeh Sojoudi
Kyunghyun Cho
60
2
0
09 Jan 2024
VGA: Vision and Graph Fused Attention Network for Rumor Detection
Lin Bai
Caiyan Jia
Ziying Song
Chaoqun Cui
55
0
0
03 Jan 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Wenwen Cai Xiaocong Zhou
Delong Chen
VLM
OffRL
82
30
0
03 Jan 2024
Enhancing Multimodal Understanding with CLIP-Based Image-to-Text Transformation
Change Che
Qunwei Lin
Xinyu Zhao
Jiaxin Huang
Liqiang Yu
VLM
39
39
0
02 Jan 2024
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao
Long Zhao
Vijay Kumar B.G
Yumin Suh
Dimitris N. Metaxas
Manmohan Chandraker
S. Schulter
ObjD
VLM
123
6
0
29 Dec 2023
Towards a Unified Multimodal Reasoning Framework
Abhinav Arun
Dipendra Singh Mal
Mehul Soni
Tomohiro Sawada
LRM
39
0
0
22 Dec 2023
Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pretraining
Bumsoo Kim
Yeonsik Jo
Jinhyung Kim
S. Kim
VLM
99
8
0
19 Dec 2023
Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders
Bumsoo Kim
Jinhyung Kim
Yeonsik Jo
S. Kim
VLM
103
4
0
19 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
167
36
0
19 Dec 2023
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Wei Tang
Liang Li
Xuejing Liu
Lu Jin
Jinhui Tang
Zechao Li
101
26
0
19 Dec 2023
Pedestrian Attribute Recognition via CLIP based Prompt Vision-Language Fusion
Tianlin Li
Jiandong Jin
Chenglong Li
Jin Tang
Cheng Zhang
Wei Wang
VLM
72
16
0
17 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
Gabriel Loaiza-Ganem
M. Volkovs
127
3
0
15 Dec 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun
Kim Sung-Bin
Seungju Han
Youngjae Yu
Tae-Hyun Oh
100
15
0
15 Dec 2023
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
132
4
0
15 Dec 2023
Text-Guided Face Recognition using Multi-Granularity Cross-Modal Contrastive Learning
Md Golam Moula Mehedi Hasan
S. Sami
Nasser M. Nasrabadi
67
6
0
14 Dec 2023
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang
Wei Ye
Haiyang Xu
Qinghao Ye
Mingshi Yan
Ji Zhang
Shikun Zhang
CLIP
VLM
59
4
0
14 Dec 2023
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Yafei Hu
Quanting Xie
Vidhi Jain
Jonathan M Francis
Jay Patrikar
...
Xiaolong Wang
Sebastian A. Scherer
Z. Kira
Fei Xia
Yonatan Bisk
LM&Ro
AI4CE
138
75
0
14 Dec 2023
A Foundational Multimodal Vision Language AI Assistant for Human Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Kenji Ikamura
...
Ivy Liang
L. Le
Tong Ding
Anil V. Parwani
Faisal Mahmood
MedIm
LM&MA
86
23
0
13 Dec 2023
Multimodal Pretraining of Medical Time Series and Notes
Ryan N. King
Tianbao Yang
Bobak J. Mortazavi
64
14
0
11 Dec 2023
Medical Vision Language Pretraining: A survey
Prashant Shrestha
Sanskar Amgain
Bidur Khanal
Cristian A. Linte
Binod Bhattarai
VLM
100
17
0
11 Dec 2023
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun
Dohoon Kim
Taesup Moon
VLM
83
6
0
11 Dec 2023
Previous
1
2
3
...
8
9
10
...
41
42
43
Next