1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,118 papers shown
Title
Locations of Characters in Narratives: Andersen and Persuasion Datasets
Batuhan Ozyurt
Roya Arkhmammadova
Deniz Yuret
77
2
0
04 Apr 2025
Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles
Chen Wei Kuo
Kevin Chu
Nouar Aldahoul
Hazem Ibrahim
Talal Rahwan
Yasir Zaki
SyDa
156
0
0
04 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
181
0
0
03 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
126
4
0
03 Apr 2025
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Yuejiao Su
Yi Wang
Qiongyang Hu
Chuang Yang
Lap-Pui Chau
102
0
0
02 Apr 2025
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning
Alexander Vogel
Omar Moured
Yufan Chen
Jiaming Zhang
Rainer Stiefelhagen
122
0
0
29 Mar 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL
VLM
183
3
0
27 Mar 2025
VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction
Zizhi Chen
Minghao Han
Xukun Zhang
Shuwei Ma
Tao Liu
Xing Wei
Li Zhang
178
0
0
25 Mar 2025
VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs
Kelaiti Xiao
Liang Yang
Paerhati Tulajiang
Hongfei Lin
MLLM
124
0
0
25 Mar 2025
Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
Ziming Wei
Bingqian Lin
Yunshuang Nie
Jiaqi Chen
Shikui Ma
Hang Xu
Xiaodan Liang
151
1
0
23 Mar 2025
A Language Anchor-Guided Method for Robust Noisy Domain Generalization
Zilin Dai
Lehong Wang
Fangzhou Lin
Yidong Wang
Zhigang Li
Kazunori D Yamada
Ziming Zhang
Wang Lu
423
0
0
21 Mar 2025
A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli
Pengyu Liu
Guohua Dong
D. Guo
Kun Li
Fengling Li
Xun Yang
Meng Wang
Xiaomin Ying
AI4CE
92
0
0
20 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Sara Sarto
Marcella Cornia
Rita Cucchiara
90
1
0
18 Mar 2025
HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard
Yifei Dong
Fengyi Wu
Qi He
Heng Li
Minghan Li
...
Yuxuan Zhou
Jingdong Sun
Qi Dai
Zhi-Qi Cheng
Alexander G. Hauptmann
LM&Ro
87
0
0
18 Mar 2025
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
Haoyang Li
Liang Wang
Chao Wang
Jing Jiang
Yan Peng
Guodong Long
VLM
141
1
0
17 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
116
0
0
17 Mar 2025
Learning Privacy from Visual Entities
Alessio Xompero
Andrea Cavallaro
SSL
GNN
114
0
0
16 Mar 2025
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models
Xirui Zhou
Lianlei Shan
Xiaolin Gui
95
0
0
14 Mar 2025
Can LLMs Understand Time Series Anomalies?
Zihao Zhou
Rose Yu
AI4TS
172
15
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
150
1
0
13 Mar 2025
Towards Understanding Graphical Perception in Large Multimodal Models
Kai Zhang
Jianwei Yang
J. Inala
Chandan Singh
Jianfeng Gao
Yu Su
Chenglong Wang
93
1
0
13 Mar 2025
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework
Zhuo Zhi
Chen Feng
Adam Daneshmend
Mine Orlu
Andreas Demosthenous
L. Yin
Da Li
Ziquan Liu
Miguel R. D. Rodrigues
LRM
123
1
0
11 Mar 2025
Anatomy-Aware Conditional Image-Text Retrieval
Meng Zheng
Jiajin Zhang
Benjamin Planche
Zhongpai Gao
Terrence Chen
Ziyan Wu
MedIm
89
0
0
10 Mar 2025
Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency
Duy Phuong Nguyen
J. P. Muñoz
Tanya Roosta
Ali Jannesari
FedML
106
0
0
10 Mar 2025
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
Khang H. N. Vo
D. Q. Nguyen
T. Nguyen
Tho Quan
129
1
0
09 Mar 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Chan Hur
Jeong-hun Hong
Dong-hun Lee
Dabin Kang
Semin Myeong
Sang-hyo Park
Hyeyoung Park
198
1
0
07 Mar 2025
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems
Biao Ouyang
Yingying Zhang
Hanyin Cheng
Yang Shu
Chenjuan Guo
Bin Yang
Qingsong Wen
L. Fan
Christian S. Jensen
100
1
0
06 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
412
0
0
05 Mar 2025
Perceptual Visual Quality Assessment: Principles, Methods, and Future Directions
Wei Zhou
Hadi Amirpour
Christian Timmerer
Guangtao Zhai
P. Callet
Alan C. Bovik
88
0
0
01 Mar 2025
Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series
Yanan Niu
Roy Sarkis
D. Psaltis
Mario Paolone
Christophe Moser
Luisa Lambertini
131
0
0
28 Feb 2025
RTGen: Real-Time Generative Detection Transformer
Chi Ruan
ObjD
VLM
80
0
0
28 Feb 2025
Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems
Faisal Mohammad
Duksan Ryu
94
0
0
28 Feb 2025
Deciphering the complaint aspects: Towards an aspect-based complaint identification model with video complaint dataset in finance
Sarmistha Das
Basha Mujavarsheik
R E Zera Lyngkhoi
Sriparna Saha
Alka Maurya
55
0
0
26 Feb 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
130
1
0
25 Feb 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM
LM&MA
123
2
0
24 Feb 2025
Are Large Language Models Good Data Preprocessors?
Elyas Meguellati
Nardiena A. Pratama
S. Sadiq
Gianluca Demartini
140
0
0
24 Feb 2025
Beyond Pattern Recognition: Probing Mental Representations of LMs
Moritz Miller
Kumar Shridhar
ReLM
LRM
120
0
0
23 Feb 2025
Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation
Yuheng Ji
Yue Liu
Zhicheng Zhang
Zhao Zhang
Yuting Zhao
Gang Zhou
Xingwei Zhang
Xinwang Liu
Xiaolong Zheng
VLM
186
4
0
21 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
Ming Shan Hee
Roy Ka-wei Lee
VLM
116
1
0
16 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
137
0
0
12 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
162
3
0
11 Feb 2025
Foundation Models for Anomaly Detection: Vision and Challenges
Jing Ren
Tao Tang
Hong Jia
Haytham Fayek
Xiaodong Li
Suyu Ma
Xiwei Xu
Feng Xia
153
0
0
10 Feb 2025
A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions
Elisa Negrini
Yuxuan Liu
Liu Yang
Stanley Osher
Hayden Schaeffer
AI4CE
148
0
0
09 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
D. Yin
115
1
0
09 Feb 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
155
0
0
09 Feb 2025
A Self-supervised Multimodal Deep Learning Approach to Differentiate Post-radiotherapy Progression from Pseudoprogression in Glioblastoma
A. Gomaa
Yixing Huang
Pluvio Stephan
Katharina Breininger
Benjamin Frey
...
U. Gaipl
Christoph Bert
R. Fietkau
M. Schmidt
F. Putz
141
1
0
06 Feb 2025
Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection
Tianlin Zhang
En Yu
Yi Shao
Shuai Li
181
0
0
03 Feb 2025
Continually Evolved Multimodal Foundation Models for Cancer Prognosis
Jie Peng
Shuang Zhou
Longwei Yang
Yiran Song
Mohan Zhang
Kaixiong Zhou
Feng Xie
Mingquan Lin
Rui Zhang
Tianlong Chen
213
0
0
30 Jan 2025
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
Lin Chen
Qi Yang
Kun Ding
Zhu Li
Gang Shen
Fei Li
Qiyuan Cao
Shiming Xiang
VLM
82
0
0
29 Jan 2025
Multi-Modality Transformer for E-Commerce: Inferring User Purchase Intention to Bridge the Query-Product Gap
Srivatsa Mallapragada
Ying Xie
Varsha Rani Chawan
Zeyad Hailat
Yuanbo Wang
108
0
0
28 Jan 2025