Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,093 papers shown
Title
ViSymRe: Vision-guided Multimodal Symbolic Regression
Da Li
Junping Yin
Jin Xu
Xinxin Li
Juan Zhang
85
1
0
15 Dec 2024
Rebalanced Vision-Language Retrieval Considering Structure-Aware Distillation
Yang Yang
Wenjuan Xi
Luping Zhou
Jinhui Tang
79
0
0
14 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
86
0
0
13 Dec 2024
Towards Brain Passage Retrieval -- An Investigation of EEG Query Representations
Niall McGuire
Yashar Moshfeghi
45
0
0
09 Dec 2024
Unified Framework for Open-World Compositional Zero-shot Learning
Hirunima Jayasekara
Khoi Pham
Nirat Saini
Abhinav Shrivastava
64
0
0
05 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
100
0
0
04 Dec 2024
Data Uncertainty-Aware Learning for Multimodal Aspect-based Sentiment Analysis
Hao Yang
Zhenyu Zhang
Yanyan Zhao
Bing Qin
71
0
0
02 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
67
1
0
02 Dec 2024
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li
Yifei Xing
X. Lan
Xuzhao Li
Haifeng Chen
D. Jiang
Mamba
74
0
0
01 Dec 2024
MIMIC: Multimodal Islamophobic Meme Identification and Classification
Safrin Sanzida Islam
Sahid Hossain Mustakim
Sadia Ahmmed
Md. Faiyaz Abdullah Sayeedi
Swapnil Khandoker
Syed Tasdid Azam Dhrubo
Nahid Md Lokman Hossain
69
0
0
01 Dec 2024
Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation
Yiyuan Pan
Yunzhe Xu
Zhe Liu
Hesheng Wang
LM&Ro
83
0
0
30 Nov 2024
Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment
Dongfang Zhao
64
0
0
30 Nov 2024
LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation
Huadong Tang
Youpeng Zhao
Y. Huang
Min Xu
Jun Wang
Qiang Wu
MLLM
VLM
80
0
0
30 Nov 2024
SentiXRL: An advanced large language Model Framework for Multilingual Fine-Grained Emotion Classification in Complex Text Environment
Jie Wang
Yichen Wang
Zhilin Zhang
Jianhao Zeng
Kaidi Wang
Zhiyang Chen
72
0
0
27 Nov 2024
Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
Zengbao Sun
Ming Zhao
Gaorui Liu
Andre Kaup
96
3
0
22 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
55
1
0
15 Nov 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
48
0
0
13 Nov 2024
Prompt-enhanced Network for Hateful Meme Classification
Junxi Liu
Yanyan Feng
Jiehai Chen
Yun Xue
Fenghuan Li
VLM
63
0
0
12 Nov 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
29
0
0
11 Nov 2024
MEANT: Multimodal Encoder for Antecedent Information
Benjamin Iyoya Irving
Annika Marie Schoene
AIFin
34
0
0
10 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
37
0
0
09 Nov 2024
Can Multimodal Large Language Model Think Analogically?
Diandian Guo
Cong Cao
Fangfang Yuan
Dakui Wang
Wei Ma
Yanbing Liu
Jianhui Fu
LRM
37
0
0
02 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
72
1
0
01 Nov 2024
IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision
Maxwell Meyer
Jack Spruyt
ViT
26
0
0
31 Oct 2024
An Information Criterion for Controlled Disentanglement of Multimodal Data
Chenyu Wang
Sharut Gupta
Xinyi Zhang
Sana Tonekaboni
Stefanie Jegelka
Tommi Jaakkola
Caroline Uhler
DRL
42
1
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wei Liu
Xinyu Wang
VLM
MLLM
LRM
49
14
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
62
1
0
29 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
58
5
0
28 Oct 2024
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Xupeng Chen
Zhixin Lai
Kangrui Ruan
Shichu Chen
Jiaxiang Liu
Zuozhu Liu
41
1
0
27 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
39
0
0
24 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
26
0
0
23 Oct 2024
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering
Nghia Hieu Nguyen
Tho Thanh Quan
Ngan Luu-Thuy Nguyen
31
0
0
18 Oct 2024
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
Chenhang Cui
An Zhang
Yiyang Zhou
Zhaorun Chen
Gelei Deng
Huaxiu Yao
Tat-Seng Chua
73
4
0
18 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
37
0
0
17 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
L. Chen
Hexiang Hu
Ruotong Wang
Yangyi Chen
Zifeng Wang
...
Pranav Shyam
Tianyi Zhou
Heng-Chiao Huang
Ming Yang
Boqing Gong
31
2
0
16 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
28
7
0
14 Oct 2024
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen
Jianfei Yang
48
1
0
14 Oct 2024
Leveraging Customer Feedback for Multi-modal Insight Extraction
Sandeep Sricharan Mukku
Abinesh Kanagarajan
Pushpendu Ghosh
Chetan Aggarwal
27
0
0
13 Oct 2024
nach0-pc: Multi-task Language Model with Molecular Point Cloud Encoder
Maksim Kuznetsov
Airat Valiev
Alex Aliper
Daniil Polykovskiy
E. Tutubalina
Rim Shayakhmetov
Z. Miftahutdinov
27
0
0
11 Oct 2024
A Social Context-aware Graph-based Multimodal Attentive Learning Framework for Disaster Content Classification during Emergencies
Shahid Shafi Dar
Mohammad Zia Ur Rehman
Karan Bais
Mohammed Abdul Haseeb
Nagendra Kumara
41
10
0
11 Oct 2024
Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey
Zihan YU
Tianxiao Li
Yuxin Zhu
Rongze Pan
38
0
0
10 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Jianxing Yu
Shiqi Wang
Han Yin
Zhenlong Sun
Ruobing Xie
Bo Zhang
Yanghui Rao
CML
37
0
0
10 Oct 2024
FLIER: Few-shot Language Image Models Embedded with Latent Representations
Zhinuo Zhou
Peng Zhou
Xiaoyong Pan
VLM
28
0
0
10 Oct 2024
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection
Guankun Wang
Han Xiao
Huxin Gao
Renrui Zhang
Long Bai
Xiaoxiao Yang
Zhen Li
Hongsheng Li
Hongliang Ren
44
4
0
10 Oct 2024
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin
Madhukar Reddy Vongala
Jana Kosecka
37
3
0
09 Oct 2024
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li
Xinwei Zhang
Peilin Zhong
Yuan Deng
Meisam Razaviyayn
Vahab Mirrokni
25
2
0
09 Oct 2024
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models
Sungnyun Kim
Haofu Liao
Srikar Appalaraju
Peng Tang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
Stefano Soatto
38
0
0
04 Oct 2024
Multi-modal clothing recommendation model based on large model and VAE enhancement
Bingjie Huang
Qingyi Lu
Shuaishuai Huang
Xue-she Wang
Haowei Yang
39
3
0
03 Oct 2024
Previous
1
2
3
4
5
6
...
40
41
42
Next