Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,118 papers shown
Title
BrainGuard: Privacy-Preserving Multisubject Image Reconstructions from Brain Activities
Zhibo Tian
Ruijie Quan
Fan Ma
Kun Zhan
Yi Yang
113
1
0
24 Jan 2025
Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols
John Joon Young Chung
Melissa Roemmele
Max Kreminski
VGen
124
0
0
23 Jan 2025
ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
Yanming Xiu
T. Scargill
M. Gorlatova
106
2
0
22 Jan 2025
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
88
0
0
20 Jan 2025
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
Soumya Dutta
Sriram Ganapathy
118
6
0
20 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
278
3
0
14 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
150
0
0
13 Jan 2025
MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection
Kaiying Yan
Moyang Liu
Yukun Liu
Ruibo Fu
Zhengqi Wen
J. Tao
Xuefei Liu
Guanjun Li
123
0
0
12 Jan 2025
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework
Run Shao
Cheng Yang
Qiujun Li
Qing Zhu
Yongjun Zhang
...
Yu Liu
Yong Tang
Dapeng Liu
Shizhong Yang
Haifeng Li
175
0
0
08 Jan 2025
Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly
Saloni Mittal
48
0
0
07 Jan 2025
Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models
Malak Mansour
Ahmed Aly
Bahey Tharwat
Sarim Hashmi
Dong An
Ian Reid
LM&Ro
ELM
LRM
131
1
0
07 Jan 2025
Foundations of GenIR
Qingyao Ai
Jingtao Zhan
Yang Liu
128
0
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
177
15
0
06 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
365
59
0
03 Jan 2025
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform
Cheonsu Jeong
184
4
0
01 Jan 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
284
5
0
31 Dec 2024
SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
Palash Nandi
Shivam Sharma
Tanmoy Chakraborty
71
1
0
31 Dec 2024
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Yue Zhang
Ziqiao Ma
Jialu Li
Yanyuan Qiao
Zun Wang
J. Chai
Qi Wu
Joey Tianyi Zhou
Parisa Kordjamshidi
LRM
167
24
0
31 Dec 2024
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Yi Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
146
2
0
25 Dec 2024
Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Zhenqi Wang
97
1
0
24 Dec 2024
Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Fengyuan Liu
LRM
190
71
0
22 Dec 2024
Bringing Multimodality to Amazon Visual Search System
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
114
3
0
17 Dec 2024
BioBridge: Unified Bio-Embedding with Bridging Modality in Code-Switched EMR
Jangyeong Jeon
Sangyeon Cho
Dongjoon Lee
Changhee Lee
Junyeong Kim
110
0
0
16 Dec 2024
ViSymRe: Vision-guided Multimodal Symbolic Regression
Da Li
Junping Yin
Jin Xu
Xinxin Li
Juan Zhang
135
1
0
15 Dec 2024
Rebalanced Vision-Language Retrieval Considering Structure-Aware Distillation
Yang Yang
Wenjuan Xi
Luping Zhou
Jinhui Tang
150
0
0
14 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
148
0
0
13 Dec 2024
Unified Framework for Open-World Compositional Zero-shot Learning
Hirunima Jayasekara
Khoi Pham
Nirat Saini
Abhinav Shrivastava
103
0
0
05 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
226
0
0
04 Dec 2024
Data Uncertainty-Aware Learning for Multimodal Aspect-based Sentiment Analysis
Hao Yang
Zhenyu Zhang
Yanyan Zhao
Bing Qin
103
0
0
02 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
150
1
0
02 Dec 2024
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li
Yifei Xing
X. Lan
Xuzhao Li
Haifeng Chen
D. Jiang
Mamba
141
1
0
01 Dec 2024
MIMIC: Multimodal Islamophobic Meme Identification and Classification
Safrin Sanzida Islam
Sahid Hossain Mustakim
Sadia Ahmmed
Md. Faiyaz Abdullah Sayeedi
Swapnil Khandoker
Syed Tasdid Azam Dhrubo
Nahid Md Lokman Hossain
106
0
0
01 Dec 2024
Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation
Yiyuan Pan
Yunzhe Xu
Yanfeng Guo
Hesheng Wang
LM&Ro
151
3
0
30 Nov 2024
Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment
Dongfang Zhao
79
0
0
30 Nov 2024
LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation
Huadong Tang
Youpeng Zhao
Y. Huang
Min Xu
Jun Wang
Qiang Wu
MLLM
VLM
131
0
0
30 Nov 2024
SentiXRL: An advanced large language Model Framework for Multilingual Fine-Grained Emotion Classification in Complex Text Environment
Jie Wang
Yichen Wang
Zhilin Zhang
Jianhao Zeng
Kaidi Wang
Zhiyang Chen
163
0
0
27 Nov 2024
Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
Zengbao Sun
Ming Zhao
Gaorui Liu
Andre Kaup
143
4
0
22 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
91
1
0
17 Nov 2024
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
159
2
0
15 Nov 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
72
0
0
13 Nov 2024
Prompt-enhanced Network for Hateful Meme Classification
Junxi Liu
Yanyan Feng
Jiehai Chen
Yun Xue
Fenghuan Li
VLM
111
0
0
12 Nov 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
59
0
0
11 Nov 2024
MEANT: Multimodal Encoder for Antecedent Information
Benjamin Iyoya Irving
Annika Marie Schoene
AIFin
58
0
1
10 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
102
0
0
09 Nov 2024
Can Multimodal Large Language Model Think Analogically?
Diandian Guo
Cong Cao
Fangfang Yuan
Dakui Wang
Wei Ma
Yanbing Liu
Jianhui Fu
LRM
108
1
0
02 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
175
1
0
01 Nov 2024
IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision
Maxwell Meyer
Jack Spruyt
ViT
36
0
0
31 Oct 2024
An Information Criterion for Controlled Disentanglement of Multimodal Data
Chenyu Wang
Sharut Gupta
Xinyi Zhang
Sana Tonekaboni
Stefanie Jegelka
Tommi Jaakkola
Caroline Uhler
DRL
119
2
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLM
MLLM
LRM
115
31
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
102
1
0
29 Oct 2024
Previous
1
2
3
4
5
6
...
41
42
43
Next