2402.09036
Cited By
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
14 February 2024
Tiantian Feng
Daniel Yang
Digbalay Bose
Shrikanth Narayanan
Papers citing "Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?"
9 / 9 papers shown
Title
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings
Jiajun Qin
Yuan Pu
Zhuolun He
S. Kim
David Z. Pan
Bei Yu
0
0
0
17 May 2025
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
Tiantian Feng
Dimitrios Dimitriadis
Shrikanth Narayanan
37
4
0
13 Jun 2024
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
Tiantian Feng
Xuan Shi
Rahul Gupta
Shrikanth S. Narayanan
49
0
0
27 Apr 2024
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,137
0
28 Jan 2022
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
251
577
0
22 Apr 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
208
310
0
02 Mar 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,781
0
24 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
268
525
0
04 Feb 2021
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
59
241
0
06 Sep 2019