ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.09036
  4. Cited By
Can Text-to-image Model Assist Multi-modal Learning for Visual
  Recognition with Visual Modality Missing?

Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?

14 February 2024
Tiantian Feng
Daniel Yang
Digbalay Bose
Shrikanth Narayanan
ArXivPDFHTML

Papers citing "Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?"

8 / 8 papers shown
Title
Can Synthetic Audio From Generative Foundation Models Assist Audio
  Recognition and Speech Modeling?
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
Tiantian Feng
Dimitrios Dimitriadis
Shrikanth Narayanan
24
4
0
13 Jun 2024
TI-ASU: Toward Robust Automatic Speech Understanding through
  Text-to-speech Imputation Against Missing Speech Modality
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
Tiantian Feng
Xuan Shi
Rahul Gupta
Shrikanth S. Narayanan
41
0
0
27 Apr 2024
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,125
0
28 Jan 2022
VATT: Transformers for Multimodal Self-Supervised Learning from Raw
  Video, Audio and Text
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
248
577
0
22 Apr 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual
  Machine Learning
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
202
310
0
02 Mar 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,777
0
24 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
256
525
0
04 Feb 2021
Supervised Multimodal Bitransformers for Classifying Images and Text
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
59
241
0
06 Sep 2019
1