ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.13455
  4. Cited By
CoBIT: A Contrastive Bi-directional Image-Text Generation Model

CoBIT: A Contrastive Bi-directional Image-Text Generation Model

23 March 2023
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
    DiffM
ArXivPDFHTML

Papers citing "CoBIT: A Contrastive Bi-directional Image-Text Generation Model"

19 / 19 papers shown
Title
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning
Zhong Ji
Weilong Cao
Yan Zhang
Yanwei Pang
Jungong Han
X. Li
DiffM
VLM
47
0
0
06 Mar 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao
Weijia Mao
Mike Zheng Shou
66
0
0
05 Mar 2025
Multi-modal Generation via Cross-Modal In-Context Learning
Multi-modal Generation via Cross-Modal In-Context Learning
Amandeep Kumar
Muzammal Naseer
Sanath Narayan
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
MLLM
51
0
0
28 May 2024
Negative Pre-aware for Noisy Cross-modal Matching
Negative Pre-aware for Noisy Cross-modal Matching
Xu-Yao Zhang
Hao Li
Mang Ye
25
7
0
10 Dec 2023
CROMA: Remote Sensing Representations with Contrastive Radar-Optical
  Masked Autoencoders
CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders
A. Fuller
K. Millard
James R. Green
17
60
0
01 Nov 2023
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle
  Consistency
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Tianhong Li
Sangnie Bhardwaj
Yonglong Tian
Han Zhang
Jarred Barber
Dina Katabi
Guillaume Lajoie
Huiwen Chang
Dilip Krishnan
VLM
36
4
0
05 Oct 2023
Generative Semantic Communication: Diffusion Models Beyond Bit Recovery
Generative Semantic Communication: Diffusion Models Beyond Bit Recovery
Eleonora Grassucci
Sergio Barbarossa
Danilo Comminiello
DiffM
27
54
0
07 Jun 2023
Generating Images with Multimodal Language Models
Generating Images with Multimodal Language Models
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
28
241
0
26 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
44
43
0
24 May 2023
UniControl: A Unified Diffusion Model for Controllable Visual Generation
  In the Wild
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
Can Qin
Shu Zhen Zhang
Ning Yu
Yihao Feng
Xinyi Yang
...
Caiming Xiong
Silvio Savarese
Stefano Ermon
Yun Fu
Ran Xu
12
118
0
18 May 2023
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
Fan Bao
Shen Nie
Kaiwen Xue
Chongxuan Li
Shiliang Pu
Yaole Wang
Gang Yue
Yue Cao
Hang Su
Jun Zhu
DiffM
207
148
0
12 Mar 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
519
0
02 Jan 2023
Underspecification in Scene Description-to-Depiction Tasks
Underspecification in Scene Description-to-Depiction Tasks
Ben Hutchinson
Jason Baldridge
Vinodkumar Prabhakaran
DiffM
66
32
0
11 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,125
0
28 Jan 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
196
405
0
13 Jul 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,777
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
ImageNet Large Scale Visual Recognition Challenge
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,194
0
01 Sep 2014
1