2503.01167
Cited By
v1
v2 (latest)
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
3 March 2025
Haoxin Li
Boyang Li
CoGe
Papers citing "Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data"
50 / 103 papers shown
Title
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Yue Yang
Wenlin Yao
Hongming Zhang
Xiaoyang Wang
Dong Yu
Jianshu Chen
VLM
67
22
0
21 Oct 2022
Palm up: Playing in the Latent Manifold for Unsupervised Pretraining
Hao Liu
Tom Zahavy
Volodymyr Mnih
Satinder Singh
SSL
95
7
0
19 Oct 2022
UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image
Dani Valevski
Matan Kalman
Eyal Molad
Eyal Segalis
Yossi Matias
Yaniv Leviathan
DiffM
61
39
0
17 Oct 2022
Imagic: Text-Based Real Image Editing with Diffusion Models
Bahjat Kawar
Shiran Zada
Oran Lang
Omer Tov
Hui-Tang Chang
Tali Dekel
Inbar Mosseri
Michal Irani
91
1,103
0
17 Oct 2022
Is synthetic data from generative models ready for image recognition?
Ruifei He
Shuyang Sun
Xin Yu
Chuhui Xue
Wenqing Zhang
Philip Torr
Song Bai
Xiaojuan Qi
100
302
0
14 Oct 2022
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
Wanrong Zhu
An Yan
Yujie Lu
Wenda Xu
Xinze Wang
Miguel P. Eckstein
William Yang Wang
111
36
0
07 Oct 2022
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul
Federico Bianchi
Pratyusha Kalluri
Dan Jurafsky
James Zou
VLM
CoGe
92
393
0
04 Oct 2022
Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts
Joel Jang
Seonghyeon Ye
Minjoon Seo
ELM
LRM
146
64
0
26 Sep 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Tiancheng Zhao
Tianqi Zhang
Mingwei Zhu
Haozhan Shen
Kyusong Lee
Xiaopeng Lu
Jianwei Yin
VLM
CoGe
MLLM
108
98
0
01 Jul 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
197
1,129
0
22 Jun 2022
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning
Jiahui Gao
Renjie Pi
Yong Lin
Hang Xu
Jiacheng Ye
Zhiyong Wu
Weizhong Zhang
Xiaodan Liang
Zhenguo Li
Lingpeng Kong
SyDa
VLM
144
49
0
25 May 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
461
6,067
0
23 May 2022
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM
OSLM
AI4CE
362
3,695
0
02 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
418
3,602
0
29 Apr 2022
Imagination-Augmented Natural Language Understanding
Yujie Lu
Wanrong Zhu
Xinze Wang
Miguel P. Eckstein
William Yang Wang
46
24
0
18 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
113
427
0
07 Apr 2022
Conditional Prompt Learning for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VLM
CLIP
VPVLM
141
1,356
0
10 Mar 2022
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
Yu Meng
Jiaxin Huang
Yu Zhang
Jiawei Han
SyDa
75
235
0
09 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
555
4,409
0
28 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
493
15,768
0
20 Dec 2021
Learning with Label Noise for Image Retrieval by Selecting Interactions
Sarah Ibrahimi
Arnaud Sors
Rafael Sampaio de Rezende
Stéphane Clinchant
NoLa
VLM
67
16
0
20 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
104
715
0
08 Dec 2021
FILIP: Fine-grained Interactive Language-Image Pre-Training
Lewei Yao
Runhu Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLM
CLIP
108
642
0
09 Nov 2021
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Peter West
Chandrasekhar Bhagavatula
Jack Hessel
Jena D. Hwang
Liwei Jiang
Ronan Le Bras
Ximing Lu
Sean Welleck
Yejin Choi
SyDa
107
332
0
14 Oct 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
505
2,409
0
02 Sep 2021
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
Chenlin Meng
Yutong He
Yang Song
Jiaming Song
Jiajun Wu
Jun-Yan Zhu
Stefano Ermon
DiffM
149
1,504
0
02 Aug 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
490
10,496
0
17 Jun 2021
Generate, Annotate, and Learn: NLP with Synthetic Text
Xuanli He
Islam Nassar
J. Kiros
Gholamreza Haffari
Mohammad Norouzi
70
53
0
11 Jun 2021
Learning to See by Looking at Noise
Manel Baradad
Jonas Wulff
Tongzhou Wang
Phillip Isola
Antonio Torralba
81
92
0
10 Jun 2021
Generative Models as a Data Source for Multiview Representation Learning
Ali Jahanian
Xavier Puig
Yonglong Tian
Phillip Isola
83
128
0
09 Jun 2021
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
Yuxuan Zhang
Huan Ling
Jun Gao
K. Yin
Jean-Francois Lafleche
Adela Barriuso
Antonio Torralba
Sanja Fidler
3DH
GAN
VLM
74
335
0
13 Apr 2021
Noise-resistant Deep Metric Learning with Ranking-based Instance Selection
Chang-rui Liu
Han Yu
Boyang Albert Li
Zhiqi Shen
Zhanning Gao
Peiran Ren
Xuansong Xie
Li-zhen Cui
Chunyan Miao
NoLa
80
38
0
30 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
972
29,810
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
459
3,893
0
11 Feb 2021
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
880
42,379
0
28 May 2020
Shortcut Learning in Deep Neural Networks
Robert Geirhos
J. Jacobsen
Claudio Michaelis
R. Zemel
Wieland Brendel
Matthias Bethge
Felix Wichmann
216
2,059
0
16 Apr 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schluter
Hermann Ney
72
84
0
19 Dec 2019
Synthetic Humans for Action Recognition from Unseen Viewpoints
Gül Varol
Ivan Laptev
Cordelia Schmid
Andrew Zisserman
97
99
0
09 Dec 2019
Generative adversarial networks (GAN) based efficient sampling of chemical space for inverse design of inorganic materials
Yabo Dan
Yong Zhao
Xiang Li
Shaobo Li
Ming Hu
Jianjun Hu
AI4CE
GAN
100
199
0
12 Nov 2019
This dataset does not exist: training models from generated images
Victor Besnier
Himalaya Jain
Andrei Bursuc
Matthieu Cord
P. Pérez
DD
46
87
0
07 Nov 2019
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Ye Jia
Pedro J. Moreno
Yonghui Wu
Zelin Wu
65
128
0
25 Sep 2019
Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels
Pengfei Chen
B. Liao
Guangyong Chen
Shengyu Zhang
NoLa
69
388
0
13 May 2019
Learning to Learn from Noisy Labeled Data
Junnan Li
Yongkang Wong
Qi Zhao
Mohan Kankanhalli
NoLa
65
334
0
13 Dec 2018
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
617
10,590
0
12 Dec 2018
Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization
Jonathan Tremblay
Aayush Prakash
David Acuna
M. Brophy
Varun Jampani
Cem Anil
Thang To
Eric Cameracci
Shaad Boochoon
Stan Birchfield
OOD
84
818
0
18 Apr 2018
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Lu Jiang
Zhengyuan Zhou
Thomas Leung
Li Li
Li Fei-Fei
NoLa
128
1,456
0
14 Dec 2017
On Pre-Trained Image Features and Synthetic Images for Deep Learning
Stefan Hinterstoißer
Vincent Lepetit
Paul Wohlhart
K. Konolige
VLM
ObjD
47
230
0
29 Oct 2017
VisDA: The Visual Domain Adaptation Challenge
Xingchao Peng
Ben Usman
Neela Kaushik
Judy Hoffman
Dequan Wang
Kate Saenko
OOD
93
806
0
18 Oct 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
128
3,685
0
08 Jun 2017
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
Xun Huang
Serge J. Belongie
OOD
181
4,372
0
20 Mar 2017