2210.08402
LAION-5B: An open large-scale dataset for training next generation image-text models
16 October 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
Mehdi Cherti
Theo Coombes
Aarush Katta
Clayton Mullis
Mitchell Wortsman
P. Schramowski
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
Papers citing
"LAION-5B: An open large-scale dataset for training next generation image-text models"
50 / 673 papers shown
Title
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
39
63
0
13 Feb 2024
Exploring Perceptual Limitation of Multimodal Large Language Models
Jiarui Zhang
Jinyi Hu
Mahyar Khayatkhoei
Filip Ilievski
Maosong Sun
LRM
29
10
0
12 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
130
109
0
08 Feb 2024
Data-efficient Large Vision Models through Sequential Autoregression
Jianyuan Guo
Zhiwei Hao
Chengcheng Wang
Yehui Tang
Han Wu
Han Hu
Kai Han
Chang Xu
VLM
38
10
0
07 Feb 2024
Theoretical and Empirical Analysis of Adaptive Entry Point Selection for Graph-based Approximate Nearest Neighbor Search
Yutaro Oguri
Yusuke Matsui
31
0
0
07 Feb 2024
Federated Learning Priorities Under the European Union Artificial Intelligence Act
Herbert Woisetschläger
Alexander Erben
Bill Marino
Shiqiang Wang
Nicholas D. Lane
R. Mayer
Hans-Arno Jacobsen
28
15
0
05 Feb 2024
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Daniel Lyth
Simon King
29
35
0
02 Feb 2024
BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion
Yonghao Yu
Shunan Zhu
Huai Qin
Haorui Li
Jinglu Hu
29
7
0
30 Jan 2024
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
Wei Li
Xue Xu
Jiachen Liu
Xinyan Xiao
25
5
0
24 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Wang
Xin Li
Luisa Verdoliva
Shu Hu
88
58
0
22 Jan 2024
COCO is "ALL" You Need for Visual Instruction Fine-tuning
Xiaotian Han
Yiqi Wang
Bohan Zhai
Quanzeng You
Hongxia Yang
VLM
MLLM
33
2
0
17 Jan 2024
Seeing the Unseen: Visual Common Sense for Semantic Placement
Ram Ramrakhya
Aniruddha Kembhavi
Dhruv Batra
Z. Kira
Kuo-Hao Zeng
Luca Weihs
VLM
46
5
0
15 Jan 2024
Object-Centric Diffusion for Efficient Video Editing
Kumara Kahatapitiya
Adil Karjauv
Davide Abati
Fatih Porikli
Yuki M. Asano
A. Habibian
VGen
40
12
0
11 Jan 2024
AI Art is Theft: Labour, Extraction, and Exploitation, Or, On the Dangers of Stochastic Pollocks
T. S. Goetze
30
13
0
10 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang
Yiren Song
Jiaming Liu
Rui Wang
Jinpeng Yu
...
Huaxia Li
Xu Tang
Yao Hu
Han Pan
Zhongliang Jing
49
58
0
26 Dec 2023
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training
Xinyan Chen
Jiaxin Ge
Tianjun Zhang
Jiaming Liu
Shanghang Zhang
VLM
EGVM
42
0
0
23 Dec 2023
Leveraging Habitat Information for Fine-grained Bird Identification
Tin Nguyen
Anh Nguyen
Anh Nguyen
VLM
44
0
0
22 Dec 2023
Parrot Captions Teach CLIP to Spot Text
Yiqi Lin
Conghui He
Alex Jinpeng Wang
Bin Wang
Weijia Li
Mike Zheng Shou
36
7
0
21 Dec 2023
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
20
241
0
21 Dec 2023
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
Hayk Manukyan
Andranik Sargsyan
Barsegh Atanyan
Zhangyang Wang
Shant Navasardyan
Humphrey Shi
DiffM
35
28
0
21 Dec 2023
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang
Jiaming Liu
Ray Zhang
Mingjie Pan
Zoey Guo
Xiaoqi Li
Zehui Chen
Peng Gao
Yandong Guo
Shanghang Zhang
3DV
26
58
0
21 Dec 2023
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
Desai Xie
Jiahao Li
Hao Tan
Xin Sun
Zhixin Shu
Yi Zhou
Sai Bi
Soren Pirk
Arie E. Kaufman
37
8
0
21 Dec 2023
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang
R. Prabha
Tianyuan Huang
Jiajun Wu
Ram Rajagopal
34
55
0
20 Dec 2023
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
Seunghoo Hong
Juhun Lee
Simon S. Woo
25
18
0
20 Dec 2023
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
Peter Kocsis
Vincent Sitzmann
Matthias Nießner
34
15
0
19 Dec 2023
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Zeyinzi Jiang
Chaojie Mao
Yulin Pan
Zhen Han
Jingfeng Zhang
32
28
0
18 Dec 2023
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Erik Cambria
Fukun Yin
Gang Yu
Tao Chen
36
24
0
17 Dec 2023
WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data
Maurice Weber
Carlo Siebenschuh
Rory Butler
Anton Alexandrov
Valdemar Thanner
...
Haris Jabbar
Ian Foster
Bo-wen Li
Rick L. Stevens
Ce Zhang
21
4
0
15 Dec 2023
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang
Fan-Yun Sun
Luca Weihs
Eli VanderBilt
Alvaro Herrasti
...
Lingjie Liu
Chris Callison-Burch
Mark Yatskar
Aniruddha Kembhavi
Christopher Clark
LM&Ro
39
78
0
14 Dec 2023
SceneWiz3D: Towards Text-guided 3D Scene Composition
Qihang Zhang
Chaoyang Wang
Aliaksandr Siarohin
Peiye Zhuang
Yinghao Xu
Ceyuan Yang
Dahua Lin
Bolei Zhou
Sergey Tulyakov
Hsin-Ying Lee
38
31
0
13 Dec 2023
Stable Rivers: A Case Study in the Application of Text-to-Image Generative Models for Earth Sciences
C. Kupferschmidt
A. Binns
K. L. Kupferschmidt
G. W. Taylor
DiffM
18
0
0
13 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
50
64
0
11 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
20
16
0
08 Dec 2023
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng
Andrea Vedaldi
3DV
45
48
0
07 Dec 2023
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li
Mingdeng Cao
Xintao Wang
Zhongang Qi
Ming-Ming Cheng
Ying Shan
DiffM
60
189
0
07 Dec 2023
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen
A. Bhunia
Subhadeep Koley
Aneeshan Sain
Pinaki Nath Chowdhury
Yi-Zhe Song
29
8
0
07 Dec 2023
Understanding (Un)Intended Memorization in Text-to-Image Generative Models
Ali Naseh
Jaechul Roh
Amir Houmansadr
DiffM
33
6
0
06 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
51
83
0
06 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
26
6
0
06 Dec 2023
DiffusionSat: A Generative Foundation Model for Satellite Imagery
Samar Khanna
Patrick Liu
Linqi Zhou
Chenlin Meng
Robin Rombach
Marshall Burke
David B. Lobell
Stefano Ermon
29
58
0
06 Dec 2023
Kandinsky 3.0 Technical Report
V.Ya. Arkhipkin
Andrei Filatov
Viacheslav Vasilev
Anastasia Maltseva
Said Azizov
Igor Pavlov
Julia Agafonova
Andrey Kuznetsov
Denis Dimitrov
DiffM
30
12
0
06 Dec 2023
FaceStudio: Put Your Face Everywhere in Seconds
Yuxuan Yan
C. Zhang
Rui Wang
Yichao Zhou
Gege Zhang
Pei Cheng
Gang Yu
Bin-Bin Fu
DiffM
35
40
0
05 Dec 2023
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po
Guandao Yang
Kfir Aberman
Gordon Wetzstein
DiffM
33
26
0
05 Dec 2023
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
Zhuoran Yu
Chenchen Zhu
Sean Culatana
Raghuraman Krishnamoorthi
Fanyi Xiao
Yong Jae Lee
117
14
0
04 Dec 2023
LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models
Qiang Wen
Yazhou Xing
Zhefan Rao
Qifeng Chen
DiffM
50
0
0
02 Dec 2023
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
38
50
0
01 Dec 2023
Text-Guided 3D Face Synthesis -- From Generation to Editing
Yunjie Wu
Yapeng Meng
Zhipeng Hu
Lincheng Li
Haoqian Wu
Kun Zhou
Weiwei Xu
Xin Yu
DiffM
56
9
0
01 Dec 2023
Raising the Bar of AI-generated Image Detection with CLIP
D. Cozzolino
Giovanni Poggi
Riccardo Corvi
Matthias Nießner
L. Verdoliva
VLM
35
74
0
30 Nov 2023
Initializing Models with Larger Ones
Zhiqiu Xu
Yanjie Chen
Kirill Vishniakov
Yida Yin
Zhiqiang Shen
Trevor Darrell
Lingjie Liu
Zhuang Liu
38
17
0
30 Nov 2023