ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2105.13290
  4. Cited By
CogView: Mastering Text-to-Image Generation via Transformers

CogView: Mastering Text-to-Image Generation via Transformers

26 May 2021
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
Da Yin
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
    ViT
    VLM
ArXivPDFHTML

Papers citing "CogView: Mastering Text-to-Image Generation via Transformers"

50 / 540 papers shown
Title
ManiCLIP: Multi-Attribute Face Manipulation from Text
ManiCLIP: Multi-Attribute Face Manipulation from Text
Hao Wang
Guosheng Lin
A. Molino
Anran Wang
Jiashi Feng
Zehuan Yuan
CVBM
40
9
0
02 Oct 2022
Data Poisoning Attacks Against Multimodal Encoders
Data Poisoning Attacks Against Multimodal Encoders
Ziqing Yang
Xinlei He
Zheng Li
Michael Backes
Mathias Humbert
Pascal Berrang
Yang Zhang
AAML
118
46
0
30 Sep 2022
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Han-Hung Lee
Angel X. Chang
24
63
0
30 Sep 2022
Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image
  Generation
Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation
Xintian Wu
Hanbin Zhao
Liangli Zheng
Shouhong Ding
Xi Li
34
13
0
28 Sep 2022
Collaboration of Pre-trained Models Makes Better Few-shot Learner
Collaboration of Pre-trained Models Makes Better Few-shot Learner
Renrui Zhang
Bohao Li
Wei Zhang
Hao Dong
Hongsheng Li
Peng Gao
Yu Qiao
VLM
65
7
0
25 Sep 2022
All are Worth Words: A ViT Backbone for Diffusion Models
All are Worth Words: A ViT Backbone for Diffusion Models
Fan Bao
Shen Nie
Kaiwen Xue
Yue Cao
Chongxuan Li
Hang Su
Jun Zhu
VLM
30
321
0
25 Sep 2022
Text2Light: Zero-Shot Text-Driven HDR Panorama Generation
Text2Light: Zero-Shot Text-Driven HDR Panorama Generation
Zhaoxi Chen
Guangcong Wang
Ziwei Liu
92
30
0
20 Sep 2022
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Chuanxia Zheng
L. Vuong
Jianfei Cai
Dinh Q. Phung
MQ
73
72
0
19 Sep 2022
ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation
ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation
Zhengzhe Liu
Peng Dai
Ruihui Li
Xiaojuan Qi
Chi-Wing Fu
DiffM
182
25
0
09 Sep 2022
Text-Free Learning of a Natural Language Interface for Pretrained Face
  Generators
Text-Free Learning of a Natural Language Interface for Pretrained Face Generators
Xiaodan Du
Raymond A. Yeh
Nicholas I. Kolkin
Eli Shechtman
Gregory Shakhnarovich
CLIP
34
1
0
08 Sep 2022
DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for
  Text-to-Image Generation
DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
Mengqi Huang
Zhendong Mao
Penghui Wang
Quang Wang
Yongdong Zhang
36
20
0
03 Sep 2022
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis
Wanshu Fan
Yen-Chun Chen
Dongdong Chen
Yu Cheng
Lu Yuan
Yu-Chiang Frank Wang
DiffM
34
91
0
29 Aug 2022
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for
  Subject-Driven Generation
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Nataniel Ruiz
Yuanzhen Li
Varun Jampani
Yael Pritch
Michael Rubinstein
Kfir Aberman
50
2,714
0
25 Aug 2022
Text to Image Generation: Leaving no Language Behind
Text to Image Generation: Leaving no Language Behind
Pedro Reviriego
Elena Merino-Gómez
VLM
16
13
0
19 Aug 2022
Layout-Bridging Text-to-Image Synthesis
Layout-Bridging Text-to-Image Synthesis
Jiadong Liang
Wenjie Pei
Feng Lu
EGVM
27
15
0
12 Aug 2022
ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal
  Fashion Design
ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design
Xujie Zhang
Yuyang Sha
Michael C. Kampffmeyer
Zhenyu Xie
Zequn Jie
Chengwen Huang
Jianqing Peng
Xiaodan Liang
19
18
0
11 Aug 2022
Prompt-to-Prompt Image Editing with Cross Attention Control
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz
Ron Mokady
J. Tenenbaum
Kfir Aberman
Yael Pritch
Daniel Cohen-Or
DiffM
92
1,697
0
02 Aug 2022
A One-Shot Reparameterization Method for Reducing the Loss of Tile
  Pruning on DNNs
A One-Shot Reparameterization Method for Reducing the Loss of Tile Pruning on DNNs
Yancheng Li
Qingzhong Ai
Fumihiko Ino
32
0
0
29 Jul 2022
Iterative Scene Graph Generation
Iterative Scene Graph Generation
Siddhesh Khandelwal
Leonid Sigal
OCL
29
29
0
27 Jul 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
36
297
0
20 Jul 2022
NUWA-Infinity: Autoregressive over Autoregressive Generation for
  Infinite Visual Synthesis
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Chenfei Wu
Jian Liang
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Lijuan Wang
Zicheng Liu
Yuejian Fang
Nan Duan
VGen
27
72
0
20 Jul 2022
Towards a General Pre-training Framework for Adaptive Learning in MOOCs
Towards a General Pre-training Framework for Adaptive Learning in MOOCs
Qingyang Zhong
Jifan Yu
Zheyuan Zhang
Yiming Mao
Yuquan Wang
Yankai Lin
Lei Hou
Juanzi Li
Jie Tang
CLL
AI4CE
28
4
0
18 Jul 2022
Vector Quantisation for Robust Segmentation
Vector Quantisation for Robust Segmentation
Ainkaran Santhirasekaram
Avinash Kori
Mathias Winkler
A. Rockall
Ben Glocker
OOD
27
9
0
05 Jul 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
134
1,072
0
22 Jun 2022
DALL-E for Detection: Language-driven Compositional Image Synthesis for
  Object Detection
DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffM
ObjD
28
16
0
20 Jun 2022
Write and Paint: Generative Vision-Language Models are Unified Modal
  Learners
Write and Paint: Generative Vision-Language Models are Unified Modal Learners
Shizhe Diao
Wangchunshu Zhou
Xinsong Zhang
Jiawei Wang
MLLM
AI4CE
24
16
0
15 Jun 2022
Multimodal Learning with Transformers: A Survey
Multimodal Learning with Transformers: A Survey
Peng Xu
Xiatian Zhu
David Clifton
ViT
77
530
0
13 Jun 2022
PILC: Practical Image Lossless Compression with an End-to-end GPU
  Oriented Neural Framework
PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework
Ning Kang
Shanzhao Qiu
Shifeng Zhang
Zhenguo Li
Shutao Xia
22
17
0
10 Jun 2022
Draft-and-Revise: Effective Image Generation with Contextual
  RQ-Transformer
Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
21
28
0
09 Jun 2022
Blended Latent Diffusion
Blended Latent Diffusion
Omri Avrahami
Ohad Fried
Dani Lischinski
DiffM
79
373
0
06 Jun 2022
DE-Net: Dynamic Text-guided Image Editing Adversarial Networks
DE-Net: Dynamic Text-guided Image Editing Adversarial Networks
Ming Tao
Bingkun Bao
Hao Tang
Fei Wu
Longhui Wei
Qi Tian
DiffM
30
15
0
02 Jun 2022
DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder
DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder
Jie Shi
Chenfei Wu
Jian Liang
Xiang Liu
Nan Duan
DiffM
14
25
0
01 Jun 2022
Improved Vector Quantized Diffusion Models
Improved Vector Quantized Diffusion Models
Zhicong Tang
Shuyang Gu
Jianmin Bao
Dong Chen
Fang Wen
DiffM
187
63
0
31 May 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via
  Transformers
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
256
570
0
29 May 2022
M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing
M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing
Zhikang Li
Huiling Zhou
Shuai Bai
Peike Li
Chang Zhou
Hongxia Yang
37
4
0
24 May 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language
  Understanding
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
96
5,800
0
23 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level
  Quality
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
44
213
0
09 May 2022
End-to-End Visual Editing with a Generatively Pre-Trained Artist
End-to-End Visual Editing with a Generatively Pre-Trained Artist
A. Brown
Cheng-Yang Fu
Omkar M. Parkhi
Tamara L. Berg
Andrea Vedaldi
DiffM
37
8
0
03 May 2022
CogView2: Faster and Better Text-to-Image Generation via Hierarchical
  Transformers
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Ming Ding
Wendi Zheng
Wenyi Hong
Jie Tang
VLM
41
324
0
28 Apr 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image
  Masking
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
35
436
0
18 Apr 2022
Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer
Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer
Hyungyu Lee
Sungjin Park
Joonseok Lee
Edward Choi
32
2
0
15 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
118
6,663
0
13 Apr 2022
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise
  Semantic Alignment and Generation
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Chunjing Xu
Yanwei Fu
33
17
0
09 Apr 2022
KNN-Diffusion: Image Generation via Large-Scale Retrieval
KNN-Diffusion: Image Generation via Large-Scale Retrieval
Shelly Sheynin
Oron Ashual
Adam Polyak
Uriel Singer
Oran Gafni
Eliya Nachmani
Yaniv Taigman
VLM
SyDa
DiffM
27
113
0
06 Apr 2022
TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable
  Facial Editing
TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
Yanbo Xu
Yueqin Yin
Liming Jiang
Qianyi Wu
Chengyao Zheng
Chen Change Loy
Bo Dai
Wayne Wu
38
53
0
31 Mar 2022
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Oran Gafni
Adam Polyak
Oron Ashual
Shelly Sheynin
Devi Parikh
Yaniv Taigman
DiffM
19
513
0
24 Mar 2022
WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models
Shan Yuan
Shuai Zhao
Jiahong Leng
Zhao Xue
Hanyu Zhao
Peiyu Liu
Zheng Gong
Wayne Xin Zhao
Junyi Li
Tang Jie
VLM
29
5
0
22 Mar 2022
DU-VLG: Unifying Vision-and-Language Generation via Dual
  Sequence-to-Sequence Pre-training
DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training
Luyang Huang
Guocheng Niu
Jiachen Liu
Xinyan Xiao
Hua Wu
VLM
CoGe
19
7
0
17 Mar 2022
KPE: Keypoint Pose Encoding for Transformer-based Image Generation
KPE: Keypoint Pose Encoding for Transformer-based Image Generation
Soon Yau Cheong
A. Mustafa
Andrew Gilbert
ViT
35
10
0
09 Mar 2022
Show Me What and Tell Me How: Video Synthesis via Multimodal
  Conditioning
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
Ligong Han
Jian Ren
Hsin-Ying Lee
Francesco Barbieri
Kyle Olszewski
Shervin Minaee
Dimitris N. Metaxas
Sergey Tulyakov
DiffM
VGen
30
41
0
04 Mar 2022
Previous
123...10119
Next