ResearchTrend.AI
Autoregressive Models in Vision: A Survey

8 November 2024
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
Yao Mu
Yuan Yao
Jikang Cheng
Zhongwei Wan
Jinfa Huang
Chaofan Tao
Shen Yan
Huaxiu Yao
Lingpeng Kong
Hongxia Yang
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
    VGen

Papers citing "Autoregressive Models in Vision: A Survey"

50 / 211 papers shown
Title
MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation
Zhongwei Wan
Che Liu
Xin Wang
Chaofan Tao
Hui Shen
Zhenwu Peng
Jie Fu
Rossella Arcucci
Huaxiu Yao
Mi Zhang
67
10
0
07 Mar 2024
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen
Chongjian Ge
Enze Xie
Yue Wu
Lewei Yao
Xiaozhe Ren
Zhongdao Wang
Ping Luo
Huchuan Lu
Zhenguo Li
197
121
0
07 Mar 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
288
1,388
0
05 Mar 2024
Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
Daiqing Li
Aleks Kamko
Ehsan Akhgari
Ali Sabet
Linmiao Xu
Suhail Doshi
69
109
0
27 Feb 2024
Genie: Generative Interactive Environments
Jake Bruce
Michael Dennis
Ashley D. Edwards
Jack Parker-Holder
Yuge Shi
...
Konrad Zolna
Jeff Clune
Nando de Freitas
Satinder Singh
Tim Rocktaschel
VGen, VLM
142
186
0
23 Feb 2024
Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability
Xue-Qing Qian
Yu Wang
Simian Luo
Yinda Zhang
Ying Tai
...
Xiangyang Xue
Bo Zhao
Tiejun Huang
Yunsheng Wu
Yanwei Fu
82
6
0
19 Feb 2024
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
106
82
0
13 Feb 2024
Data-efficient Large Vision Models through Sequential Autoregression
Jianyuan Guo
Zhiwei Hao
Chengcheng Wang
Yehui Tang
Han Wu
Han Hu
Kai Han
Chang Xu
VLM
90
10
0
07 Feb 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
...
Yuliang Liu
Di Zhang
Yang Song
Kun Gai
Yadong Mu
VGen
85
51
0
05 Feb 2024
Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach
Shaofeng Zhang
Jinfa Huang
Qiang-feng Zhou
Zhibin Wang
Fan Wang
Jiebo Luo
Junchi Yan
DiffM
89
12
0
28 Jan 2024
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
114
273
0
21 Dec 2023
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM, LRM
151
288
0
20 Dec 2023
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
Hongtao Wu
Ya Jing
Chi-Hou Cheang
Guangzeng Chen
Jiafeng Xu
Xinghang Li
Minghuan Liu
Hang Li
Tao Kong
118
111
0
20 Dec 2023
Efficient Large Language Models: A Survey
Zhongwei Wan
Xin Wang
Che Liu
Samiul Alam
Yu Zheng
...
Shen Yan
Yi Zhu
Quanlu Zhang
Mosharaf Chowdhury
Mi Zhang
LM&MA
43
136
0
06 Dec 2023
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai
Xinyang Geng
K. Mangalam
Amir Bar
Alan Yuille
Trevor Darrell
Jitendra Malik
Alexei A. Efros
MLLM, VLM
70
169
0
01 Dec 2023
ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models
Wenming Weng
Ruoyu Feng
Yanhui Wang
Qi Dai
Chunyu Wang
...
Jianmin Bao
Yuhui Yuan
Chong Luo
Yueyi Zhang
Zhiwei Xiong
VGen
74
38
0
30 Nov 2023
GPT-4V(ision) as A Social Media Analysis Engine
Hanjia Lyu
Jinfa Huang
Daoan Zhang
Yongsheng Yu
Xinyi Mou
Jinsheng Pan
Zhengyuan Yang
Zhongyu Wei
Jiebo Luo
VLM, MLLM
55
34
0
13 Nov 2023
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Hongjian Zhou
Fenglin Liu
Boyang Gu
Xinyu Zou
Jinfa Huang
...
Yefeng Zheng
Lei A. Clifton
Zheng Li
Fenglin Liu
David Clifton
LM&MA
91
126
0
09 Nov 2023
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment
Dhruba Ghosh
Hanna Hajishirzi
Ludwig Schmidt
89
202
0
17 Oct 2023
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
...
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang
107
323
0
09 Oct 2023
Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers
Shiyue Cao
Yueqin Yin
Lianghua Huang
Yu Liu
Xin Zhao
Deli Zhao
Kaiqi Huang
ViT
82
19
0
09 Oct 2023
Generating 3D Brain Tumor Regions in MRI using Vector-Quantization Generative Adversarial Networks
Meng Zhou
Matthias W. Wagner
U. Tabori
C. Hawkins
B. Ertl-Wagner
Farzad Khalvati
MedIm
88
5
0
02 Oct 2023
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Junsong Chen
Jincheng Yu
Chongjian Ge
Lewei Yao
Enze Xie
...
Zhongdao Wang
James T. Kwok
Ping Luo
Huchuan Lu
Zhenguo Li
DiffM
102
456
0
30 Sep 2023
Finite Scalar Quantization: VQ-VAE Made Simple
Fabian Mentzer
David C. Minnen
E. Agustsson
Michael Tschannen
96
187
0
27 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
93
504
0
11 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Di Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM, VLM
57
49
0
09 Sep 2023
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
L. Yu
Bowen Shi
Ramakanth Pasunuru
Benjamin Muller
O. Yu. Golovneva
...
Yaniv Taigman
Maryam Fazel-Zarandi
Asli Celikyilmaz
Luke Zettlemoyer
Armen Aghajanyan
MLLM
86
142
0
05 Sep 2023
Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers
Abril Corona-Figueroa
Sam Bond-Taylor
Neelanjan Bhowmik
Yona Falinie A. Gaus
T. Breckon
Hubert P. H. Shum
Chris G. Willcocks
DiffM
74
4
0
27 Aug 2023
Structured World Models from Human Videos
Russell Mendonca
Shikhar Bahl
Deepak Pathak
LM&Ro
99
99
0
21 Aug 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
84
150
0
20 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
364
12,044
0
18 Jul 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Lijun Yu
Yong Cheng
Zhiruo Wang
Vivek Kumar
Wolfgang Macherey
...
Yonatan Bisk
Ming-Hsuan Yang
Kevin Patrick Murphy
Alexander G. Hauptmann
Lu Jiang
MLLM
68
52
0
30 Jun 2023
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
Zeyue Xue
Guanglu Song
Qiushan Guo
Boxiao Liu
Zhuofan Zong
Yu Liu
Ping Luo
DiffM
116
136
0
29 May 2023
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
Jialong Wu
Haoyu Ma
Chao Deng
Mingsheng Long
OffRL
66
32
0
29 May 2023
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
103
223
0
25 May 2023
Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization
Mengqi Huang
Zhendong Mao
Zhuowei Chen
Yongdong Zhang
MQ
116
41
0
19 May 2023
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers
Rong Wu
Wanchao Su
Kede Ma
Jing Liao
84
41
0
27 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa, VLM, MLLM
569
4,910
0
17 Apr 2023
Chain of Thought Prompt Tuning in Vision Language Models
Jiaxin Ge
Hongyin Luo
Siyuan Qian
Yulu Gan
Jie Fu
Shanghang Zhang
VLM, LRM, MLLM
90
29
0
16 Apr 2023
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
Xiaoshi Wu
Keqiang Sun
Feng Zhu
Rui Zhao
Hongsheng Li
100
159
0
25 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.5K
14,699
0
15 Mar 2023
Scaling up GANs for Text-to-Image Synthesis
Minguk Kang
Jun-Yan Zhu
Richard Y. Zhang
Jaesik Park
Eli Shechtman
Sylvain Paris
Taesung Park
85
478
0
09 Mar 2023
Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
Diederik P. Kingma
Ruiqi Gao
DiffM
52
142
0
01 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
1.5K
13,437
0
27 Feb 2023
Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be Consistent
Giannis Daras
Y. Dagan
A. Dimakis
C. Daskalakis
DiffM
97
48
0
17 Feb 2023
Video Probabilistic Diffusion Models in Projected Latent Space
Sihyun Yu
Kihyuk Sohn
Subin Kim
Jinwoo Shin
VGen, DiffM
92
170
0
15 Feb 2023
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Hao Liu
Wilson Yan
Pieter Abbeel
74
25
0
02 Feb 2023
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Ying Jin
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM, ViT
74
39
0
31 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
429
4,641
0
30 Jan 2023
T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations
Jianrong Zhang
Yangsong Zhang
Xiaodong Cun
Shaoli Huang
Yong Zhang
Hongwei Zhao
Hongtao Lu
Xiaodong Shen
104
356
0
15 Jan 2023