Sequential Modeling Enables Scalable Learning for Large Vision Models

1 December 2023
Yutong Bai
Xinyang Geng
K. Mangalam
Amir Bar
Alan Yuille
Trevor Darrell
Jitendra Malik
Alexei A. Efros
    MLLM VLM
arXiv: 2312.00785

Papers citing "Sequential Modeling Enables Scalable Learning for Large Vision Models"

50 / 129 papers shown
X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios
Yichen Xie
Chenfeng Xu
C-T.John Peng
Shuqi Zhao
Nhat Ho
Alexander T. Pham
Mingyu Ding
Masayoshi Tomizuka
Weidong Zhan
DiffM
96
3
0
02 Nov 2024
Randomized Autoregressive Visual Generation
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VGen DiffM
151
40
1
01 Nov 2024
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
169
3
0
29 Oct 2024
PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
Tianhao Zhang
Zhixiang Chen
Lyudmila Mihaylova
341
2
0
27 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
94
3
0
18 Oct 2024
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan
Tianhong Li
Siyang Qin
Yuanzhen Li
Chen Sun
Michael Rubinstein
Deqing Sun
Kaiming He
Yonglong Tian
VLM DiffM
131
58
0
17 Oct 2024
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
Yongxin Zhu
Bing Li
Hang Zhang
Xin Li
Linli Xu
Lidong Bing
DiffM
120
9
0
16 Oct 2024
EEGPT: Unleashing the Potential of EEG Generalist Foundation Model by Autoregressive Pre-training
Tongtian Yue
Shuning Xue
Xuange Gao
Yepeng Tang
Longteng Guo
Jie Jiang
Qingbin Liu
59
7
0
14 Oct 2024
Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation
Shun Qian
Bingquan Liu
Chengjie Sun
Zhen Xu
Baoxun Wang
64
0
0
14 Oct 2024
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
Liang Chen
Sinan Tan
Zefan Cai
Weichu Xie
Haozhe Zhao
Yichi Zhang
Junyang Lin
Jinze Bai
Tianyu Liu
Baobao Chang
ViT
103
4
0
02 Oct 2024
Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies
Ritwik Gupta
Leah Walker
Rodolfo Corona
Stephanie Fu
Suzanne Petryk
Janet Napolitano
Trevor Darrell
Andrew W. Reddie
ELM
93
5
0
25 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM DiffM
202
16
0
23 Sep 2024
SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
Chenyang Lei
Liyi Chen
Jun Cen
Xiao Chen
Zhen Lei
Felix Heide
Ziwei Liu
Qifeng Chen
Zhaoxiang Zhang
97
0
0
12 Sep 2024
An overview of domain-specific foundation model: key technologies, applications and challenges
Haolong Chen
Hanzhi Chen
Zijian Zhao
Kaifeng Han
Guangxu Zhu
Yichen Zhao
Ying Du
Wei Xu
Qingjiang Shi
ALM VLM
137
5
0
06 Sep 2024
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Junjie Li
Yang Liu
Weiqing Liu
Shikai Fang
Lewen Wang
Chang Xu
Jiang Bian
VGen
125
4
0
04 Sep 2024
In-Context Imitation Learning via Next-Token Prediction
Letian Fu
Huang Huang
Gaurav Datta
Lawrence Yunliang Chen
William Chung-Ho Panitch
Fangchen Liu
Hui Li
Ken Goldberg
LM&Ro
88
23
0
28 Aug 2024
Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
Jian Hu
Jiayi Lin
Junchi Yan
Shaogang Gong
VLM
93
11
0
27 Aug 2024
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
151
13
0
23 Aug 2024
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Forrest Iandola
VLM
162
92
0
22 Aug 2024
Robust Fast Adaptation from Adversarially Explicit Task Distribution Generation
Cheems Wang
Yiqin Lv
Yixiu Mao
Yun Qu
Yi Tian Xu
Xiangyang Ji
OOD TTA
183
9
0
28 Jul 2024
QueST: Self-Supervised Skill Abstractions for Learning Continuous Control
Atharva Mete
Haotian Xue
Albert Wilcox
Yongxin Chen
Animesh Garg
SSL
153
22
0
22 Jul 2024
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
Wentao Zhao
Jiaming Chen
Ziyu Meng
Donghui Mao
Ran Song
Wei Zhang
127
12
0
13 Jul 2024
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Lecheng Kong
Jiarui Feng
Hao Liu
Chengsong Huang
Jiaxin Huang
Yixin Chen
Muhan Zhang
AI4CE
173
13
0
12 Jul 2024
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang
Junliang Guo
Tianyu He
Li Zhao
Linli Xu
Jiang Bian
126
5
0
10 Jul 2024
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
Ethan Chern
Jiadi Su
Yan Ma
Pengfei Liu
MLLM
95
43
0
08 Jul 2024
LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
Xingyu Xie
Zhijie Lin
Kim-Chuan Toh
Pan Zhou
108
3
0
05 Jul 2024
HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization
Yucheng Tang
Yufan He
Vishwesh Nath
Pengfei Guo
Ruining Deng
...
Ziyue Xu
Holger Roth
Daguang Xu
Haichun Yang
Yuankai Huo
78
4
0
03 Jul 2024
Segment Anything without Supervision
Xudong Wang
Jingfeng Yang
Trevor Darrell
VLM
128
16
0
28 Jun 2024
Learning Modality Knowledge Alignment for Cross-Modality Transfer
Wenxuan Ma
Shuang Li
Lincan Cai
Jingxuan Kang
87
2
0
27 Jun 2024
Unified Auto-Encoding with Masked Diffusion
Philippe Hansen-Estruch
S. Vishwanath
Amy Zhang
Manan Tomar
DiffM
103
1
0
25 Jun 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Lei Zhu
Fangyun Wei
Yanye Lu
Dong Chen
VLM
104
40
0
17 Jun 2024
ControlVAR: Exploring Controllable Visual Autoregressive Modeling
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Zhe Lin
Rita Singh
Bhiksha Raj
DiffM
97
27
0
14 Jun 2024
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
82
0
0
13 Jun 2024
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Alexander Nikulin
Ilya Zisman
Alexey Zemtsov
Viacheslav Sinii
216
7
0
13 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM ViT
182
104
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
155
305
0
10 Jun 2024
Medical Vision Generalist: Unifying Medical Imaging Tasks in Context
Sucheng Ren
Xiaoke Huang
Xianhang Li
Junfei Xiao
Jieru Mei
Zeyu Wang
Alan Yuille
Yuyin Zhou
MedIm
94
9
0
08 Jun 2024
The Scaling Law in Stellar Light Curves
Jiashu Pan
Yuan-Sen Ting
Yang Huang
Jie Yu
Ji-Feng Liu
31
0
0
27 May 2024
Position: Foundation Agents as the Paradigm Shift for Decision Making
Xiaoqian Liu
Xingzhou Lou
Jianbin Jiao
Junge Zhang
OffRL LLMAG
105
7
0
27 May 2024
TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction
Yinda Chen
Haoyuan Shi
Xiaoyu Liu
Te Shi
Ruobing Zhang
Dong Liu
Zhiwei Xiong
Feng Wu
98
10
0
27 May 2024
TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
Yuzhou Nie
Yanting Wang
Jinyuan Jia
Michael J. De Lucia
Nathaniel D. Bastian
Wenbo Guo
Dawn Song
SILM AAML
110
6
0
27 May 2024
Semantica: An Adaptable Image-Conditioned Diffusion Model
Manoj Kumar
N. Houlsby
Emiel Hoogeboom
DiffM VLM
103
0
0
23 May 2024
Efficiency for Free: Ideal Data Are Transportable Representations
Peng Sun
Yi Jiang
Tao Lin
DD
127
2
0
23 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen LM&Ro
187
49
0
06 May 2024
In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation
Han Xue
Qianru Sun
Li Song
Wenjun Zhang
Zhiwu Huang
MLLM
77
0
0
15 Apr 2024
BRAVE: Broadening the visual encoding of vision-language models
Oğuzhan Fatih Kar
A. Tonioni
Petra Poklukar
Achin Kulshrestha
Amir Zamir
Federico Tombari
MLLM VLM
90
33
0
10 Apr 2024
Adapting LLaMA Decoder to Vision Transformer
Jiahao Wang
Wenqi Shao
Mengzhao Chen
Chengyue Wu
Yong Liu
Taiqiang Wu
Kaipeng Zhang
Songyang Zhang
Kai-xiang Chen
Ping Luo
MLLM
108
4
0
10 Apr 2024
Finding Visual Task Vectors
Alberto Hojel
Yutong Bai
Trevor Darrell
Amir Globerson
Amir Bar
124
8
0
08 Apr 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Keyu Tian
Yi Jiang
Zehuan Yuan
Bingyue Peng
Liwei Wang
VGen
145
351
0
03 Apr 2024
SegICL: A Multimodal In-context Learning Framework for Enhanced Segmentation in Medical Imaging
Lingdong Shen
Fangxin Shang
Xiaoshuang Huang
Yehui Yang
Haifeng Huang
Shiming Xiang
VLM
123
3
0
25 Mar 2024