Sequential Modeling Enables Scalable Learning for Large Vision Models

1 December 2023
Yutong Bai
Xinyang Geng
K. Mangalam
Amir Bar
Alan Yuille
Trevor Darrell
Jitendra Malik
Alexei A. Efros
    MLLM
    VLM
arXiv: 2312.00785

Papers citing "Sequential Modeling Enables Scalable Learning for Large Vision Models"

50 / 126 papers shown
Title
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan
Tianhong Li
Siyang Qin
Yuanzhen Li
Chen Sun
Michael Rubinstein
Deqing Sun
Kaiming He
Yonglong Tian
VLM
DiffM
48
43
0
17 Oct 2024
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
Yongxin Zhu
B. Li
Hang Zhang
Xin Li
Linli Xu
Lidong Bing
DiffM
42
9
0
16 Oct 2024
EEGPT: Unleashing the Potential of EEG Generalist Foundation Model by Autoregressive Pre-training
Tongtian Yue
Shuning Xue
Xuange Gao
Yepeng Tang
Longteng Guo
Jie Jiang
Qingbin Liu
32
4
0
14 Oct 2024
Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation
Shun Qian
Bingquan Liu
Chengjie Sun
Zhen Xu
Baoxun Wang
36
0
0
14 Oct 2024
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
Liang Chen
Sinan Tan
Zefan Cai
Weichu Xie
Haozhe Zhao
Yichi Zhang
Junyang Lin
Jinze Bai
Tianyu Liu
Baobao Chang
ViT
58
3
0
02 Oct 2024
Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies
Ritwik Gupta
Leah Walker
Rodolfo Corona
Stephanie Fu
Suzanne Petryk
Janet Napolitano
Trevor Darrell
Andrew W. Reddie
ELM
43
3
0
25 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
66
11
0
23 Sep 2024
SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
Chenyang Lei
Liyi Chen
Jun Cen
Xiao Chen
Zhen Lei
Felix Heide
Ziwei Liu
Qifeng Chen
Zhaoxiang Zhang
55
0
0
12 Sep 2024
An overview of domain-specific foundation model: key technologies, applications and challenges
Haolong Chen
Hanzhi Chen
Zijian Zhao
Kaifeng Han
Guangxu Zhu
Yichen Zhao
Ying Du
Wei Xu
Qingjiang Shi
ALM
VLM
69
4
0
06 Sep 2024
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Junjie Li
Yang Liu
Weiqing Liu
Shikai Fang
Lewen Wang
Chang Xu
Jiang Bian
VGen
48
4
0
04 Sep 2024
In-Context Imitation Learning via Next-Token Prediction
Letian Fu
Huang Huang
Gaurav Datta
Lawrence Yunliang Chen
William Chung-Ho Panitch
Fangchen Liu
Hui Li
Ken Goldberg
LM&Ro
37
16
0
28 Aug 2024
Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
Jian Hu
Jiayi Lin
Junchi Yan
Shaogang Gong
VLM
44
7
0
27 Aug 2024
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
45
5
0
23 Aug 2024
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Shunsuke Saito
VLM
47
63
0
22 Aug 2024
Robust Fast Adaptation from Adversarially Explicit Task Distribution Generation
Cheems Wang
Yiqin Lv
Yixiu Mao
Yun Qu
Yi Tian Xu
Xiangyang Ji
OOD
TTA
88
7
0
28 Jul 2024
QueST: Self-Supervised Skill Abstractions for Learning Continuous Control
Atharva Mete
Haotian Xue
Albert Wilcox
Yongxin Chen
Animesh Garg
SSL
40
17
0
22 Jul 2024
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
Wentao Zhao
Jiaming Chen
Ziyu Meng
Donghui Mao
Ran Song
Wei Zhang
48
8
0
13 Jul 2024
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Lecheng Kong
Jiarui Feng
Hao Liu
Chengsong Huang
Jiaxin Huang
Yixin Chen
Muhan Zhang
AI4CE
77
8
0
12 Jul 2024
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang
Junliang Guo
Tianyu He
Li Zhao
Linli Xu
Jiang Bian
47
3
0
10 Jul 2024
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
Ethan Chern
Jiadi Su
Yan Ma
Pengfei Liu
MLLM
34
29
0
08 Jul 2024
LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
Xingyu Xie
Zhijie Lin
Kim-Chuan Toh
Pan Zhou
40
2
0
05 Jul 2024
HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization
Yucheng Tang
Yufan He
Vishwesh Nath
Pengfei Guo
Ruining Deng
...
Ziyue Xu
Holger Roth
Daguang Xu
Haichun Yang
Yuankai Huo
30
4
0
03 Jul 2024
Segment Anything without Supervision
Xudong Wang
Jingfeng Yang
Trevor Darrell
VLM
43
10
0
28 Jun 2024
Learning Modality Knowledge Alignment for Cross-Modality Transfer
Wenxuan Ma
Shuang Li
Lincan Cai
Jingxuan Kang
42
1
0
27 Jun 2024
Unified Auto-Encoding with Masked Diffusion
Philippe Hansen-Estruch
S. Vishwanath
Amy Zhang
Manan Tomar
DiffM
63
1
0
25 Jun 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Lei Zhu
Fangyun Wei
Yanye Lu
Dong Chen
VLM
48
34
0
17 Jun 2024
ControlVAR: Exploring Controllable Visual Autoregressive Modeling
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Zhe-nan Lin
Rita Singh
Bhiksha Raj
DiffM
43
21
0
14 Jun 2024
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
49
0
0
13 Jun 2024
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Alexander Nikulin
Ilya Zisman
Alexey Zemtsov
Viacheslav Sinii
120
5
0
13 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
60
85
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
68
230
0
10 Jun 2024
Medical Vision Generalist: Unifying Medical Imaging Tasks in Context
Sucheng Ren
Xiaoke Huang
Xianhang Li
Junfei Xiao
Jieru Mei
Zeyu Wang
Alan Yuille
Yuyin Zhou
MedIm
48
7
0
08 Jun 2024
The Scaling Law in Stellar Light Curves
Jiashu Pan
Yuan-Sen Ting
Yang Huang
Jie Yu
Ji-Feng Liu
29
0
0
27 May 2024
Position: Foundation Agents as the Paradigm Shift for Decision Making
Xiaoqian Liu
Xingzhou Lou
Jianbin Jiao
Junge Zhang
OffRL
LLMAG
47
6
0
27 May 2024
TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction
Yinda Chen
Haoyuan Shi
Xiaoyu Liu
Te Shi
Ruobing Zhang
Dong Liu
Zhiwei Xiong
Feng Wu
44
9
0
27 May 2024
TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
Yuzhou Nie
Yanting Wang
Jinyuan Jia
Michael J. De Lucia
Nathaniel D. Bastian
Wenbo Guo
Dawn Song
SILM
AAML
38
5
0
27 May 2024
Semantica: An Adaptable Image-Conditioned Diffusion Model
Manoj Kumar
N. Houlsby
Emiel Hoogeboom
DiffM
VLM
40
0
0
23 May 2024
Efficiency for Free: Ideal Data Are Transportable Representations
Peng Sun
Yi Jiang
Tao Lin
DD
48
1
0
23 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen
LM&Ro
87
38
0
06 May 2024
In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation
Han Xue
Qianru Sun
Li-Na Song
Wenjun Zhang
Zhiwu Huang
MLLM
44
0
0
15 Apr 2024
BRAVE: Broadening the visual encoding of vision-language models
Oğuzhan Fatih Kar
A. Tonioni
Petra Poklukar
Achin Kulshrestha
Amir Zamir
Federico Tombari
MLLM
VLM
55
26
0
10 Apr 2024
Adapting LLaMA Decoder to Vision Transformer
Jiahao Wang
Wenqi Shao
Yonghong Tian
Chengyue Wu
Yong Liu
Taiqiang Wu
Kaipeng Zhang
Songyang Zhang
Kai-xiang Chen
Ping Luo
MLLM
40
4
0
10 Apr 2024
Finding Visual Task Vectors
Alberto Hojel
Yutong Bai
Trevor Darrell
Amir Globerson
Amir Bar
70
7
0
08 Apr 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Keyu Tian
Yi-Xin Jiang
Zehuan Yuan
Bingyue Peng
Liwei Wang
VGen
61
263
0
03 Apr 2024
SegICL: A Multimodal In-context Learning Framework for Enhanced Segmentation in Medical Imaging
Lingdong Shen
Fangxin Shang
Xiaoshuang Huang
Yehui Yang
Haifeng Huang
Shiming Xiang
VLM
39
3
0
25 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
150
319
0
21 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM
LRM
54
42
0
19 Mar 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng
Bohan Zhou
Yicheng Feng
Ye Wang
Zongqing Lu
VLM
MLLM
46
7
0
14 Mar 2024
Generative deep learning-enabled ultra-large field-of-view lens-free imaging
Ronald B. Liu
Zhe Liu
Max G.A. Wolf
Krishna P. Purohit
Gregor Fritz
Yi Feng
C. G. Hansen
Pierre Bagnaninchi
X. C. I. Solvas
Yunjie Yang
MedIm
45
0
0
12 Mar 2024
Multi-modal Auto-regressive Modeling via Visual Words
Tianshuo Peng
Zuchao Li
Lefei Zhang
Hai Zhao
Ping Wang
Bo Du
OffRL
35
5
0
12 Mar 2024