ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2503.19757
  4. Cited By
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

25 March 2025
Zhi Hou
Tianyi Zhang
Yuwen Xiong
Haonan Duan
Hengjun Pu
Ronglei Tong
Chengyang Zhao
X. Zhu
Yu Qiao
Jifeng Dai
Yuxiao Chen
ArXiv (abs)PDFHTML

Papers citing "Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy"

28 / 28 papers shown
Title
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
Jiaming Zhou
Ke Ye
Jiayi Liu
Teli Ma
Zifang Wang
Ronghe Qiu
Kun-Yu Lin
Zhilin Zhao
Junwei Liang
94
2
0
21 May 2025
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
Junjie Wen
Yinlin Zhu
Jinming Li
Zhibin Tang
Yaxin Peng
Feifei Feng
VLM
113
27
0
09 Feb 2025
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
Gi-Cheon Kang
Junghyun Kim
Kyuhwan Shim
Jun Ki Lee
Byoung-Tak Zhang
LM&Ro
290
2
1
01 Nov 2024
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
Kyle Hatch
Ashwin Balakrishna
Oier Mees
Suraj Nair
Seohong Park
...
Masha Itkina
Benjamin Eysenbach
Sergey Levine
Thomas Kollar
Benjamin Burchfiel
107
4
0
26 Oct 2024
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Songming Liu
Lingxuan Wu
Bangguo Li
Hengkai Tan
Huayu Chen
Zhengyi Wang
Ke Xu
Hang Su
Jun Zhu
125
125
0
10 Oct 2024
Autoregressive Image Generation without Vector Quantization
Autoregressive Image Generation without Vector Quantization
Tianhong Li
Yonglong Tian
He Li
Mingyang Deng
Kaiming He
DiffM
138
238
0
17 Jun 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky
Karl Pertsch
Suraj Nair
Ashwin Balakrishna
Sudeep Dasari
...
Thomas Kollar
Sergey Levine
Chelsea Finn
Sergey Levine
Chelsea Finn
233
226
0
19 Mar 2024
RT-H: Action Hierarchies Using Language
RT-H: Action Hierarchies Using Language
Suneel Belkhale
Tianli Ding
Ted Xiao
P. Sermanet
Quon Vuong
Jonathan Tompson
Yevgen Chebotar
Debidatta Dwibedi
Dorsa Sadigh
LM&Ro
106
89
0
04 Mar 2024
SkillDiffuser: Interpretable Hierarchical Planning via Skill
  Abstractions in Diffusion-Based Task Execution
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
Zhixuan Liang
Yao Mu
Hengbo Ma
Masayoshi Tomizuka
Mingyu Ding
Ping Luo
117
39
0
18 Dec 2023
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration
A. Sridhar
Dhruv Shah
Catherine Glossop
Sergey Levine
110
127
0
11 Oct 2023
Goal Representations for Instruction Following: A Semi-Supervised
  Language Interface to Control
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control
Vivek Myers
Andre Wang He
Kuan Fang
Homer Walke
Philippe Hansen-Estruch
Ching-An Cheng
Mihai Jalobeanu
Andrey Kolobov
Anca Dragan
Sergey Levine
LM&Ro
75
31
0
30 Jun 2023
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Cheng Chi
Zhenjia Xu
S. Feng
Eric A. Cousineau
Yilun Du
Benjamin Burchfiel
Russ Tedrake
Shuran Song
349
1,242
0
07 Mar 2023
ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills
ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills
Jiayuan Gu
Fanbo Xiang
Xuanlin Li
Z. Ling
Xiqiang Liu
...
Xiao Yuan
P. Xie
Zhiao Huang
Rui Chen
Hao Su
115
189
0
09 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
432
4,656
0
30 Jan 2023
Scalable Diffusion Models with Transformers
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
120
2,436
0
19 Dec 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
213
1,835
0
17 Nov 2022
GNM: A General Navigation Model to Drive Any Robot
GNM: A General Navigation Model to Drive Any Robot
Dhruv Shah
A. Sridhar
Arjun Bhorkar
Noriaki Hirose
Sergey Levine
108
117
0
07 Oct 2022
What Matters in Language Conditioned Robotic Imitation Learning over
  Unstructured Data
What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data
Oier Mees
Lukás Hermann
Wolfram Burgard
LM&Ro
107
155
0
13 Apr 2022
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
511
15,788
0
20 Dec 2021
CALVIN: A Benchmark for Language-Conditioned Policy Learning for
  Long-Horizon Robot Manipulation Tasks
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Oier Mees
Lukás Hermann
Erick Rosete-Beas
Wolfram Burgard
LM&Ro
122
263
0
06 Dec 2021
ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale
  Demonstrations
ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations
Tongzhou Mu
Z. Ling
Fanbo Xiang
Derek Yang
Xuanlin Li
Stone Tao
Zhiao Huang
Zhiwei Jia
Hao Su
120
138
0
30 Jul 2021
LoRA: Low-Rank Adaptation of Large Language Models
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRLAI4TSAI4CEALMAIMat
502
10,526
0
17 Jun 2021
Diffusion Models Beat GANs on Image Synthesis
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal
Alex Nichol
306
7,958
0
11 May 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIPVLM
1.0K
29,926
0
26 Feb 2021
Denoising Diffusion Implicit Models
Denoising Diffusion Implicit Models
Jiaming Song
Chenlin Meng
Stefano Ermon
VLMDiffM
304
7,500
0
06 Oct 2020
Denoising Diffusion Probabilistic Models
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
759
18,408
0
19 Jun 2020
Language Conditioned Imitation Learning over Unstructured Data
Language Conditioned Imitation Learning over Unstructured Data
Corey Lynch
P. Sermanet
LM&Ro
84
251
0
15 May 2020
FiLM: Visual Reasoning with a General Conditioning Layer
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAttAIMatOffRLAI4CE
372
2,239
0
22 Sep 2017
1