arXiv: 2403.07944
WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
10 March 2024
Deshun Yang
Luhui Hu
Yu Tian
Zihao Li
Chris Kelly
Bang Yang
Cindy Yang
Yuexian Zou
VGen
Papers citing "WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs"
11 / 11 papers shown
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Yuyao Zhang
Jinghao Li
Yu-Wing Tai
DiffM
64
0
0
25 Mar 2025
Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models
Peiran Wang
Linjie Tong
Jiaxiang Liu
Zuozhu Liu
VLM
MoE
43
0
0
10 Feb 2025
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
Qiyao Xue
Xiangyu Yin
Boyuan Yang
Wei Gao
DiffM
VGen
80
9
0
30 Nov 2024
Quantum Inverse Contextual Vision Transformers (Q-ICVT): A New Frontier in 3D Object Detection for AVs
Sanjay Bhargav Dharavath
Tanmoy Dam
Supriyo Chakraborty
Prithwiraj Roy
Aniruddha Maiti
ViT
26
1
0
20 Aug 2024
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Yanbo Ding
Shaobin Zhuang
Kunchang Li
Zhengrong Yue
Yu Qiao
Yali Wang
VGen
30
2
0
20 Aug 2024
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents
Wenjia Xu
Zijian Yu
Yixu Wang
Jiuniu Wang
Mugen Peng
LLMAG
48
7
0
11 Jun 2024
FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo
Minfei Shi
Muhammad Osama Khan
Muhammad Muneeb Afzal
Hao Huang
...
Luo Song
Ava Kouhana
T. Elze
Yi Fang
Mengyu Wang
VLM
38
30
0
29 Mar 2024
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
Chris Kelly
Luhui Hu
Jiayin Hu
Yu Tian
Deshun Yang
Bang Yang
Cindy Yang
Zihao Li
Zaoshan Huang
Yuexian Zou
44
2
0
14 Mar 2024
VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework
Chris Kelly
Luhui Hu
Bang Yang
Yu Tian
Deshun Yang
Cindy Yang
Zaoshan Huang
Zihao Li
Jiayin Hu
Yuexian Zou
37
9
0
14 Mar 2024
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
254
565
0
29 May 2022
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
231
4,460
0
23 Jan 2020