Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.10802
Cited By
Meta-Transformer: A Unified Framework for Multimodal Learning
20 July 2023
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Meta-Transformer: A Unified Framework for Multimodal Learning"
50 / 102 papers shown
Title
Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation
Zhitong Xiong
Yi Wang
Fahong Zhang
Adam J. Stewart
Joelle Hanna
Damian Borth
Ioannis Papoutsis
B. L. Saux
Gustau Camps-Valls
Xiao Xiang Zhu
AI4CE
78
12
0
22 Mar 2024
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
Zhipeng Huang
Zhizheng Zhang
Zheng-Jun Zha
Yan Lu
Baining Guo
VLM
36
3
0
19 Mar 2024
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu
Xueye Zheng
Jiazhou Zhou
Lin Wang
32
16
0
19 Mar 2024
Tree-Regularized Tabular Embeddings
Xuan Li
Yunhe Wang
Boqian Li
LMTD
38
3
0
01 Mar 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
41
13
0
19 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
48
45
0
08 Feb 2024
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
Yiyuan Zhang
Yuhao Kang
Zhixin Zhang
Xiaohan Ding
Sanyuan Zhao
Xiangyu Yue
VGen
60
4
0
05 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
Sheng Luo
Wei Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
51
13
0
05 Feb 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang
Xiaohan Ding
Kaixiong Gong
Yixiao Ge
Ying Shan
Xiangyu Yue
ViT
22
7
0
25 Jan 2024
Transformer for Object Re-Identification: A Survey
Mang Ye
Shuo Chen
Chenyue Li
Wei-Shi Zheng
David J. Crandall
Bo Du
ViT
98
13
0
13 Jan 2024
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis
Kebin Wu
Wenbin Li
Xiaofei Xiao
21
3
0
05 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
39
14
0
31 Dec 2023
MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning
Liang Peng
Songyue Cai
Zongqian Wu
Huifang Shang
Xiaofeng Zhu
Xiaoxiao Li
37
9
0
22 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
51
3
0
15 Dec 2023
ShareCMP: Polarization-Aware RGB-P Semantic Segmentation
Zhuoyan Liu
Bo Wang
Lizhi Wang
Chenyu Mao
Ye Li
28
1
0
06 Dec 2023
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Jiaqi Wang
Dahua Lin
Hengshuang Zhao
MLLM
74
35
0
05 Dec 2023
UPOCR: Towards Unified Pixel-Level OCR Interface
Dezhi Peng
Zhenhua Yang
Jiaxin Zhang
Chongyu Liu
Yongxin Shi
Kai Ding
Fengjun Guo
Lianwen Jin
31
10
0
05 Dec 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Xiaohan Ding
Yiyuan Zhang
Yixiao Ge
Sijie Zhao
Lin Song
Xiangyu Yue
Ying Shan
VLM
AI4TS
SSL
29
101
0
27 Nov 2023
Robot Learning in the Era of Foundation Models: A Survey
Xuan Xiao
Jiahang Liu
Zhipeng Wang
Yanmin Zhou
Yong Qi
Qian Cheng
Bin He
Shuo Jiang
AI4CE
LM&Ro
26
27
0
24 Nov 2023
T-Rex: Counting by Visual Prompting
Qing Jiang
Feng Li
Tianhe Ren
Shilong Liu
Zhaoyang Zeng
Kent Yu
Lei Zhang
18
11
0
22 Nov 2023
RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection
Stefanos-Iordanis Papadopoulos
C. Koutlis
Symeon Papadopoulos
P. Petrantonakis
24
9
0
16 Nov 2023
Aria-NeRF: Multimodal Egocentric View Synthesis
Jiankai Sun
Jianing Qiu
Chuanyang Zheng
Johnathan Tucker
Javier Yu
Mac Schwager
EgoV
35
5
0
11 Nov 2023
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Huiwon Jang
Jihoon Tack
Daewon Choi
Jongheon Jeong
Jinwoo Shin
21
2
0
25 Oct 2023
CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement
Mohammadreza Salehi
Mehrdad Farajtabar
Maxwell Horton
Fartash Faghri
Hadi Pouransari
Raviteja Vemulapalli
Oncel Tuzel
Ali Farhadi
Mohammad Rastegari
Sachin Mehta
CLIP
VLM
48
1
0
21 Oct 2023
EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs
Xiangyu Zhao
Bo Liu
Qijiong Liu
Guangyuan Shi
Xiao-Ming Wu
VLM
DiffM
21
7
0
13 Oct 2023
Large Language Models Are Zero-Shot Time Series Forecasters
Nate Gruver
Marc Finzi
Shikai Qiu
Andrew Gordon Wilson
AI4TS
33
319
0
11 Oct 2023
Uni3D: Exploring Unified 3D Representation at Scale
Junsheng Zhou
Jinsheng Wang
Baorui Ma
Yu-Shen Liu
Tiejun Huang
Xinlong Wang
40
88
0
10 Oct 2023
What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?
Siting Li
Chenzhuang Du
Yue Zhao
Yu Huang
Hang Zhao
24
4
0
10 Oct 2023
SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training
Kazem Meidani
Parshin Shojaee
Chandan K. Reddy
A. Farimani
18
18
0
03 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
32
204
0
03 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Mustafa Shukor
Alexandre Ramé
Corentin Dancette
Matthieu Cord
LRM
MLLM
40
20
0
01 Oct 2023
Pre-training on Synthetic Driving Data for Trajectory Prediction
Yiheng Li
Seth Z. Zhao
Chenfeng Xu
Chen Tang
Chenran Li
Mingyu Ding
M. Tomizuka
Wei Zhan
40
11
0
18 Sep 2023
The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models
Dimitris Spathis
F. Kawsar
AI4TS
29
18
0
12 Sep 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
Ziyu Guo
Renrui Zhang
Xiangyang Zhu
Yiwen Tang
Xianzheng Ma
...
Ke Chen
Peng Gao
Xianzhi Li
Hongsheng Li
Pheng-Ann Heng
MLLM
35
125
0
01 Sep 2023
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Wenqi Shao
Mengzhao Chen
Zhaoyang Zhang
Peng-Tao Xu
Lirui Zhao
Zhiqiang Li
Kaipeng Zhang
Peng Gao
Yu Qiao
Ping Luo
MQ
15
176
0
25 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
33
1
0
20 Aug 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
21
38
0
18 Aug 2023
Language is All a Graph Needs
Ruosong Ye
Caiqi Zhang
Runhui Wang
Shuyuan Xu
Yongfeng Zhang
AI4CE
63
151
0
14 Aug 2023
General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Societal Implications and Responsible Governance
I. Triguero
Daniel Molina
Javier Poyatos
Javier Del Ser
Francisco Herrera
AI4TS
AI4MH
34
5
0
26 Jul 2023
Large Generative AI Models for Telecom: The Next Big Thing?
Lina Bariah
Qiyang Zhao
Han Zou
Yu Tian
Faouzi Bader
Merouane Debbah
AI4CE
79
64
0
17 Jun 2023
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny Lo
AI4MH
LM&MA
42
127
0
21 Mar 2023
General-Purpose In-Context Learning by Meta-Learning Transformers
Louis Kirsch
James Harrison
Jascha Narain Sohl-Dickstein
Luke Metz
40
72
0
08 Dec 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,443
0
11 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
235
1,024
0
13 Oct 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
289
3,623
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
280
1,982
0
09 Feb 2021
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Haoyi Zhou
Shanghang Zhang
J. Peng
Shuai Zhang
Jianxin Li
Hui Xiong
Wan Zhang
AI4TS
169
3,885
0
14 Dec 2020
TabTransformer: Tabular Data Modeling Using Contextual Embeddings
Xin Huang
A. Khetan
Milan Cvitkovic
Zohar Karnin
ViT
LMTD
157
417
0
11 Dec 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
Previous
1
2
3
Next