2308.10533
Cited By
Joint learning of images and videos with a single Vision Transformer
21 August 2023
Shuki Shimizu
Toru Tamaki
ViT
Papers citing "Joint learning of images and videos with a single Vision Transformer"
3 / 3 papers shown
Title
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
L. V. D. van der Maaten
Armand Joulin
Ishan Misra
226
226
0
20 Jan 2022
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,984
0
09 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
204
422
0
01 Feb 2021