ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.11178
  4. Cited By
VATT: Transformers for Multimodal Self-Supervised Learning from Raw
  Video, Audio and Text
v1v2v3 (latest)

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

22 April 2021
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
    ViT
ArXiv (abs)PDFHTML

Papers citing "VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text"

10 / 360 papers shown
Title
Long-Short Temporal Contrastive Learning of Video Transformers
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang
Gedas Bertasius
Du Tran
Lorenzo Torresani
VLMViT
151
50
0
17 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
108
282
0
09 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLMLRM
104
383
0
04 Jun 2021
When Vision Transformers Outperform ResNets without Pre-training or
  Strong Data Augmentations
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen
Cho-Jui Hsieh
Boqing Gong
ViT
108
330
0
03 Jun 2021
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and
  Interpretable Visual Understanding
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding
Zizhao Zhang
Han Zhang
Long Zhao
Ting Chen
Sercan O. Arik
Tomas Pfister
ViT
96
174
0
26 May 2021
SiT: Self-supervised vIsion Transformer
SiT: Self-supervised vIsion Transformer
Sara Atito Ali Ahmed
Muhammad Awais
J. Kittler
ViT
110
139
0
08 Apr 2021
Creativity and Machine Learning: A Survey
Creativity and Machine Learning: A Survey
Giorgio Franceschelli
Mirco Musolesi
VLMAI4CE
129
43
0
06 Apr 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal
  Tasks with Language and Vision
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
140
39
0
06 Mar 2021
Human Action Recognition from Various Data Modalities: A Review
Human Action Recognition from Various Data Modalities: A Review
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
170
534
0
22 Dec 2020
Unsupervised Learning of Visual Features by Contrasting Cluster
  Assignments
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron
Ishan Misra
Julien Mairal
Priya Goyal
Piotr Bojanowski
Armand Joulin
OCLSSL
334
4,109
0
17 Jun 2020
Previous
12345678