Revisiting Feature Prediction for Learning Visual Representations from Video

15 February 2024

Papers citing "Revisiting Feature Prediction for Learning Visual Representations from Video"

21 / 21 papers shown

UniVLA: Learning to Act Anywhere with Task-centric Latent Actions Qingwen Bu Yanting Yang Jisong Cai Shenyuan Gao Guanghui Ren Maoqing Yao Ping Luo Hongyang Li 119 0 0 09 May 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video Sonia Joseph Praneet Suresh Lorenz Hufe Edward Stevinson Robert Graham Yash Vadi Danilo Bzdok Sebastian Lapuschkin Lee Sharkey Blake A. Richards 72 0 0 28 Apr 2025
TimeCapsule: Solving the Jigsaw Puzzle of Long-Term Time Series Forecasting with Compressed Predictive Representations Yihang Lu Yangyang Xu Qitao Qing Xianwei Meng AI4TS 49 0 0 17 Apr 2025
TULIP: Towards Unified Language-Image Pretraining Zineng Tang Long Lian Seun Eisape Xudong Wang Roei Herzig Adam Yala Alane Suhr Trevor Darrell David M. Chan VLM CLIP MLLM 103 3 0 19 Mar 2025
Learning Actionable World Models for Industrial Process Control Peng Yan Ahmed Abdulkadir Gerrit A. Schatte Giulia Anguzzi Joonsu Gha Nikola Pascher Matthias Rosenthal Yunlong Gao Benjamin Grewe Thilo Stadelmann DRL AI4CE 49 0 0 03 Mar 2025
Beyond [cls]: Exploring the true potential of Masked Image Modeling representations Marcin Przewięźlikowski Randall Balestriero Wojciech Jasiński Marek Śmieja Bartosz Zieliński 69 0 0 04 Dec 2024
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos Xinhao Liu J. Li Yichen Jiang Niranjan Sujay Z. Yang Juexiao Zhang John Abanes Jing Zhang Chen Feng 114 1 0 26 Nov 2024
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data Hugo Thimonier José Lucas De Melo Costa Fabrice Popineau Arpad Rimmel Bich-Liên Doan 53 1 0 07 Oct 2024
System 2 Reasoning Capabilities Are Nigh Scott C. Lowe VLM LRM 46 0 0 04 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture Dengsheng Chen Jie Hu Xiaoming Wei Enhua Wu DiffM 52 2 0 02 Oct 2024
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control Zichen Jeff Cui Hengkai Pan Aadhithya Iyer Siddhant Haldar Lerrel Pinto VGen 33 10 0 18 Sep 2024
An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs Eui Jun Hwang Sukmin Cho Junmyeong Lee Jong C. Park SLR 76 4 0 20 Aug 2024
PhiNets: Brain-inspired Non-contrastive Learning Based on Temporal Prediction Hypothesis Satoki Ishikawa Makoto Yamada Han Bao Yuki Takezawa 66 0 0 23 May 2024
Modeling Caption Diversity in Contrastive Vision-Language Pretraining Samuel Lavoie Polina Kirichenko Mark Ibrahim Mahmoud Assran Andrew Gordon Wilson Aaron Courville Nicolas Ballas CLIP VLM 64 19 0 30 Apr 2024
Masked Autoencoders Are Scalable Vision Learners Kaiming He Xinlei Chen Saining Xie Yanghao Li Piotr Dollár Ross B. Girshick ViT TPM 305 7,443 0 11 Nov 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding Hu Xu Gargi Ghosh Po-Yao (Bernie) Huang Dmytro Okhonko Armen Aghajanyan Florian Metze Luke Zettlemoyer Christoph Feichtenhofer CLIP VLM 259 558 0 28 Sep 2021
Emerging Properties in Self-Supervised Vision Transformers Mathilde Caron Hugo Touvron Ishan Misra Hervé Jégou Julien Mairal Piotr Bojanowski Armand Joulin 317 5,785 0 29 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text Hassan Akbari Liangzhe Yuan Rui Qian Wei-Hong Chuang Shih-Fu Chang Huayu Chen Boqing Gong ViT 248 577 0 22 Apr 2021
Understanding self-supervised Learning Dynamics without Contrastive Pairs Yuandong Tian Xinlei Chen Surya Ganguli SSL 138 279 0 12 Feb 2021
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 243 4,469 0 23 Jan 2020
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 296 39,198 0 01 Sep 2014