LocoMotion: Learning Motion-Focused Video-Language Representations
15 October 2024
Hazel Doughty
Fida Mohammad Thoker
Cees G. M. Snoek

Papers citing "LocoMotion: Learning Motion-Focused Video-Language Representations"

50 / 58 papers shown
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
20 Feb 2024
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald
Nimrod Barazani
Cees G. M. Snoek
Yuki M. Asano
13 Feb 2024
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Hao Zhang
Joseph E. Gonzalez
Ion Stoica
09 Jun 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
29 May 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
28 Mar 2023
Learning Action Changes by Measuring Verb-Adverb Textual Relationships
Davide Moltisanti
Frank Keller
Hakan Bilen
Laura Sevilla-Lara
27 Mar 2023
T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations
Jianrong Zhang
Yangsong Zhang
Xiaodong Cun
Shaoli Huang
Yong Zhang
Hongwei Zhao
Hongtao Lu
Xi Shen
15 Jan 2023
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Mohit Bansal
Gedas Bertasius
09 Dec 2022
Executing your Commands via Motion Diffusion in Latent Space
Xin Chen
Biao Jiang
Wen Liu
Zilong Huang
Bin-Bin Fu
Tao Chen
Jingyi Yu
Gang Yu
08 Dec 2022
Human Motion Diffusion Model
Guy Tevet
Sigal Raab
Brian Gordon
Yonatan Shafir
Daniel Cohen-Or
Amit H. Bermano
29 Sep 2022
Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
Tianjiao Li
Lin Geng Foo
Qiuhong Ke
Hossein Rahmani
Anran Wang
Jinghua Wang
Jing Liu
03 Sep 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Mohit Bansal
07 Jun 2022
TEMOS: Generating diverse human motions from textual descriptions
Mathis Petrovich
Michael J. Black
Gül Varol
25 Apr 2022
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Huazhang Hu
Sixun Dong
Yiqun Zhao
Dongze Lian
Zhengxin Li
Shenghua Gao
03 Apr 2022
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?
Fida Mohammad Thoker
Hazel Doughty
Piyush Bagad
Cees G. M. Snoek
27 Mar 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
23 Mar 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
20 Dec 2021
Relational Self-Attention: What's Missing in Attention for Video Understanding
Manjin Kim
Heeseung Kwon
Chunyu Wang
Suha Kwak
Minsu Cho
02 Nov 2021
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
Shuangrui Ding
Maomao Li
Tianyu Yang
Rui Qian
Haohang Xu
Qingyi Chen
Jue Wang
Hongkai Xiong
30 Sep 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
28 Sep 2021
Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition
James Hong
Matthew Fisher
Michael Gharbi
Kayvon Fatahalian
03 Sep 2021
Inter-intra Variant Dual Representations for Self-supervised Video Recognition
Lin Zhang
Qi She
Zhengyang Shen
Changhu Wang
02 Jul 2021
MaCLR: Motion-aware Contrastive Learning of Representations for Videos
Fanyi Xiao
Joseph Tighe
Davide Modolo
17 Jun 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
15 Jun 2021
Revisiting Skeleton-based Action Recognition
Haodong Duan
Yue Zhao
Kai-xiang Chen
Dahua Lin
Bo Dai
28 Apr 2021
Temporal Query Networks for Fine-grained Video Understanding
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
19 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
01 Apr 2021
Self-supervised Motion Learning from Static Images
Ziyuan Huang
Shiwei Zhang
Jianwen Jiang
Mingqian Tang
Rong Jin
M. Ang
01 Apr 2021
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
Tian Pan
Yibing Song
Tianyu Yang
Wenhao Jiang
Wei Liu
10 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
26 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Mohit Bansal
Jingjing Liu
11 Feb 2021
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
14 Nov 2020
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
Peihao Chen
Deng Huang
Dongliang He
Xiang Long
Runhao Zeng
Shilei Wen
Mingkui Tan
Chuang Gan
27 Oct 2020
Self-supervised Co-training for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
19 Oct 2020
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
Jinpeng Wang
Yuting Gao
Ke Li
Yiqi Lin
A. J. Ma
Hao Cheng
Pai Peng
Feiyue Huang
Rongrong Ji
Xing Sun
12 Sep 2020
Rescaling Egocentric Vision
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
23 Jun 2020
Intra- and Inter-Action Understanding via Temporal Action Parsing
Dian Shao
Yue Zhao
Bo Dai
Dahua Lin
20 May 2020
Context-aware and Scale-insensitive Temporal Repetition Counting
Huaidong Zhang
Xuemiao Xu
Guoqiang Han
Shengfeng He
18 May 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
08 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
01 May 2020
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
Dian Shao
Yue Zhao
Bo Dai
Dahua Lin
14 Apr 2020
Temporal Pyramid Network for Action Recognition
Ceyuan Yang
Yinghao Xu
Jianping Shi
Bo Dai
Bolei Zhou
07 Apr 2020
Action Modifiers: Learning from Adverbs in Instructional Videos
Hazel Doughty
Ivan Laptev
W. Mayol-Cuevas
Dima Damen
13 Dec 2019
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Agrim Gupta
Piotr Dollár
Ross B. Girshick
08 Aug 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
07 Jun 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
06 Apr 2019
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
10 Dec 2018
Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection
Khoi-Nguyen C. Mac
D. Joshi
Raymond A. Yeh
Jinjun Xiong
Rogerio Feris
Minh Do
21 Nov 2018
TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin
Chuang Gan
Song Han
20 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
11 Oct 2018