Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.10465
Cited By
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
20 April 2023
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Implicit Temporal Modeling with Learnable Alignment for Video Recognition"
11 / 11 papers shown
Title
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition
Congqi Cao
Peiheng Han
Y. Zhang
Yating Yu
Qinyi Lv
Lingtong Min
Yanning Zhang
VLM
47
0
0
09 May 2025
DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving
Ju He
Zhi-Qi Cheng
Chenyang Li
Wangmeng Xiang
Binghui Chen
Bin Luo
Yifeng Geng
Xuansong Xie
AI4CE
37
20
0
30 Mar 2023
HDFormer: High-order Directed Transformer for 3D Human Pose Estimation
Hanyuan Chen
Ju He
Wangmeng Xiang
Zhi-Qi Cheng
Wei Liu
Han-Wen Liu
Bin Luo
Yifeng Geng
Xuansong Xie
ViT
24
31
0
03 Feb 2023
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Bin Cui
Yu Qiao
Peng Gao
Hongsheng Li
VLM
3DPC
175
435
0
04 Dec 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
194
387
0
06 Nov 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
259
560
0
28 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
152
362
0
17 Sep 2021
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
Kelvin C. K. Chan
Shangchen Zhou
Xiangyu Xu
Chen Change Loy
160
392
0
27 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
251
577
0
22 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
337
3,720
0
11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,989
0
09 Feb 2021
1