2201.10990
Learning To Recognize Procedural Activities with Distant Supervision
26 January 2022
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
Papers citing
"Learning To Recognize Procedural Activities with Distant Supervision"
18 / 68 papers shown
Title
Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
Hung-Ting Su
Yulei Niu
Xudong Lin
Winston H. Hsu
Shih-Fu Chang
VGen
ELM
21
6
0
07 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
25
38
0
31 Mar 2023
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
Yiwu Zhong
Licheng Yu
Yang Bai
Shangwen Li
Xueting Yan
Yin Li
AI4TS
32
31
0
31 Mar 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
41
94
0
25 Mar 2023
Learning and Verification of Task Structure in Instructional Videos
Medhini Narasimhan
Licheng Yu
Sean Bell
Ning Zhang
Trevor Darrell
68
19
0
23 Mar 2023
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
Sixun Dong
Huazhang Hu
Dongze Lian
Weixin Luo
Yichen Qian
Shenghua Gao
ViT
AI4TS
23
11
0
22 Mar 2023
Efficient Movie Scene Detection using State-Space Transformers
Md. Mohaiminul Islam
Mahmudul Hasan
Kishan Athrey
Tony Braskich
Gedas Bertasius
ViT
36
44
0
29 Dec 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
18
68
0
12 Oct 2022
Turbo Training with Token Dropout
Tengda Han
Weidi Xie
Andrew Zisserman
ViT
26
10
0
10 Oct 2022
Learning to Decompose Visual Features with Latent Textual Prompts
Feng Wang
Manling Li
Xudong Lin
Hairong Lv
A. Schwing
Heng Ji
VLM
19
23
0
09 Oct 2022
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
Baoxiong Jia
Ting Lei
Song-Chun Zhu
Siyuan Huang
EgoV
30
61
0
08 Oct 2022
A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
Anil Batra
Shreyank N. Gowda
Frank Keller
Laura Sevilla-Lara
28
5
0
30 Sep 2022
Multimedia Generative Script Learning for Task Planning
Qingyun Wang
Manling Li
Hou Pong Chan
Lifu Huang
J. Hockenmaier
Girish Chowdhary
Heng Ji
VGen
29
10
0
25 Aug 2022
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
Xudong Lin
Simran Tiwari
Shiyuan Huang
Manling Li
Mike Zheng Shou
Heng Ji
Shih-Fu Chang
25
20
0
05 Jun 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
MLLM
VLM
170
137
0
22 May 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
40
102
0
04 Apr 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
259
558
0
28 Sep 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
280
1,982
0
09 Feb 2021