2204.02968
Temporal Alignment Networks for Long-term Video
6 April 2022
Tengda Han
Weidi Xie
Andrew Zisserman
AI4TS
Papers citing
"Temporal Alignment Networks for Long-term Video"
28 / 28 papers shown
Title
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
72
0
0
18 Mar 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
79
1
0
31 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
99
0
0
04 Dec 2024
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris Kitani
Kristen Grauman
VGen
65
2
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
65
5
0
31 Jul 2024
Video Editing for Video Retrieval
Bin Zhu
Kevin Flanagan
A. Fragomeni
Michael Wray
Dima Damen
CLIP
47
0
0
04 Feb 2024
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
62
5
0
21 Dec 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
34
27
0
25 Sep 2023
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision
Jiani Huang
Ziyang Li
Mayur Naik
Ser-Nam Lim
68
3
0
15 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
44
76
0
06 Apr 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
63
7
0
29 Mar 2023
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
Jiahao Zhang
A. Cherian
Yanbin Liu
Yizhak Ben-Shabat
Cristian Rodriguez-Opazo
Stephen Gould
37
8
0
24 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
73
225
0
27 Feb 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
29
4
0
05 Jan 2023
Test of Time: Instilling Video-Language Models with a Sense of Time
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
115
36
0
05 Jan 2023
Learning Video Representations from Large Language Models
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM
AI4TS
42
169
0
08 Dec 2022
Temporal Action Segmentation: An Analysis of Modern Techniques
Guodong Ding
Fadime Sener
Angela Yao
73
77
0
19 Oct 2022
Turbo Training with Token Dropout
Tengda Han
Weidi Xie
Andrew Zisserman
ViT
44
10
0
10 Oct 2022
Multimodal Learning with Transformers: A Survey
Peng Xu
Xiatian Zhu
David Clifton
ViT
84
538
0
13 Jun 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
135
62
0
17 May 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
264
562
0
28 Sep 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
353
3,760
0
11 Feb 2021
Self-supervised Co-training for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
217
311
0
19 Oct 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
278
929
0
24 Sep 2019
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
Tianwei Lin
Xu Zhao
Haisheng Su
Chongjing Wang
Ming Yang
141
700
0
08 Jun 2018
Soft-DTW: a Differentiable Loss Function for Time-Series
Marco Cuturi
Mathieu Blondel
AI4TS
141
614
0
05 Mar 2017
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
338
31,348
0
16 Jan 2013
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
Yunchao Gong
Qifa Ke
Michael Isard
Svetlana Lazebnik
3DV
84
584
0
18 Dec 2012