arXiv:2312.11782 (v2, latest)
Learning Object State Changes in Videos: An Open-World Perspective
19 December 2023
Zihui Xue
Kumar Ashutosh
Kristen Grauman
VGen
Papers citing
"Learning Object State Changes in Videos: An Open-World Perspective"
32 / 32 papers shown
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung
Frangil Ramirez
Juhyung Ha
Yi-Ting Chen
David J. Crandall
Yi-Hsuan Tsai
114
1
0
27 Mar 2025
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
215
1
0
03 Dec 2024
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou
Ziqi Pang
Yu-Xiong Wang
VOS
115
7
0
12 Jun 2024
Breaking the "Object" in Video Object Segmentation
P. Tokmakov
Jie Li
Adrien Gaidon
VOS
72
40
0
12 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
139
331
0
06 Dec 2022
Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning
Xiangyu Li
Xu Yang
Kun-Juan Wei
Cheng Deng
Muli Yang
CoGe
76
70
0
29 Jun 2022
Egocentric Video-Language Pretraining
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Bernard Ghanem
Wei Liu
Mike Zheng Shou
VLM
EgoV
84
206
0
03 Jun 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
He Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
78
46
0
04 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
177
1,309
0
04 May 2022
Temporal Alignment Networks for Long-term Video
Tengda Han
Weidi Xie
Andrew Zisserman
AI4TS
90
88
0
06 Apr 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
73
33
0
22 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven C. H. Hoi
MLLM
BDL
VLM
CLIP
555
4,413
0
28 Jan 2022
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
VPVLM
VLM
100
381
0
08 Dec 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
410
1,114
0
13 Oct 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi
Jiebo Luo
Chenliang Xu
118
49
0
05 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
313
582
0
28 Sep 2021
Temporal RoI Align for Video Object Recognition
Tao Gong
Kai-xiang Chen
Xinjiang Wang
Qi Chu
Feng Zhu
Dahua Lin
Nenghai Yu
Huamin Feng
67
85
0
08 Sep 2021
Learning to Predict Visual Attributes in the Wild
Khoi Pham
Kushal Kafle
Zhe Lin
Zhi Ding
Scott D. Cohen
Q. Tran
Abhinav Shrivastava
45
113
0
17 Jun 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
298
920
0
28 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
463
3,901
0
11 Feb 2021
Learning Graph Embeddings for Compositional Zero-shot Learning
Muhammad Ferjad Naeem
Yongqin Xian
Federico Tombari
Zeynep Akata
CoGe
57
140
0
03 Feb 2021
Open World Compositional Zero-Shot Learning
Massimiliano Mancini
Muhammad Ferjad Naeem
Yongqin Xian
Zeynep Akata
CoGe
146
130
0
29 Jan 2021
Open-Vocabulary Object Detection Using Captions
Alireza Zareian
Kevin Dela Rosa
Derek Hao Hu
Shih-Fu Chang
VLM
ObjD
139
433
0
20 Nov 2020
Understanding Human Hands in Contact at Internet Scale
Dandan Shan
Jiaqi Geng
Michelle Shu
David Fouhey
108
325
0
11 Jun 2020
Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks
Joanna Materzynska
Tete Xiao
Roei Herzig
Huijuan Xu
Xiaolong Wang
Trevor Darrell
CoGe
55
176
0
20 Dec 2019
Oops! Predicting Unintentional Action in Video
Dave Epstein
Boyuan Chen
Carl Vondrick
109
103
0
25 Nov 2019
Self-supervised 6D Object Pose Estimation for Robot Manipulation
Xinke Deng
Yu Xiang
Arsalan Mousavian
Clemens Eppner
Timothy Bretl
Dieter Fox
3DPC
SSL
96
187
0
23 Sep 2019
Deep Learning in Video Multi-Object Tracking: A Survey
Gioele Ciaparrone
Francisco Luque Sánchez
Siham Tabik
L. Troiano
R. Tagliaferri
Francisco Herrera
VOT
85
575
0
18 Jul 2019
Procedure Planning in Instructional Videos
C. Chang
De-An Huang
Danfei Xu
Ehsan Adeli
Li Fei-Fei
Juan Carlos Niebles
77
103
0
02 Jul 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
122
1,208
0
07 Jun 2019
Video Object Segmentation and Tracking: A Survey
Rui Yao
Guosheng Lin
Shixiong Xia
Jiaqi Zhao
Yong Zhou
VOS
63
148
0
19 Apr 2019
Joint Discovery of Object States and Manipulation Actions
Jean-Baptiste Alayrac
Josef Sivic
Ivan Laptev
Simon Lacoste-Julien
86
79
0
09 Feb 2017