ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.02310
  4. Cited By
From Lifestyle Vlogs to Everyday Interactions

From Lifestyle Vlogs to Everyday Interactions

6 December 2017
David Fouhey
Weicheng Kuo
Alexei A. Efros
Jitendra Malik
ArXivPDFHTML

Papers citing "From Lifestyle Vlogs to Everyday Interactions"

35 / 35 papers shown
Title
Write What You Want: Applying Text-to-video Retrieval to Audiovisual
  Archives
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives
Yuchen Yang
VGen
19
7
0
09 Oct 2023
Human-Object Interaction Prediction in Videos through Gaze Following
Human-Object Interaction Prediction in Videos through Gaze Following
Zhifan Ni
Esteve Valls Mascaro
Hyemin Ahn
Dongheui Lee
27
10
0
06 Jun 2023
Proactive Robot Assistance via Spatio-Temporal Object Modeling
Proactive Robot Assistance via Spatio-Temporal Object Modeling
Maithili Patel
Sonia Chernova
29
26
0
28 Nov 2022
Discovering A Variety of Objects in Spatio-Temporal Human-Object
  Interactions
Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions
Yong-Lu Li
Hongwei Fan
Zuoyu Qiu
Yiming Dou
Liang Xu
...
Peiyang Guo
Haisheng Su
Dongliang Wang
Wei Wu
Cewu Lu
35
7
0
14 Nov 2022
Understanding 3D Object Articulation in Internet Videos
Understanding 3D Object Articulation in Internet Videos
Shengyi Qian
Linyi Jin
C. Rockwell
Siyi Chen
David Fouhey
27
29
0
30 Mar 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated
  Actions in Vlogs
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
18
3
0
16 Feb 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and
  Sound
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
36
207
0
07 Jan 2022
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition
A. Glavan
Estefanía Talavera
21
10
0
23 Dec 2021
Measure and Improve Robustness in NLP Models: A Survey
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang
Haohan Wang
Diyi Yang
139
130
0
15 Dec 2021
Modelling Neighbor Relation in Joint Space-Time Graph for Video
  Correspondence Learning
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
Zixu Zhao
Yueming Jin
Pheng-Ann Heng
SSL
37
21
0
28 Sep 2021
QVHighlights: Detecting Moments and Highlights in Videos via Natural
  Language Queries
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
ViT
24
62
0
20 Jul 2021
A Survey on Deep Learning Technique for Video Segmentation
A Survey on Deep Learning Technique for Video Segmentation
Tianfei Zhou
Fatih Porikli
David J. Crandall
Luc Van Gool
Wenguan Wang
VOS
34
232
0
02 Jul 2021
MERLOT: Multimodal Neural Script Knowledge Models
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
22
372
0
04 Jun 2021
AGENT: A Benchmark for Core Psychological Reasoning
AGENT: A Benchmark for Core Psychological Reasoning
Tianmin Shu
Abhishek Bhandwaldar
Chuang Gan
Kevin A. Smith
Shari Liu
Dan Gutfreund
E. Spelke
J. Tenenbaum
T. Ullman
34
66
0
24 Feb 2021
Look Before you Speak: Visually Contextualized Utterances
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
21
66
0
10 Dec 2020
Unsupervised Video Representation Learning by Bidirectional Feature
  Prediction
Unsupervised Video Representation Learning by Bidirectional Feature Prediction
Nadine Behrmann
Juergen Gall
M. Noroozi
SSL
MDE
29
29
0
11 Nov 2020
What is More Likely to Happen Next? Video-and-Language Future Event
  Prediction
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
27
72
0
15 Oct 2020
Aligning Videos in Space and Time
Aligning Videos in Space and Time
Senthil Purushwalkam
Tian-Chun Ye
Saurabh Gupta
Abhinav Gupta
27
23
0
09 Jul 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
39
100
0
08 May 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
33
54
0
30 Mar 2020
Adversarial Filters of Dataset Biases
Adversarial Filters of Dataset Biases
Ronan Le Bras
Swabha Swayamdipta
Chandra Bhagavatula
Rowan Zellers
Matthew E. Peters
Ashish Sabharwal
Yejin Choi
36
220
0
10 Feb 2020
CATER: A diagnostic dataset for Compositional Actions and TEmporal
  Reasoning
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
Rohit Girdhar
Deva Ramanan
19
176
0
10 Oct 2019
Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
Chenxu Luo
Alan Yuille
130
150
0
28 Sep 2019
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling
Laura Sevilla-Lara
Shengxin Cindy Zha
Zhicheng Yan
Vedanuj Goswami
Matt Feiszli
Lorenzo Torresani
35
75
0
19 Jul 2019
Learning Individual Styles of Conversational Gesture
Learning Individual Styles of Conversational Gesture
Shiry Ginosar
Amir Bar
Gefen Kohavi
Caroline Chan
Andrew Owens
Jitendra Malik
SLR
18
326
0
10 Jun 2019
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and
  Interactions
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions
Bugra Tekin
Federica Bogo
Marc Pollefeys
EgoV
13
252
0
10 Apr 2019
Recent Advances in Natural Language Inference: A Survey of Benchmarks,
  Resources, and Approaches
Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
Shane Storks
Qiaozi Gao
J. Chai
21
128
0
02 Apr 2019
DistInit: Learning Video Representations Without a Single Labeled Video
DistInit: Learning Video Representations Without a Single Labeled Video
Rohit Girdhar
Du Tran
Lorenzo Torresani
Deva Ramanan
21
54
0
26 Jan 2019
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly
  Supervised Action Alignment and Segmentation
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
C. Chang
De-An Huang
Yanan Sui
Li Fei-Fei
Juan Carlos Niebles
22
156
0
09 Jan 2019
From Recognition to Cognition: Visual Commonsense Reasoning
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
32
866
0
27 Nov 2018
Object Level Visual Reasoning in Videos
Object Level Visual Reasoning in Videos
Fabien Baradel
Natalia Neverova
Christian Wolf
J. Mille
Greg Mori
22
163
0
16 Jun 2018
A Neurobiological Evaluation Metric for Neural Network Model Search
A Neurobiological Evaluation Metric for Neural Network Model Search
Nathaniel Blanchard
Jeffery Kinnison
Brandon RichardWebster
P. Bashivan
Walter J. Scheirer
27
12
0
28 May 2018
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
25
997
0
08 Apr 2018
Moments in Time Dataset: one million videos for event understanding
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
45
538
0
09 Jan 2018
Learning Human Activities and Object Affordances from RGB-D Videos
Learning Human Activities and Object Affordances from RGB-D Videos
H. Koppula
Rudhir Gupta
Ashutosh Saxena
96
727
0
04 Oct 2012
1