ResearchTrend.AI
arXiv: 1604.01753
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

6 April 2016
Gunnar A. Sigurdsson
Gül Varol
Xueliang Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
    VGen

Papers citing "Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding"

50 / 277 papers shown
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
56
0
0
28 Mar 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zhengyang Liang
Ao Li
Yang Tian
Bo Zhao
VGen
VLM
100
0
0
24 Mar 2025
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
55
1
0
21 Mar 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
255
0
0
18 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
122
9
0
18 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
45
0
0
11 Feb 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
55
5
0
21 Jan 2025
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
55
3
0
10 Jan 2025
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
Xiaoyang Liu
Boran Wen
Xinpeng Liu
Zizheng Zhou
Hongwei Fan
Cewu Lu
Lizhuang Ma
Yulong Chen
Yong Li
59
2
0
27 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
184
0
0
18 Dec 2024
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
206
1
0
30 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
89
26
0
04 Oct 2024
Automated Vehicle Driver Monitoring Dataset from Real-World Scenarios
Mohamed Sabry
Walter Morales-Alvarez
Cristina Olaverri-Monreal
37
0
0
19 Aug 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
47
52
0
30 Jun 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
45
0
0
11 Jun 2024
An Effective-Efficient Approach for Dense Multi-Label Action Detection
Faegheh Sardari
Armin Mustafa
Philip J. B. Jackson
Adrian Hilton
37
0
0
10 Jun 2024
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
Guan-Bo Wang
Zhiming Li
Qingchao Chen
Yang Liu
43
9
0
27 May 2024
Context-Enhanced Video Moment Retrieval with Large Language Models
Weijia Liu
Bo Miao
Jiuxin Cao
Xueling Zhu
Bo Liu
Mehwish Nasim
Ajmal Mian
50
2
0
21 May 2024
Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding
Rong Gao
Xin Liu
Bohao Xing
Zitong Yu
Björn W. Schuller
Heikki Kälviäinen
57
3
0
21 May 2024
SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences
Ali Emre Keskin
H. Keles
SLR
33
0
0
05 May 2024
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey
Noah Lewis
J. L. Bez
Suren Byna
57
0
0
16 Apr 2024
$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu
Jixuan He
Wanhua Li
Junsik Kim
D. Wei
Hanspeter Pfister
Chang Wen Chen
46
13
0
31 Mar 2024
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality
Sishuo Chen
Lei Li
Shuhuai Ren
Rundong Gao
Yuanxin Liu
Xiaohan Bi
Xu Sun
Lu Hou
57
3
0
28 Mar 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
56
7
0
21 Mar 2024
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Jingjing Hu
Dan Guo
Kun Li
Zhan Si
Xun Yang
Xiaojun Chang
Meng Wang
61
3
0
21 Mar 2024
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan
Jian-Huang Lai
Wei-Shi Zheng
Jianfang Hu
AI4TS
46
5
0
18 Mar 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
45
29
0
20 Feb 2024
M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation
Hongcheng Liu
Pingjie Wang
Yu Wang
Yanfeng Wang
47
1
0
19 Feb 2024
SADA: Semantic adversarial unsupervised domain adaptation for Temporal Action Localization
David Pujol-Perich
Albert Clapés
Sergio Escalera
37
0
0
20 Dec 2023
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Sunjae Yoon
Dahyun Kim
Eunseop Yoon
Hee Suk Yoon
Junyeong Kim
C. Yoo
39
6
0
15 Dec 2023
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
Trong-Thuan Nguyen
Pha Nguyen
Khoa Luu
34
12
0
05 Dec 2023
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
Guozhang Li
Xinpeng Ding
De Cheng
Jie Li
Nannan Wang
Xinbo Gao
34
1
0
05 Dec 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
29
1
0
28 Nov 2023
FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing
Anant Khandelwal
36
2
0
24 Oct 2023
Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog
Haoyu Zhang
Meng Liu
Yaowei Wang
Da Cao
Weili Guan
Liqiang Nie
36
0
0
11 Oct 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Palaash Agrawal
Haidi Azaman
Cheston Tan
51
3
0
13 Sep 2023
Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation
Xinyu Lyu
Jingwei Liu
Yuyu Guo
Lianli Gao
29
1
0
10 Aug 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
40
8
0
18 Jul 2023
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
Isha Rawal
Alexander Matyasko
Shantanu Jaiswal
Basura Fernando
Cheston Tan
21
2
0
15 Jun 2023
A Survey on Video Moment Localization
Meng Liu
Liqiang Nie
Yunxiao Wang
Meng Wang
Yong Rui
37
28
0
13 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
40
31
0
08 Jun 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
59
4
0
25 May 2023
Type-to-Track: Retrieve Any Object via Prompt-based Tracking
Pha Nguyen
Kha Gia Quach
Kris Kitani
Khoa Luu
45
18
0
22 May 2023
Is end-to-end learning enough for fitness activity recognition?
Antoine Mercier
Guillaume Berger
Sunny Panchal
Florian Letsch
Cornelius Boehm
Nahua Kang
Ingo Bax
Roland Memisevic
23
2
0
14 May 2023
SViTT: Temporal Learning of Sparse Video-Text Transformers
Yi Li
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
31
12
0
18 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
42
20
0
05 Apr 2023
Unbiased Scene Graph Generation in Videos
Sayak Nag
Kyle Min
Subarna Tripathi
A. Roy-Chowdhury
31
29
0
03 Apr 2023
Generation-Guided Multi-Level Unified Network for Video Grounding
Xingyi Cheng
Xiangyu Wu
Dong Shen
Hezheng Lin
Fan Yang
21
0
0
14 Mar 2023
VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]
Maureen Daum
Enhao Zhang
Dong He
Stephen Mussmann
Brandon Haynes
Ranjay Krishna
Magdalena Balazinska
32
4
0
07 Mar 2023
Connecting Vision and Language with Video Localized Narratives
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
52
21
0
22 Feb 2023