arXiv:2410.01966
Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker

2 October 2024
Xinlong Hou
Sen Shen
Xueshen Li
Xinran Gao
Ziyi Huang
Steven J. Holiday
Matthew R. Cribbet
Susan W. White
Edward Sazonov
Yu Gan

Papers citing "Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker"

19 / 19 papers shown
ROOT: VLM based System for Indoor Scene Understanding and Beyond
Yonghui Wang
Shi-Yong Chen
Zhenxing Zhou
Siyi Li
Haoran Li
Wengang Zhou
Haoyang Li
VLM
142
3
0
24 Nov 2024
YOLOv11: An Overview of the Key Architectural Enhancements
Rahima Khanam
Muhammad Hussain
ObjD
74
355
0
23 Oct 2024
Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera
Inpyo Song
Minjun Joo
Joonhyung Kwon
Jangwon Lee
EgoV
91
4
0
30 May 2024
MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer
Sushmita Sarker
Prithul Sarker
G. Bebis
Alireza Tavakkoli
ViT
88
11
0
26 Feb 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang
Bingyi Kang
Zilong Huang
Xiaogang Xu
Jiashi Feng
Hengshuang Zhao
VLM
262
824
0
19 Jan 2024
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM, MLLM
167
2,075
0
20 Apr 2023
WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition
Marius Bock
Hilde Kuehne
Kristof Van Laerhoven
Michael Moeller
EgoV
153
28
0
11 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
165
551
0
03 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
442
4,666
0
30 Jan 2023
Towards Continual Egocentric Activity Recognition: A Multi-modal Egocentric Activity Dataset for Continual Learning
Linfeng Xu
Qingbo Wu
Lili Pan
Fanman Meng
Hongliang Li
Chiyuan He
Hanxin Wang
Shaoxu Cheng
Yunshu Dai
EgoV, HAI
86
25
0
26 Jan 2023
SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification
Fang Peng
Xiaoshan Yang
Linhui Xiao
Yaowei Wang
Changsheng Xu
VLM
81
49
0
28 Nov 2022
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
494
21,752
0
25 Mar 2021
Indoor Future Person Localization from an Egocentric Wearable Camera
Jianing Qiu
Frank P.-W. Lo
Xiao Gu
Yingnan Sun
Shuo Jiang
Benny Lo
EgoV
134
9
0
06 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
1.0K
30,029
0
26 Feb 2021
YOLOv4: Optimal Speed and Accuracy of Object Detection
Alexey Bochkovskiy
Chien-Yao Wang
H. Liao
VLM, ObjD
178
12,350
0
23 Apr 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
218
1,285
0
25 Feb 2020
Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras
Lingni Ma
J. Stückler
C. Kerl
Daniel Cremers
63
152
0
26 Mar 2017
A Multi-view RGB-D Approach for Human Pose Estimation in Operating Rooms
A. Kadkhodamohammadi
A. Gangi
M. de Mathelin
N. Padoy
3DH
78
53
0
25 Jan 2017
Fast R-CNN
Ross B. Girshick
ObjD
341
25,145
0
30 Apr 2015