2410.01966
Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker
2 October 2024
Xinlong Hou
Sen Shen
Xueshen Li
Xinran Gao
Ziyi Huang
Steven J. Holiday
Matthew R. Cribbet
Susan W. White
Edward Sazonov
Yu Gan
Papers citing
"Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker"
19 / 19 papers shown
ROOT: VLM based System for Indoor Scene Understanding and Beyond
Yonghui Wang, Shi-Yong Chen, Zhenxing Zhou, Siyi Li, Haoran Li, Wengang Zhou, Haoyang Li
[VLM] 142 | 3 | 0 | 24 Nov 2024

YOLOv11: An Overview of the Key Architectural Enhancements
Rahima Khanam, Muhammad Hussain
[ObjD] 74 | 355 | 0 | 23 Oct 2024

Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera
Inpyo Song, Minjun Joo, Joonhyung Kwon, Jangwon Lee
[EgoV] 91 | 4 | 0 | 30 May 2024

MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer
Sushmita Sarker, Prithul Sarker, G. Bebis, Alireza Tavakkoli
[ViT] 88 | 11 | 0 | 26 Feb 2024

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
[VLM] 262 | 824 | 0 | 19 Jan 2024

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
[VLM, MLLM] 167 | 2,075 | 0 | 20 Apr 2023

WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition
Marius Bock, Hilde Kuehne, Kristof Van Laerhoven, Michael Moeller
[EgoV] 153 | 28 | 0 | 11 Apr 2023

Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu
[VLM] 165 | 551 | 0 | 03 Apr 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
[VLM, MLLM] 442 | 4,666 | 0 | 30 Jan 2023

Towards Continual Egocentric Activity Recognition: A Multi-modal Egocentric Activity Dataset for Continual Learning
Linfeng Xu, Qingbo Wu, Lili Pan, Fanman Meng, Hongliang Li, Chiyuan He, Hanxin Wang, Shaoxu Cheng, Yunshu Dai
[EgoV, HAI] 86 | 25 | 0 | 26 Jan 2023

SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification
Fang Peng, Xiaoshan Yang, Linhui Xiao, Yaowei Wang, Changsheng Xu
[VLM] 81 | 49 | 0 | 28 Nov 2022

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, B. Guo
[ViT] 494 | 21,752 | 0 | 25 Mar 2021

Indoor Future Person Localization from an Egocentric Wearable Camera
Jianing Qiu, Frank P.-W. Lo, Xiao Gu, Yingnan Sun, Shuo Jiang, Benny Lo
[EgoV] 134 | 9 | 0 | 06 Mar 2021

Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, ..., Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
[CLIP, VLM] 1.0K | 30,029 | 0 | 26 Feb 2021

YOLOv4: Optimal Speed and Accuracy of Object Detection
Alexey Bochkovskiy, Chien-Yao Wang, H. Liao
[VLM, ObjD] 178 | 12,350 | 0 | 23 Apr 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
[VLM] 218 | 1,285 | 0 | 25 Feb 2020

Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras
Lingni Ma, J. Stückler, C. Kerl, Daniel Cremers
63 | 152 | 0 | 26 Mar 2017

A Multi-view RGB-D Approach for Human Pose Estimation in Operating Rooms
A. Kadkhodamohammadi, A. Gangi, M. de Mathelin, N. Padoy
[3DH] 78 | 53 | 0 | 25 Jan 2017

Fast R-CNN
Ross B. Girshick
[ObjD] 341 | 25,145 | 0 | 30 Apr 2015