Unifying Global and Local Scene Entities Modelling for Precise Action Spotting

15 April 2024
Kim Hoang Tran
Phuc Vuong Do
Ngoc Quoc Ly
Ngan Le

Papers citing "Unifying Global and Local Scene Entities Modelling for Precise Action Spotting"

50 / 51 papers shown
Title
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
170
1,957
0
09 Mar 2023
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Kashu Yamazaki
Khoa T. Vo
Sang Truong
Bhiksha Raj
Ngan Le
60
38
0
28 Nov 2022
AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation
Khoa T. Vo
Sang Truong
Kashu Yamazaki
Bhiksha Raj
Minh-Triet Tran
Ngan Le
108
28
0
05 Oct 2022
Spotting Temporally Precise, Fine-Grained Events in Video
James Hong
Haotian Zhang
Michael Gharbi
Matthew Fisher
Kayvon Fatahalian
68
36
0
20 Jul 2022
Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors
J. C. V. Soares
Avijit Shah
Topojoy Biswas
68
32
0
20 May 2022
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
Zhaowei Cai
Gukyeong Kwon
Avinash Ravichandran
Erhan Bas
Zhuowen Tu
Rahul Bhotika
Stefano Soatto
ObjD
MLLM
VLM
58
50
0
12 Apr 2022
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Jinglin Xu
Yongming Rao
Xumin Yu
Guangyi Chen
Jie Zhou
Jiwen Lu
41
91
0
07 Apr 2022
Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Bin Xiao
Ce Liu
Lu Yuan
Jianfeng Gao
VLM
SSL
118
226
0
07 Apr 2022
Language-driven Semantic Segmentation
Boyi Li
Kilian Q. Weinberger
Serge Belongie
V. Koltun
René Ranftl
VLM
110
616
0
10 Jan 2022
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Yin Cui
Tsung-Yi Lin
VLM
110
380
0
22 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
126
575
0
16 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Jenq-Neng Hwang
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
105
1,058
0
07 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLM
CLIP
187
573
0
02 Dec 2021
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation
Khoa T. Vo
Kevin Hyekang Joo
Kashu Yamazaki
Sang Truong
Kris Kitani
Minh-Triet Tran
Ngan Le
EgoV
91
17
0
21 Oct 2021
ASFormer: Transformer for Action Segmentation
Fangqiu Yi
Hongyu Wen
Tingting Jiang
ViT
111
176
0
16 Oct 2021
Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection
Xin Zhou
Le Kang
Zhiyu Cheng
Bo He
Jingyu Xin
61
34
0
28 Jun 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
263
915
0
28 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
127
1,257
0
22 Apr 2021
Camera Calibration and Player Localization in SoccerNet-v2 and Investigation of their Representations for Action Spotting
A. Cioppa
Adrien Deliège
Floriane Magera
Silvio Giancola
Olivier Barnich
Bernard Ghanem
Marc Van Droogenbroeck
50
57
0
19 Apr 2021
Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts
Silvio Giancola
Bernard Ghanem
50
45
0
14 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
411
21,347
0
25 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
816
29,167
0
26 Feb 2021
RMS-Net: Regression and Masking for Soccer Event Spotting
Matteo Tomei
Lorenzo Baraldi
Simone Calderara
Simone Bronzin
Rita Cucchiara
80
28
0
15 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
419
3,826
0
11 Feb 2021
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos
Adrien Deliège
A. Cioppa
Silvio Giancola
M. J. Seikavandi
J. Dueholm
Kamal Nasrollahi
Bernard Ghanem
T. Moeslund
Marc Van Droogenbroeck
62
153
0
26 Nov 2020
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Bernard Ghanem
59
124
0
23 Nov 2020
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
MLT
52
21
0
15 May 2020
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
Dian Shao
Yue Zhao
Bo Dai
Dahua Lin
55
328
0
14 Apr 2020
Temporal Pyramid Network for Action Recognition
Ceyuan Yang
Yinghao Xu
Jianping Shi
Bo Dai
Bolei Zhou
45
370
0
07 Apr 2020
Designing Network Design Spaces
Ilija Radosavovic
Raj Prateek Kosaraju
Ross B. Girshick
Kaiming He
Piotr Dollár
GNN
96
1,680
0
30 Mar 2020
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Zachary Teed
Jia Deng
MDE
211
2,612
0
26 Mar 2020
A Context-Aware Loss Function for Action Spotting in Soccer Videos
A. Cioppa
Adrien Deliège
Silvio Giancola
Bernard Ghanem
Marc Van Droogenbroeck
Rikke Gade
T. Moeslund
50
80
0
03 Dec 2019
Gate-Shift Networks for Video Action Recognition
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
3DPC
55
155
0
01 Dec 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
129
18,058
0
28 May 2019
Video Classification with Channel-Separated Convolutional Networks
Du Tran
Heng Wang
Lorenzo Torresani
Matt Feiszli
3DV
61
586
0
04 Apr 2019
MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation
Yazan Abu Farha
Juergen Gall
54
664
0
05 Mar 2019
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
162
3,262
0
10 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.5K
94,511
0
11 Oct 2018
Learning Visual Question Answering by Bootstrapping Hard Attention
Mateusz Malinowski
Carl Doersch
Adam Santoro
Peter W. Battaglia
OOD
49
96
0
01 Aug 2018
SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos
Silvio Giancola
Mohieddine Amine
Tarek Dghaily
Bernard Ghanem
AI4TS
84
197
0
12 Apr 2018
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
271
9,743
0
25 Oct 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
640
130,942
0
12 Jun 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
344
27,129
0
20 Mar 2017
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
288
8,091
0
13 Aug 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.0K
193,426
0
10 Dec 2015
NetVLAD: CNN architecture for weakly supervised place recognition
Relja Arandjelović
Petr Gronát
Akihiko Torii
Tomas Pajdla
Josef Sivic
3DV
SSL
116
2,634
0
23 Nov 2015
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon
S. Divvala
Ross B. Girshick
Ali Farhadi
ObjD
658
36,801
0
08 Jun 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
471
62,122
0
04 Jun 2015
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
512
12,692
0
11 Dec 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.4K
100,213
0
04 Sep 2014