ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2301.08237
LoCoNet: Long-Short Context Network for Active Speaker Detection
Versions: v1, v2 (latest)

19 January 2023
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
Links: arXiv (abs), PDF, HTML, GitHub (33★)

Papers citing "LoCoNet: Long-Short Context Network for Active Speaker Detection"

Showing 50 of 62 papers
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
431
0
0
06 May 2025
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Joey Tianyi Zhou
Gedas Bertasius
David J. Crandall
175
2
0
12 Dec 2024
TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning
Chaeyoung Jung
Suyeon Lee
KiHyun Nam
Kyeongha Rho
You Jin Kim
Youngjoon Jang
Joon Son Chung
46
10
0
21 Sep 2023
A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism
I. Gurvich
Ido Leichter
Dharmendar Reddy Palle
Yossi Asher
Alon Vinnikov
Igor Abramovski
Vishak Gopal
Ross Cutler
Eyal Krupka
55
4
0
15 Sep 2023
Target Active Speaker Detection with Audio-visual Cues
Yiding Jiang
Ruijie Tao
Zexu Pan
Haizhou Li
83
17
0
22 May 2023
A Light Weight Model for Active Speaker Detection
Junhua Liao
Haihan Duan
Kanghui Feng
Wanbing Zhao
Yanbing Yang
Liangyin Chen
64
43
0
08 Mar 2023
Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization
Kyle Min
VLM
52
11
0
14 Oct 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Kyle Min
Sourya Roy
Subarna Tripathi
T. Guha
Somdeb Majumdar
63
38
0
15 Jul 2022
Rethinking Audio-visual Synchronization for Active Speaker Detection
Abudukelimu Wuerkaixi
You Zhang
Z. Duan
Changshui Zhang
40
10
0
21 Jun 2022
End-to-End Active Speaker Detection
Juan Carlos León Alcázar
M. Cordes
Chen Zhao
Guohao Li
84
28
0
27 Mar 2022
Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement
Jun Xiong
Yu Zhou
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
65
22
0
04 Mar 2022
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
61
41
0
22 Dec 2021
Decompose the Sounds and Pixels, Recompose the Events
Varshanth R. Rao
Md Ibrahim Khalil
Haoda Li
Peng Dai
Juwei Lu
51
5
0
21 Dec 2021
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
Eric Z. Xu
Zeyang Song
Satoshi Tsutsui
C. Feng
Mang Ye
Mike Zheng Shou
VGen
64
43
0
29 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
410
1,114
0
13 Oct 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
96
21
0
17 Aug 2021
UniCon: Unified Context Network for Robust Active Speaker Detection
Yuanhang Zhang
Susan Liang
Shuang Yang
Xiao-Chang Liu
Zhongqin Wu
Shiguang Shan
Xilin Chen
CVBM
76
38
0
05 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
77
181
0
14 Jul 2021
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
3DV
75
46
0
07 Jun 2021
Multi-target DoA Estimation with an Audio-visual Fusion Mechanism
Xinyuan Qian
Maulik C. Madhavi
Zexu Pan
Jiadong Wang
Haizhou Li
62
44
0
13 May 2021
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
145
884
0
05 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
403
2,064
0
09 Feb 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
116
52
0
11 Jan 2021
Spot the conversation: speaker diarisation in the wild
Joon Son Chung
Jaesung Huh
Arsha Nagrani
Triantafyllos Afouras
Andrew Zisserman
VGen
78
150
0
02 Jul 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT, 3DV, PINN
440
13,130
0
26 May 2020
Active Speakers in Context
Juan Carlos León Alcázar
Fabian Caba Heilbron
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbelaez
Guohao Li
56
62
0
20 May 2020
VGGSound: A Large-scale Audio-Visual Dataset
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
92
583
0
29 Apr 2020
TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
Zhuoqian Yang
Wentao Zhu
Wayne Wu
Chao Qian
Qiang-feng Zhou
Bolei Zhou
Chen Change Loy
VGen
90
56
0
31 Mar 2020
Self-supervised learning for audio-visual speaker diarization
Yifan Ding
Yong-mei Xu
Shi-Xiong Zhang
Yahuan Cong
Liqiang Wang
VLM
64
29
0
13 Feb 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM, SSL
199
1,084
0
21 Dec 2019
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
83
253
0
10 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
556
42,639
0
03 Dec 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
65
339
0
22 Aug 2019
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
...
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
59
145
0
05 Jan 2019
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
98
709
0
06 Sep 2018
Recycle-GAN: Unsupervised Video Retargeting
Aayush Bansal
Shugao Ma
Deva Ramanan
Yaser Sheikh
VGen, DiffM
92
297
0
15 Aug 2018
Character-Level Language Modeling with Deeper Self-Attention
Rami Al-Rfou
Dokook Choe
Noah Constant
Mandy Guo
Llion Jones
154
392
0
09 Aug 2018
Speaker Recognition from Raw Waveform with SincNet
Mirco Ravanelli
Yoshua Bengio
191
718
0
29 Jul 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
100
754
0
10 Apr 2018
Deep Learning using Rectified Linear Units (ReLU)
Abien Fred Agarap
81
3,241
0
22 Mar 2018
Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics
Arindam Jati
P. Georgiou
SSL
51
49
0
22 Feb 2018
Dynamic Graph CNN for Learning on Point Clouds
Yue Wang
Yongbin Sun
Ziwei Liu
Sanjay E. Sarma
M. Bronstein
Justin Solomon
GNN, 3DPC
260
6,177
0
24 Jan 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD, VOS
116
530
0
18 Dec 2017
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
303
8,918
0
21 Nov 2017
Speaker Diarization with LSTM
Quan Wang
Carlton Downey
Li Wan
Philip Mansfield
Ignacio López Moreno
103
319
0
28 Oct 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
805
132,725
0
12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
240
8,041
0
22 May 2017
The Kinetics Human Action Video Dataset
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
...
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
259
3,816
0
19 May 2017
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
522
10,351
0
16 Nov 2016
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
264
793
0
16 Nov 2016