arXiv:2303.00747
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
1 March 2023
Max Bain
Jaesung Huh
Tengda Han
Andrew Zisserman
Papers citing
"WhisperX: Time-Accurate Speech Transcription of Long-Form Audio"
20 / 120 papers shown
Tracking the Newsworthiness of Public Documents
Alexander Spangher
Emilio Ferrara
Ben Welsh
Nanyun Peng
Serdar Tumgoren
Jonathan May
31
2
0
16 Nov 2023
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
VLM
27
52
0
01 Nov 2023
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
34
2
0
28 Oct 2023
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews
Armin Haberl
Jürgen Fleiß
Dominik Kowald
Stefan Thalmann
30
3
0
18 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
32
36
0
10 Oct 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
23
26
0
25 Sep 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Carlos Carvalho
A. Abad
RALM
32
1
0
22 Sep 2023
A Large-scale Dataset for Audio-Language Representation Learning
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
34
20
0
20 Sep 2023
DiariST: Streaming Speech Translation with Speaker Diarization
Muqiao Yang
Naoyuki Kanda
Xiaofei Wang
Junkun Chen
Peidong Wang
Jian Xue
Jinyu Li
Takuya Yoshioka
32
6
0
14 Sep 2023
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
Eliana Pastor
Alkis Koudounas
Giuseppe Attanasio
Dirk Hovy
Elena Baralis
19
4
0
14 Sep 2023
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Dongyang Yu
Shihao Wang
Yuan Fang
Wangpeng An
VGen
41
0
0
08 Aug 2023
OxfordVGG Submission to the EGO4D AV Transcription Challenge
Jaesung Huh
Max Bain
Andrew Zisserman
45
0
0
18 Jul 2023
Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning
F. Liao
Yung-Chieh Chan
Yi-Chang Chen
Chan-Jan Hsu
Da-shan Shiu
41
6
0
18 Jul 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
43
44
0
14 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
32
2
0
04 Jul 2023
Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications
Simone Wills
Yu Bai
Cristian Tejedor-García
C. Cucchiarini
H. Strik
15
10
0
29 Jun 2023
Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses
Lucía Gómez Zaragozá
Simone Wills
Cristian Tejedor-García
Javier Marín-Morales
Mariano Alcañiz
H. Strik
27
8
0
06 Jun 2023
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
24
34
0
29 Mar 2023
Augmentation adversarial training for self-supervised speaker recognition
Jaesung Huh
Hee-Soo Heo
Jingu Kang
Shinji Watanabe
Joon Son Chung
SSL
48
76
0
23 Jul 2020
pyannote.audio: neural building blocks for speaker diarization
H. Bredin
Ruiqing Yin
Juan Manuel Coria
G. Gelly
Pavel Korshunov
Marvin Lavechin
D. Fustes
Hadrien Titeux
Wassim Bouaziz
Marie-Philippe Gill
197
313
0
04 Nov 2019