ResearchTrend.AI
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

1 March 2023
Max Bain
Jaesung Huh
Tengda Han
Andrew Zisserman
arXiv: 2303.00747

Papers citing "WhisperX: Time-Accurate Speech Transcription of Long-Form Audio"

20 / 120 papers shown
Title
Tracking the Newsworthiness of Public Documents
Alexander Spangher
Emilio Ferrara
Ben Welsh
Nanyun Peng
Serdar Tumgoren
Jonathan May
31
2
0
16 Nov 2023
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
VLM
27
52
0
01 Nov 2023
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
34
2
0
28 Oct 2023
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews
Armin Haberl
Jürgen Fleiß
Dominik Kowald
Stefan Thalmann
30
3
0
18 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
32
36
0
10 Oct 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
23
26
0
25 Sep 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Carlos Carvalho
A. Abad
RALM
32
1
0
22 Sep 2023
A Large-scale Dataset for Audio-Language Representation Learning
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
34
20
0
20 Sep 2023
DiariST: Streaming Speech Translation with Speaker Diarization
Muqiao Yang
Naoyuki Kanda
Xiaofei Wang
Junkun Chen
Peidong Wang
Jian Xue
Jinyu Li
Takuya Yoshioka
32
6
0
14 Sep 2023
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
Eliana Pastor
Alkis Koudounas
Giuseppe Attanasio
Dirk Hovy
Elena Baralis
19
4
0
14 Sep 2023
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Dongyang Yu
Shihao Wang
Yuan Fang
Wangpeng An
VGen
41
0
0
08 Aug 2023
OxfordVGG Submission to the EGO4D AV Transcription Challenge
Jaesung Huh
Max Bain
Andrew Zisserman
45
0
0
18 Jul 2023
Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning
F. Liao
Yung-Chieh Chan
Yi-Chang Chen
Chan-Jan Hsu
Da-shan Shiu
41
6
0
18 Jul 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
43
44
0
14 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
32
2
0
04 Jul 2023
Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications
Simone Wills
Yu Bai
Cristian Tejedor-García
C. Cucchiarini
H. Strik
15
10
0
29 Jun 2023
Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses
Lucía Gómez Zaragozá
Simone Wills
Cristian Tejedor-García
Javier Marín-Morales
Mariano Alcañiz
H. Strik
27
8
0
06 Jun 2023
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
24
34
0
29 Mar 2023
Augmentation adversarial training for self-supervised speaker recognition
Jaesung Huh
Hee-Soo Heo
Jingu Kang
Shinji Watanabe
Joon Son Chung
SSL
48
76
0
23 Jul 2020
pyannote.audio: neural building blocks for speaker diarization
H. Bredin
Ruiqing Yin
Juan Manuel Coria
G. Gelly
Pavel Korshunov
Marvin Lavechin
D. Fustes
Hadrien Titeux
Wassim Bouaziz
Marie-Philippe Gill
197
313
0
04 Nov 2019