MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing
Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech,
OCR, and Visual Features
Papers citing "MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing
Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech,
OCR, and Visual Features"