arXiv:2312.08850
Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition
14 December 2023
Fan Yu
Haoxu Wang
Ziyang Ma
Shiliang Zhang
Papers citing "Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition"
12 / 12 papers shown
Title
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
79
150
0
06 Jul 2022
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
Fan Yu
Shiliang Zhang
Yihui Fu
Lei Xie
Siqi Zheng
...
Pengcheng Guo
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
53
119
0
14 Oct 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
Maja Pantic
131
233
0
12 Feb 2021
TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin
Chuang Gan
Song Han
98
1,692
0
20 Nov 2018
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
98
708
0
06 Sep 2018
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
George Sterpu
Christian Saam
N. Harte
72
65
0
05 Sep 2018
Motion Feature Network: Fixed Motion Filter for Action Recognition
Myunggi Lee
Seungeui Lee
S. Son
Gyutae Park
Nojun Kwak
80
123
0
26 Jul 2018
End-to-end Audiovisual Speech Recognition
Stavros Petridis
Themos Stafylakis
Pingchuan Ma
Feipeng Cai
Georgios Tzimiropoulos
Maja Pantic
74
251
0
18 Feb 2018
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
155
1,333
0
13 Dec 2017
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
Zheng Shou
Jonathan Chan
Alireza Zareian
K. Miyazawa
Shih-Fu Chang
82
561
0
04 Mar 2017
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
261
792
0
16 Nov 2016
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan
Andrew Zisserman
256
7,542
0
09 Jun 2014