ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2312.08850
  4. Cited By
Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model
  for Audio-Visual Speech Recognition

Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition

14 December 2023
Fan Yu
Haoxu Wang
Ziyang Ma
Shiliang Zhang
ArXiv (abs)PDFHTML

Papers citing "Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition"

12 / 12 papers shown
Title
Branchformer: Parallel MLP-Attention Architectures to Capture Local and
  Global Context for Speech Recognition and Understanding
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
79
150
0
06 Jul 2022
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription
  Challenge
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
Fan Yu
Shiliang Zhang
Yihui Fu
Lei Xie
Siqi Zheng
...
Pengcheng Guo
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
53
119
0
14 Oct 2021
End-to-end Audio-visual Speech Recognition with Conformers
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
Maja Pantic
134
233
0
12 Feb 2021
TSM: Temporal Shift Module for Efficient Video Understanding
TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin
Chuang Gan
Song Han
98
1,692
0
20 Nov 2018
Deep Audio-Visual Speech Recognition
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
98
708
0
06 Sep 2018
Attention-based Audio-Visual Fusion for Robust Automatic Speech
  Recognition
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
George Sterpu
Christian Saam
N. Harte
72
65
0
05 Sep 2018
Motion Feature Network: Fixed Motion Filter for Action Recognition
Motion Feature Network: Fixed Motion Filter for Action Recognition
Myunggi Lee
Seungeui Lee
S. Son
Gyutae Park
Nojun Kwak
80
123
0
26 Jul 2018
End-to-end Audiovisual Speech Recognition
End-to-end Audiovisual Speech Recognition
Stavros Petridis
Themos Stafylakis
Pingchuan Ma
Feipeng Cai
Georgios Tzimiropoulos
Maja Pantic
74
251
0
18 Feb 2018
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in
  Video Classification
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
155
1,333
0
13 Dec 2017
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action
  Localization in Untrimmed Videos
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
Zheng Shou
Jonathan Chan
Alireza Zareian
K. Miyazawa
Shih-Fu Chang
82
561
0
04 Mar 2017
Lip Reading Sentences in the Wild
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
261
792
0
16 Nov 2016
Two-Stream Convolutional Networks for Action Recognition in Videos
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan
Andrew Zisserman
256
7,542
0
09 Jun 2014
1