Title |
---|
Large Language Models are Strong Audio-Visual Speech Recognition Learners. Umberto Cappellazzo, Minsu Kim, Honglie Chen, Pingchuan Ma, Stavros Petridis, Daniele Falavigna, Alessio Brutti, Maja Pantic |
Siamese Vision Transformers are Scalable Audio-visual Learners. Yan-Bo Lin, Gedas Bertasius |
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation. Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Lin Li, ..., Jinzheng He, Lichao Zhang, Jinglin Liu, Xiaoyue Yin, Zhou Zhao |