ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.13963
  4. Cited By
Connecting Speech Encoder and Large Language Model for ASR

Connecting Speech Encoder and Large Language Model for ASR

25 September 2023
Wenyi Yu
Changli Tang
Guangzhi Sun
Xianzhao Chen
T. Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
    AuLLM
ArXivPDFHTML

Papers citing "Connecting Speech Encoder and Large Language Model for ASR"

12 / 12 papers shown
Title
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Prabhat Pandey
Rupak Vignesh Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
345
2
0
12 Apr 2025
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
Rui Liu
Hongyu Yuan
Hong Li
68
0
0
03 Jan 2025
SSR: Alignment-Aware Modality Connector for Speech Language Models
SSR: Alignment-Aware Modality Connector for Speech Language Models
Weiting Tan
Hirofumi Inaguma
Ning Dong
Paden Tomasello
Xutai Ma
64
5
0
30 Sep 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM
LM&MA
81
6
0
25 Sep 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
102
5
0
08 Feb 2024
Can Generative Large Language Models Perform ASR Error Correction?
Can Generative Large Language Models Perform ASR Error Correction?
Rao Ma
Mengjie Qian
Potsawee Manakul
Mark Gales
Kate Knill
AuLLM
KELM
37
54
0
09 Jul 2023
Prompting Large Language Models for Zero-Shot Domain Adaptation in
  Speech Recognition
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
60
43
0
28 Jun 2023
VideoLLM: Modeling Video Sequence with Large Language Models
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
109
78
0
22 May 2023
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
283
3,458
0
29 Apr 2022
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling
  for Self-Supervised Speech Pre-Training
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Yu-An Chung
Yu Zhang
Wei Han
Chung-Cheng Chiu
James Qin
Ruoming Pang
Yonghui Wu
SSL
VLM
23
421
0
07 Aug 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked
  Prediction of Hidden Units
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
127
2,879
0
14 Jun 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of
  Transcribed Audio
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
86
360
0
13 Jun 2021
1