Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.14259
Cited By
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition
23 May 2024
Chan-Jan Hsu
Yi-Chang Chen
Feng-Ting Liao
Pei-Chen Ho
Yu-Hsiang Wang
Po-Chun Hsu
Da-shan Shiu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition"
11 / 11 papers shown
Title
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Cheng Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
Eng Siong Chng
45
45
0
27 Sep 2023
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
60
43
0
28 Jun 2023
Provable Dynamic Fusion for Low-Quality Multimodal Data
Qingyang Zhang
Haitao Wu
Changqing Zhang
Qinghua Hu
Huazhu Fu
Qiufeng Wang
Xi Peng
59
57
0
03 Jun 2023
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition
Yichong Leng
Xu Tan
Wenjie Liu
Kaitao Song
Rui Wang
Xiang-Yang Li
Tao Qin
Ed Lin
Tie-Yan Liu
58
16
0
02 Dec 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
53
694
0
14 Sep 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau
Min Ma
Simran Khanuja
Yu Zhang
Vera Axelrod
Siddharth Dalmia
Jason Riesa
Clara E. Rivera
Ankur Bapna
VLM
102
306
0
25 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
268
3,458
0
29 Apr 2022
g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin
Yi-Chang Chen
Yu-Chuan Chang
Yenling Chang
Yi-Ren Yeh
37
14
0
20 Mar 2022
mSLAM: Massively multilingual joint pre-training for speech and text
Ankur Bapna
Colin Cherry
Yu Zhang
Ye Jia
Melvin Johnson
Yong Cheng
Simran Khanuja
Jason Riesa
Alexis Conneau
VLM
39
111
0
03 Feb 2022
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Duc Le
Mahaveer Jain
Gil Keren
Suyoun Kim
Yangyang Shi
...
Yuan Shangguan
Christian Fuegen
Ozlem Kalinli
Yatharth Saraf
M. Seltzer
39
93
0
05 Apr 2021
Two-Pass End-to-End Speech Recognition
Tara N. Sainath
Ruoming Pang
David Rybach
Yanzhang He
Rohit Prabhavalkar
...
Qiao Liang
Trevor Strohman
Yonghui Wu
Ian McGraw
Chung-Cheng Chiu
49
147
0
29 Aug 2019
1