2302.14638
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
27 February 2023
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
Papers citing "SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing"
31 / 31 papers shown
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
Wang Zeng
Sheng Jin
Wentao Liu
Chao Qian
Ping Luo
Ouyang Wanli
Xiaogang Wang
ViT
40
124
0
19 Apr 2022
Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information
Heqing Zou
Yuke Si
Chen Chen
D. Rajan
Chng Eng Siong
37
119
0
29 Mar 2022
Dawn of the transformer era in speech emotion recognition: closing the valence gap
Johannes Wagner
Andreas Triantafyllopoulos
H. Wierstorf
Maximilian Schmitt
Felix Burkhardt
F. Eyben
Björn W. Schuller
45
299
0
14 Mar 2022
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
50
34
0
08 Mar 2022
MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in Conversations
Dou Hu
Xiaolong Hou
Lingwei Wei
Lian-Xin Jiang
Yang Mo
38
120
0
04 Mar 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
132
268
0
02 Feb 2022
BOAT: Bilateral Local Attention Vision Transformer
Tan Yu
Gangming Zhao
Ping Li
Yizhou Yu
ViT
54
27
0
31 Jan 2022
Detecting Dementia from Speech and Transcripts using Transformers
Loukas Ilias
D. Askounis
J. Psarras
31
35
0
27 Oct 2021
Key-Sparse Transformer for Multimodal Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jichen Yang
Jianxin Pang
35
49
0
22 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
127
2,879
0
14 Jun 2021
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
Yongming Rao
Wenliang Zhao
Benlin Liu
Jiwen Lu
Jie Zhou
Cho-Jui Hsieh
ViT
63
685
0
03 Jun 2021
Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling
Chuhan Wu
Fangzhao Wu
Tao Qi
Yongfeng Huang
66
67
0
02 Jun 2021
ResT: An Efficient Transformer for Visual Recognition
Qing-Long Zhang
Yubin Yang
ViT
48
230
0
28 May 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
78
910
0
03 May 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
327
21,175
0
25 Mar 2021
DeepViT: Towards Deeper Vision Transformer
Daquan Zhou
Bingyi Kang
Xiaojie Jin
Linjie Yang
Xiaochen Lian
Zihang Jiang
Qibin Hou
Jiashi Feng
ViT
66
517
0
22 Mar 2021
What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure
Jui Shah
Yaman Kumar Singla
Changyou Chen
R. Shah
55
81
0
02 Jan 2021
Learning Fine-Grained Cross Modality Excitement for Speech Emotion Recognition
Hang Li
Wenbiao Ding
Zhongqin Wu
Zitao Liu
57
32
0
24 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
400
40,217
0
22 Oct 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
179
5,734
0
20 Jun 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
190
3,082
0
16 May 2020
Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network
M. R. Makiuchi
Tifani Warnita
Nakamasa Inoue
Koichi Shinoda
M. Yoshimura
Momoko Kitazawa
K. Funaki
Yoko Eguchi
T. Kishimoto
47
11
0
16 Apr 2020
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
Bin Wang
C.-C. Jay Kuo
31
153
0
16 Feb 2020
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang
Abdel-rahman Mohamed
Duc Le
Chunxi Liu
Alex Xiao
...
Xiaohui Zhang
Frank Zhang
Christian Fuegen
Geoffrey Zweig
M. Seltzer
41
248
0
22 Oct 2019
Self-Attention Transducers for End-to-End Speech Recognition
Zhengkun Tian
Jiangyan Yi
J. Tao
Ye Bai
Zhengqi Wen
AI4TS
49
72
0
28 Sep 2019
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations
Soujanya Poria
Devamanyu Hazarika
Navonil Majumder
Gautam Naik
Min Zhang
Rada Mihalcea
85
1,055
0
05 Oct 2018
Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment
Yue Gu
Kangning Yang
Shiyu Fu
Shuhong Chen
Xinyu Li
I. Marsic
41
123
0
22 May 2018
Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data
Tifani Warnita
Nakamasa Inoue
Koichi Shinoda
29
40
0
30 Mar 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
453
129,831
0
12 Jun 2017
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
254
10,412
0
21 Jul 2016
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
647
23,235
0
03 Jun 2014