2307.10757
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
20 July 2023
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
Papers citing
"Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition"
50 / 56 papers shown
Title
Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer
Soumya Dutta
Avni Jain
Sriram Ganapathy
68
0
0
23 May 2025
MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network
Vrushank Ahire
Kunal Shah
Mudasir Nazir Khan
Nikhil Pakhale
L. Sookha
M. A. Ganaie
Abhinav Dhall
104
0
0
16 Mar 2025
Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
Yuzhe Weng
Haotian Wang
Tian Gao
Kewei Li
Shutong Niu
Jun Du
56
0
0
19 Oct 2024
Personalized Speech Emotion Recognition in Human-Robot Interaction using Vision Transformers
Ruchik Mishra
Andrew Frye
M. M. Rayguru
Dan O. Popa
104
1
0
16 Sep 2024
LanSER: Language-Model Supported Speech Emotion Recognition
Taesik Gong
Joshua Belanich
Krishna Somandepalli
Arsha Nagrani
B. Eoff
Brendan Jou
48
10
0
07 Sep 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
264
7,047
0
05 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
815
12,840
0
27 Feb 2023
DST: Deformable Speech Transformer for Emotion Recognition
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
66
21
0
27 Feb 2023
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
45
38
0
27 Feb 2023
Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
Siyuan Shen
Feng Liu
Aimin Zhou
46
15
0
26 Feb 2023
Masked Motion Encoding for Self-Supervised Video Representation Learning
Xinyu Sun
Peihao Chen
Liang-Chieh Chen
Chan Li
Thomas H. Li
Mingkui Tan
Chuang Gan
50
30
0
12 Oct 2022
Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization
Xiaokang Zhao
Qiu-shi Zhu
Jie Zhang
82
4
0
28 Sep 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
283
3,458
0
29 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
335
6,132
0
05 Apr 2022
Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information
Heqing Zou
Yuke Si
Chen Chen
D. Rajan
Chng Eng Siong
37
119
0
29 Mar 2022
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
50
34
0
08 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabaleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
64
109
0
02 Mar 2022
A Survey of Vision-Language Pre-Trained Models
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
VLM
83
182
0
18 Feb 2022
Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
Ryandhimas E. Zezario
Szu-Wei Fu
Fei Chen
C. Fuh
Hsin-Min Wang
Yu Tsao
DiffM
44
78
0
03 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
176
1,794
0
26 Oct 2021
Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
Li-Wei Chen
Alexander I. Rudnicky
VLM
37
126
0
12 Oct 2021
Key-Sparse Transformer for Multimodal Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jichen Yang
Jianxin Pang
35
49
0
22 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
127
2,879
0
14 Jun 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
78
910
0
03 May 2021
Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders
Guanhua Chen
Shuming Ma
Yun-Nung Chen
Li Dong
Dongdong Zhang
Jianxiong Pan
Wenping Wang
Furu Wei
46
39
0
18 Apr 2021
Speech Emotion Recognition using Semantic Information
Panagiotis Tzirakis
Anh-Tuan Nguyen
Stefanos Zafeiriou
Björn W. Schuller
33
19
0
04 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
681
28,659
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
327
4,873
0
24 Feb 2021
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
VLM
49
222
0
20 Feb 2021
CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models
Yusheng Su
Xu Han
Yankai Lin
Zhengyan Zhang
Zhiyuan Liu
Peng Li
Jie Zhou
Maosong Sun
51
10
0
07 Feb 2021
LSSED: a large-scale dataset and benchmark for speech emotion recognition
Weiquan Fan
Xiangmin Xu
Xiaofen Xing
Weidong Chen
Dongyan Huang
58
34
0
30 Jan 2021
What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure
Jui Shah
Yaman Kumar Singla
Changyou Chen
R. Shah
55
81
0
02 Jan 2021
Toward Transformer-Based Object Detection
Josh Beal
Eric Kim
Eric Tzeng
Dong Huk Park
Andrew Zhai
Dmitry Kislyuk
ViT
55
212
0
17 Dec 2020
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
Peihao Chen
Deng Huang
Dongliang He
Xiang Long
Runhao Zeng
Shilei Wen
Mingkui Tan
Chuang Gan
SSL
41
133
0
27 Oct 2020
Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation
Alexander R. Fabbri
Simeng Han
Haoyuan Li
Haoran Li
Marjan Ghazvininejad
Shafiq Joty
Dragomir R. Radev
Yashar Mehdad
161
96
0
24 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
403
40,217
0
22 Oct 2020
BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition
Usman Naseem
Matloob Khushi
V. Reddy
S. Rajendran
Imran Razzak
Jinman Kim
43
63
0
19 Sep 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
179
5,734
0
20 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
500
41,106
0
28 May 2020
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao
H. Trivedi
A. Balasubramanian
Niranjan Balasubramanian
56
66
0
02 May 2020
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu
Peng Zhou
Zhe Zhao
Zhiruo Wang
Haotang Deng
Qi Ju
75
356
0
05 Apr 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
255
18,607
0
13 Feb 2020
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Daoyuan Chen
Yaliang Li
Minghui Qiu
Zhen Wang
Bofang Li
Bolin Ding
Hongbo Deng
Jun Huang
Wei Lin
Jingren Zhou
MQ
44
104
0
13 Jan 2020
Acquiring Knowledge from Pre-trained Model to Neural Machine Translation
Rongxiang Weng
Heng Yu
Shujian Huang
Shanbo Cheng
Weihua Luo
41
66
0
04 Dec 2019
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
113
12,007
0
13 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
270
19,824
0
23 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
134
7,437
0
02 Oct 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
408
24,160
0
26 Jul 2019
wav2vec: Unsupervised Pre-training for Speech Recognition
Steffen Schneider
Alexei Baevski
R. Collobert
Michael Auli
SSL
45
418
0
11 Apr 2019
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang
Yao Lu
Linqing Liu
Lili Mou
Olga Vechtomova
Jimmy J. Lin
54
419
0
28 Mar 2019