data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown

Title
On Compressing Sequences for Self-Supervised Speech Models Yen Meng Hsuan-Jui Chen Jiatong Shi Shinji Watanabe Paola García Hung-yi Lee Hao Tang SSL 56 15 0 13 Oct 2022
Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models Haoyu Wang Weiqiang Zhang Hongbin Suo Yulong Wan 53 0 0 13 Oct 2022
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR DongSeon Hwang K. Sim Yu Zhang Trevor Strohman 67 11 0 11 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning Zijia Zhao Longteng Guo Xingjian He Shuai Shao Zehuan Yuan Jing Liu 105 9 0 09 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning Chutong Meng Junyi Ao Tom Ko Mingxuan Wang Haizhou Li SSL 111 6 0 08 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training Zi-Hua Zhang Long Zhou Junyi Ao Shujie Liu Lirong Dai Jinyu Li Furu Wei 131 58 0 07 Oct 2022
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining H. S. Bovbjerg Zheng-Hua Tan VLM 79 3 0 04 Oct 2022
That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation Abitha Thankaraj Lerrel Pinto 68 17 0 03 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model Yi-Jen Shih Hsuan-Fu Wang Heng-Jui Chang Layne Berry Hung-yi Lee David Harwath VLM CLIP 137 32 0 03 Oct 2022
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods Skanda Koppula Yazhe Li Evan Shelhamer Andrew Jaegle Nikhil Parthasarathy Relja Arandjelović João Carreira Olivier J. Hénaff 86 9 0 30 Sep 2022
Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio Yan Gao Javier Fernandez-Marques Titouan Parcollet Pedro Porto Buarque de Gusmão Nicholas D. Lane 87 9 0 30 Sep 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data Zi-Hua Zhang Sanyuan Chen Long Zhou Yu Wu Shuo Ren ... Zhuoyuan Yao Xun Gong Lirong Dai Jinyu Li Furu Wei 79 57 0 30 Sep 2022
TVLT: Textless Vision-Language Transformer Zineng Tang Jaemin Cho Yixin Nie Joey Tianyi Zhou VLM 137 31 0 28 Sep 2022
An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis Tobias Hallmen Silvan Mertes Dominik Schiller Elisabeth André 52 5 0 28 Sep 2022
Implementing and Experimenting with Diffusion Models for Text-to-Image Generation Robin Zbinden 42 3 0 22 Sep 2022
Deep Lake: a Lakehouse for Deep Learning S. Hambardzumyan Abhina Tuli Levon Ghukasyan Fariz Rahman Hrant Topchyan ... Mark McQuade M. Harutyunyan Tatevik Hakobyan I. Stranic Davit Buniatyan 90 21 0 22 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models R. Olivier H. Abdullah Bhiksha Raj AAML 75 1 0 17 Sep 2022
Exploring Target Representations for Masked Autoencoders Xingbin Liu Jinghao Zhou Tao Kong Xianming Lin Rongrong Ji 197 52 0 08 Sep 2022
Generalization in Neural Networks: A Broad Survey Chris Rohlfs OOD AI4CE 67 7 0 04 Sep 2022
BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec Joon Sern Lee Kai Keng Tay Zong Fu Chua 15 2 0 02 Sep 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining Xiaoyi Dong Jianmin Bao Yinglin Zheng Ting Zhang Dongdong Chen ... Weiming Zhang Lu Yuan Dong Chen Fang Wen Nenghai Yu CLIP VLM 115 167 0 25 Aug 2022
AI and 6G into the Metaverse: Fundamentals, Challenges and Future Research Trends Muhammad Zawish Fayaz Ali Dharejo Sunder Ali Khowaja Saleem Raza Steven Davy Kapal Dev P. Bellavista 82 68 0 23 Aug 2022
Estimating a potential without the agony of the partition function E. Haber Moshe Eliasof L. Tenorio 61 2 0 19 Aug 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers Zhiliang Peng Li Dong Hangbo Bao QiXiang Ye Furu Wei 75 323 0 12 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation Zejiang Hou Fei Sun Yen-kuang Chen Yuan Xie S. Kung ViT 123 70 0 11 Aug 2022
Understanding Masked Image Modeling via Learning Occlusion Invariant Feature Xiangwen Kong Xiangyu Zhang SSL 78 55 0 08 Aug 2022
SdAE: Self-distillated Masked Autoencoder Yabo Chen Yuchen Liu Dongsheng Jiang Xiaopeng Zhang Wenrui Dai H. Xiong Qi Tian ViT 99 74 0 31 Jul 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond Chaoning Zhang Chenshuang Zhang Junha Song John Seon Keun Yi Kang Zhang In So Kweon SSL 96 78 0 30 Jul 2022
UAVM: Towards Unifying Audio and Visual Models Yuan Gong Alexander H. Liu Andrew Rouditchenko James R. Glass 75 23 0 29 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale Gopinath Chennupati Milind Rao Gurpreet Chadha Aaron Eakin A. Raju ... Andrew Oberlin Buddha Nandanoor Prahalad Venkataramanan Zheng Wu Pankaj Sitpure CLL 95 8 0 19 Jul 2022
Bootstrapped Masked Autoencoders for Vision BERT Pretraining Xiaoyi Dong Jianmin Bao Ting Zhang Dongdong Chen Weiming Zhang Lu Yuan Dong Chen Fang Wen Nenghai Yu 89 79 0 14 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality Wei-Ning Hsu Bowen Shi SSL VLM 112 43 0 14 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models Takanori Ashihara Takafumi Moriya Kohei Matsuura Tomohiro Tanaka 83 31 0 14 Jul 2022
Masked Autoencoders that Listen Po-Yao (Bernie) Huang Hu Xu Juncheng Billy Li Alexei Baevski Michael Auli Wojciech Galuba Florian Metze Christoph Feichtenhofer 145 290 0 13 Jul 2022
Big Learning Yulai Cong Miaoyun Zhao AI4CE 94 0 0 08 Jul 2022
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR Kun Wei Yike Zhang Sining Sun Lei Xie Long Ma 62 9 0 03 Jul 2022
FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy Nikil Ravi Pranshu Chaturvedi Eliu A. Huerta Zhengchun Liu Ryan Chard Aristana Scourtas K. J. Schmidt Kyle Chard Ben Blaiszik Ian Foster 119 29 0 01 Jul 2022
Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition Einari Vaaras Manu Airaksinen Okko Räsänen 48 6 0 21 Jun 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training Chengyi Wang Yiming Wang Yu Wu Sanyuan Chen Jinyu Li Shujie Liu Furu Wei SSL 95 20 0 21 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm Jiangning Zhang Xiangtai Li Yabiao Wang Chengjie Wang Yibo Yang Yong Liu Dacheng Tao ViT 121 35 0 19 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos Rohit Girdhar Alaaeldin El-Nouby Mannat Singh Kalyan Vasudev Alwala Armand Joulin Ishan Misra ViT 117 99 0 16 Jun 2022
Masked Frequency Modeling for Self-Supervised Visual Pre-Training Jiahao Xie Wei Li Xiaohang Zhan Ziwei Liu Yew-Soon Ong Chen Change Loy 115 76 0 15 Jun 2022
Masked Siamese ConvNets L. Jing Jiachen Zhu Yann LeCun SSL 118 35 0 15 Jun 2022
Language Models are General-Purpose Interfaces Y. Hao Haoyu Song Li Dong Shaohan Huang Zewen Chi Wenhui Wang Shuming Ma Furu Wei MLLM 78 102 0 13 Jun 2022
Extreme Masking for Learning Instance and Distributed Visual Representations Zhirong Wu Zihang Lai Xiao Sun Stephen Lin 106 22 0 09 Jun 2022
Words are all you need? Language as an approximation for human similarity judgments Raja Marjieh Pol van Rijn Ilia Sucholutsky T. Sumers Harin Lee Thomas Griffiths Nori Jacoby 93 19 0 08 Jun 2022
Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks Jia Pan Pan Zhou Shuicheng Yan SSL 89 18 0 08 Jun 2022
Masked Unsupervised Self-training for Label-free Image Classification Junnan Li Silvio Savarese Steven C. H. Hoi VLM SSL 45 13 0 07 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data Shohreh Deldari Hao Xue Aaqib Saeed Jiayuan He Daniel V. Smith Flora D. Salim AI4TS 75 37 0 06 Jun 2022
Siamese Image Modeling for Self-Supervised Vision Representation Learning Chenxin Tao Xizhou Zhu Weijie Su Gao Huang Bin Li Jie Zhou Yu Qiao Xiaogang Wang Jifeng Dai SSL 111 97 0 02 Jun 2022