Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.15830
Cited By
v1
v2
v3 (latest)
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
27 November 2023
Zhengcong Fei
Mingyuan Fan
Junshi Huang
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"A-JEPA: Joint-Embedding Predictive Architecture Can Listen"
50 / 57 papers shown
Title
Leveraging Joint Predictive Embedding and Bayesian Inference in Graph Self Supervised Learning
Srinitish Srinivasan
Omkumar CU
SSL
BDL
127
0
0
02 Feb 2025
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
370
3
0
14 Dec 2024
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Geri Skenderi
Hang Li
Jiliang Tang
Marco Cristani
AI4TS
GNN
102
5
0
27 Sep 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.5K
13,247
0
27 Feb 2023
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Mahmoud Assran
Quentin Duval
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Yann LeCun
Nicolas Ballas
SSL
AI4TS
MDE
78
352
0
19 Jan 2023
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM
SSL
81
97
0
14 Dec 2022
Uncertainty-Aware Image Captioning
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
UQLM
41
12
0
30 Nov 2022
VICRegL: Self-Supervised Learning of Local Visual Features
Adrien Bardes
Jean Ponce
Yann LeCun
SSL
93
125
0
04 Oct 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
32
19
0
27 Apr 2022
Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
Dading Chong
Helin Wang
Peilin Zhou
Qingcheng Zeng
66
68
0
27 Apr 2022
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
83
68
0
26 Apr 2022
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
Alan Baade
Puyuan Peng
David Harwath
56
101
0
30 Mar 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
74
9
0
25 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
224
1,191
0
23 Mar 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
97
858
0
07 Feb 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
154
271
0
02 Feb 2022
Robust Self-Supervised Audio-Visual Speech Recognition
Bowen Shi
Wei-Ning Hsu
Abdel-rahman Mohamed
57
93
0
05 Jan 2022
High Fidelity Visualization of What Your Self-Supervised Representation Knows About
Florian Bordes
Randall Balestriero
Pascal Vincent
DiffM
62
65
0
16 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
145
668
0
16 Dec 2021
iBOT: Image BERT Pre-Training with Online Tokenizer
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong
81
735
0
15 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
465
7,757
0
11 Nov 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
77
273
0
19 Oct 2021
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Sangeeta Srivastava
Yun Wang
Andros Tjandra
Anurag Kumar
Chunxi Liu
Kritika Singh
Yatharth Saraf
SSL
69
25
0
14 Oct 2021
Semi-Autoregressive Image Captioning
Xu Yan
Zhengcong Fei
Zekang Li
Shuhui Wang
Qingming Huang
Qi Tian
71
25
0
11 Oct 2021
Efficient Training of Audio Transformers with Patchout
Khaled Koutini
Jan Schluter
Hamid Eghbalzadeh
Gerhard Widmer
ViT
131
260
0
11 Oct 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
100
567
0
30 Jun 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
274
2,826
0
15 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
180
2,966
0
14 Jun 2021
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition
S. Verbitskiy
Vladimir Berikov
Viacheslav Vyshegorodtsev
77
73
0
03 Jun 2021
The Benefit Of Temporally-Strong Labels In Audio Event Classification
Shawn Hershey
D. Ellis
Eduardo Fonseca
A. Jansen
Caroline Liu
Channing Moore
Manoj Plakal
53
104
0
14 May 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
108
937
0
03 May 2021
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
125
865
0
05 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
453
21,439
0
25 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
79
35
0
18 Mar 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Yuan Gong
Yu-An Chung
James R. Glass
VLM
171
147
0
02 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
657
41,103
0
22 Oct 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
285
5,801
0
20 Jun 2020
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron
Ishan Misra
Julien Mairal
Priya Goyal
Piotr Bojanowski
Armand Joulin
OCL
SSL
233
4,083
0
17 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
808
42,055
0
28 May 2020
Learning Representations by Predicting Bags of Visual Words
Spyros Gidaris
Andrei Bursuc
N. Komodakis
P. Pérez
Matthieu Cord
SSL
102
117
0
27 Feb 2020
GridMask Data Augmentation
Pengguang Chen
Shu Liu
Hengshuang Zhao
Xingquan Wang
Jiaya Jia
94
311
0
13 Jan 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
192
1,082
0
21 Dec 2019
Fast Image Caption Generation with Position Alignment
Z. Fei
62
37
0
13 Dec 2019
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
Andy T. Liu
Shu-Wen Yang
Po-Han Chi
Po-Chun Hsu
Hung-yi Lee
SSL
148
374
0
25 Oct 2019
Data-Efficient Image Recognition with Contrastive Predictive Coding
Olivier J. Hénaff
A. Srinivas
J. Fauw
Ali Razavi
Carl Doersch
S. M. Ali Eslami
Aaron van den Oord
SSL
133
1,429
0
22 May 2019
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
177
3,461
0
18 Apr 2019
wav2vec: Unsupervised Pre-training for Speech Recognition
Steffen Schneider
Alexei Baevski
R. Collobert
Michael Auli
SSL
73
418
0
11 Apr 2019
On The Power of Curriculum Learning in Training Deep Networks
Guy Hacohen
D. Weinshall
ODL
75
445
0
07 Apr 2019
Competence-based Curriculum Learning for Neural Machine Translation
Emmanouil Antonios Platanios
Otilia Stretcu
Graham Neubig
Barnabás Póczós
Tom Michael Mitchell
86
344
0
23 Mar 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
1
2
Next