Fast Chirplet Transform feeding CNN, application to orca and bird bioacoustics

Advanced soundscape analysis and machine listening require efficient time-frequency decompositions. Recent scattering theory offers a robust hierarchical convolutional decomposition, but its kernels must be fixed in advance. A CNN can be seen as learning the optimal kernel decomposition, but it requires a large amount of training data. This paper aims to show that Chirplet kernels provide a good constant-Q time-frequency representation that yields better CNN classification than the usual log-Fourier representation. We first recall the main advantages of the Chirplet for bioinspired auditory processing. The contributions of this paper are then: (1) we give a new fast implementation of the Chirplet that decreases its complexity; (2) we validate the fast Chirplet computation in nearly real time on long series of online orca monitoring recordings, and on bird songs from hundreds of bird species; (3) we demonstrate that the Chirplet improves convolutional neural network classification on a challenging task of complex overlapping bird calls, compared with the usual Mel representation. Validation is conducted on a subset of the Amazon bird species of the LifeClef 2016 classification challenge.
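To make the notion of a Chirplet kernel concrete, the following is a minimal sketch of a textbook Gaussian chirplet atom (a Gaussian window modulated by a linearly swept tone) and its correlation with a signal. It is not the fast implementation contributed by the paper, and the function names and parameters (`gaussian_chirplet`, `chirplet_response`, `f0`, `chirp_rate`, `sigma`) are assumptions chosen for illustration only.

```python
import numpy as np


def gaussian_chirplet(fs, duration, f0, chirp_rate, sigma):
    """Return a unit-norm complex Gaussian chirplet centred in its window.

    fs: sample rate (Hz); duration: window length (s);
    f0: centre frequency (Hz); chirp_rate: frequency sweep (Hz/s);
    sigma: standard deviation of the Gaussian envelope (s).
    All parameter names are illustrative assumptions.
    """
    n = int(round(duration * fs))
    t = (np.arange(n) - (n - 1) / 2) / fs            # time axis centred on 0
    envelope = np.exp(-0.5 * (t / sigma) ** 2)       # Gaussian window
    phase = 2 * np.pi * (f0 * t + 0.5 * chirp_rate * t ** 2)
    atom = envelope * np.exp(1j * phase)             # linear frequency sweep
    return atom / np.linalg.norm(atom)


def chirplet_response(signal, atom):
    """Magnitude of the cross-correlation of `signal` with `atom`.

    np.correlate conjugates its second argument, so this acts as a
    matched filter for the atom.
    """
    return np.abs(np.correlate(signal, atom, mode="same"))


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs - 0.5                     # 1 s of signal, centred
    # Test tone sweeping upward at 2000 Hz/s around 2000 Hz.
    x = np.cos(2 * np.pi * (2000 * t + 0.5 * 2000 * t ** 2))
    up = gaussian_chirplet(fs, 0.12, 2000, 2000, 0.02)
    down = gaussian_chirplet(fs, 0.12, 2000, -2000, 0.02)
    print("matched peak:   ", chirplet_response(x, up).max())
    print("mismatched peak:", chirplet_response(x, down).max())
```

In this sketch, an atom whose chirp rate matches the sweep of the signal responds more strongly than one sweeping in the opposite direction, which is the property a bank of such kernels exploits when building a time-frequency representation for a classifier.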