How Does the Spotify API Compare to the Music Emotion Recognition State-of-the-Art?


Bibliographic details
Main author: Panda, Renato (author)
Other authors: Redinho, Hugo (author), Gonçalves, Carolina (author), Malheiro, Ricardo (author), Paiva, Rui Pedro (author)
Format: other
Language: eng
Published: 2021
Subjects:
Full text: http://hdl.handle.net/10316/95161
Country: Portugal
Oai: oai:estudogeral.sib.uc.pt:10316/95161
Description
Abstract: Features are arguably the key factor in any machine learning problem. Over the decades, a myriad of audio features and, more recently, feature-learning approaches have been tested in Music Emotion Recognition (MER), with scarce improvements. Here, we shed some light on the suitability of the audio features provided by the Spotify API, the leading music streaming service, when applied to MER. To this end, 12 Spotify API features were obtained for 704 songs of our 900-song dataset, annotated in terms of Russell’s quadrants. These are compared to emotionally relevant features obtained previously, using feature ranking and emotion classification experiments. We verified that the energy, valence and acousticness features from Spotify are highly relevant to MER. However, the 12-feature set is unable to match the performance of the features available in the state of the art (58.5% vs. 74.7% F1-measure). Combining the Spotify and state-of-the-art sets leads to small improvements with fewer features (top5: +2.3%, top10: +1.1%), while not improving the maximum results (100 features). From this we conclude that Spotify provides some higher-level emotionally relevant features. Such extractors are desirable, since they are closer to human concepts and allow interpretable rules to be extracted (which is harder with hundreds of abstract features). Still, additional emotionally relevant features are needed to improve MER.
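
As an illustration of the kind of interpretable rule the abstract has in mind, the following minimal sketch maps two of the Spotify features it names, valence and energy, onto Russell's quadrants. This is not the authors' method, which relies on feature ranking and trained classifiers. The function name `russell_quadrant`, the use of energy as a stand-in for arousal, and the 0.5 threshold are all assumptions made here for illustration; the only detail taken as given is that Spotify reports both features on a 0.0 to 1.0 scale.

```python
def russell_quadrant(valence: float, energy: float, threshold: float = 0.5) -> str:
    """Map a (valence, energy) pair to a Russell quadrant label.

    Hypothetical rule, not from the paper: energy is used as a proxy
    for arousal, and the 0.5 threshold is an arbitrary midpoint.
    """
    if not (0.0 <= valence <= 1.0 and 0.0 <= energy <= 1.0):
        raise ValueError("valence and energy are expected in [0, 1]")

    positive = valence >= threshold   # pleasant vs. unpleasant
    aroused = energy >= threshold     # high vs. low arousal (assumed proxy)

    if positive and aroused:
        return "Q1"  # positive valence, high arousal (e.g. happy)
    if aroused:
        return "Q2"  # negative valence, high arousal (e.g. tense)
    if not positive:
        return "Q3"  # negative valence, low arousal (e.g. sad)
    return "Q4"      # positive valence, low arousal (e.g. calm)


# Example: a bright, energetic track versus a quiet, melancholic one.
print(russell_quadrant(valence=0.85, energy=0.90))
print(russell_quadrant(valence=0.20, energy=0.15))
```

A two-feature rule like this is easy to read, which is the appeal of higher-level features; the reported gap between the Spotify set and the state-of-the-art set suggests that rules this simple would not be sufficient on their own.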