Multi-Modal Music Emotion Recognition: A New Dataset, Methodology and Comparative Analysis

Bibliographic Details
Main Author: Panda, Renato Eduardo Silva (author)
Other Authors: Malheiro, Ricardo (author), Rocha, Bruno (author), Oliveira, António Pedro (author), Paiva, Rui Pedro (author)
Format: other
Language: eng
Published: 2013
Online Access: http://hdl.handle.net/10316/94095
Country: Portugal
OAI: oai:estudogeral.sib.uc.pt:10316/94095
Description
Summary: We propose a multi-modal approach to the music emotion recognition (MER) problem that combines information from three distinct sources: audio, MIDI and lyrics. We introduce a methodology for automatically creating a multi-modal music emotion dataset from the AllMusic database, based on the emotion tags used in the MIREX Mood Classification Task. MIDI files and lyrics corresponding to a subset of the resulting audio samples were then gathered. The dataset was organized into the same 5 emotion clusters defined in MIREX. From the audio data, 177 standard features and 98 melodic features were extracted; from the MIDI files, 320 features; and from the lyrics, 26 features. We experimented with several supervised learning and feature selection strategies to evaluate the proposed multi-modal approach. Using only standard audio features, the best performance attained was an F-measure of 44.3%. With the multi-modal approach, this improved to 61.1% using only 19 multi-modal features. Melodic audio features were particularly important to this improvement.
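The dataset-creation step maps AllMusic emotion tags onto the 5 MIREX mood clusters. The snippet below is a minimal sketch of how such a mapping could work. It assumes the adjective lists as commonly published for the MIREX Audio Mood Classification task, and it uses a hypothetical majority rule (`assign_cluster`, `min_share`) that is an illustration only, not the authors' documented procedure.

```python
from collections import Counter

# MIREX Audio Mood Classification clusters (adjective lists as commonly
# published for the task; verify against the official task definition).
MIREX_CLUSTERS = {
    1: {"passionate", "rousing", "confident", "boisterous", "rowdy"},
    2: {"rollicking", "cheerful", "fun", "sweet", "amiable/good natured"},
    3: {"literate", "poignant", "wistful", "bittersweet", "autumnal", "brooding"},
    4: {"humorous", "silly", "campy", "quirky", "whimsical", "witty", "wry"},
    5: {"aggressive", "fiery", "tense/anxious", "intense", "volatile", "visceral"},
}

# Reverse index: adjective -> cluster id (each adjective is in exactly one cluster).
TAG_TO_CLUSTER = {tag: cid for cid, tags in MIREX_CLUSTERS.items() for tag in tags}


def assign_cluster(song_tags, min_share=0.5):
    """Return the MIREX cluster id for a song, or None if ambiguous.

    Hypothetical rule (an assumption, not the paper's published method):
    count the song's tags that fall into each cluster and accept the top
    cluster only if it holds strictly more than `min_share` of the matches.
    """
    counts = Counter(
        TAG_TO_CLUSTER[t.lower()] for t in song_tags if t.lower() in TAG_TO_CLUSTER
    )
    if not counts:
        return None  # none of the song's tags is a MIREX adjective
    cluster, hits = counts.most_common(1)[0]
    if hits / sum(counts.values()) > min_share:
        return cluster
    return None  # no clear majority; the song would be left out of the dataset


if __name__ == "__main__":
    print(assign_cluster(["Wistful", "Bittersweet", "Cheerful"]))  # 3
    print(assign_cluster(["Cheerful", "Wistful"]))  # None (tie)
```

Discarding songs without a clear majority trades dataset size for label consistency; a stricter or looser `min_share` shifts that balance.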