2010 results
Modeling and estimation framework (E. Vincent, S. Raczynski)
We proposed a general dynamic Bayesian network structure (paper) integrating the most essential features of music into four layers: temporal organization features, symbolic features, performance features and acoustic features. This work pinpointed the most challenging aspects of music modeling, including high dimensionality, large vocabularies, data-dependent vocabularies and long-term dependencies, and suggested promising research directions to address them. The family of junction tree algorithms was identified as key to versatile Bayesian inference in such models.
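As a toy illustration of layered inference in such a network (all states, probabilities and layer names below are invented, not taken from the paper), consider a hidden symbolic layer, here a chord chain, emitting an observed acoustic feature at each time step. On a chain, junction-tree inference reduces to the forward recursion:

```python
import numpy as np

# Toy two-layer dynamic Bayesian network: a hidden "symbolic" chord layer
# emits an observed "acoustic" feature layer.  All numbers are illustrative.
chords = ["C", "G"]
trans = np.array([[0.8, 0.2],       # P(chord_t | chord_{t-1})
                  [0.3, 0.7]])
emit = np.array([[0.9, 0.1],        # P(obs_t | chord_t), obs in {0, 1}
                 [0.2, 0.8]])
prior = np.array([0.5, 0.5])

def forward(obs):
    """Exact filtering P(chord_t | obs_1..t).  On a chain-structured model,
    junction-tree message passing reduces to this forward recursion."""
    alpha = prior * emit[:, obs[0]]
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (trans.T @ alpha) * emit[:, o]
        alpha /= alpha.sum()
    return alpha

posterior = forward([0, 0, 1])  # posterior over chords after three observations
```

In the full four-layer model the cliques are larger, which is precisely where the high-dimensionality challenge mentioned above arises.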
Software and evaluation infrastructure (S. Raczynski, E. Vincent, S. Sagayama, H. Tachibana, S. Fukayama)
The work on symbolic modeling led to a small set of software modules for music language modeling, which will form the basis of the targeted software platform.
We analyzed the requirements for training and testing data and started creating a test corpus consisting of quantized and aligned MIDI scores and functional harmony annotations, covering 20 classical music audio files so far (from the RWC Classical Music Database). Development corpora on the order of several thousand MIDI scores or lead sheets have been collected for training symbolic language models.
Symbolic modeling (S. Raczynski, E. Vincent, S. Fukayama, S. Sagayama)
We proposed a joint model of chord sequences and polyphonic note sequences (paper) and evaluated it both in terms of prediction ability (measured by perplexity) and in terms of polyphonic pitch transcription performance. This study provided a proof of concept of VERSAMUS's goal, namely the probabilistic integration of multiple features, and was, to our knowledge, the first to address the joint modeling of "horizontal" (sequential) and "vertical" (simultaneous) dependencies between notes by interpolating the corresponding conditional probabilities.
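The interpolation idea can be sketched as follows. All probability tables, the pitch alphabet and the interpolation weight below are invented stand-ins for the model described in the paper; the point is only how a "horizontal" conditional and a "vertical" conditional combine linearly, and how perplexity scores the result:

```python
import numpy as np

# Toy 3-symbol pitch alphabet; both tables are illustrative, not fitted.
p_horiz = np.array([[0.6, 0.3, 0.1],   # "horizontal": P(note_t | note_{t-1})
                    [0.2, 0.6, 0.2],
                    [0.1, 0.3, 0.6]])
p_vert = np.array([[0.7, 0.2, 0.1],    # "vertical": P(note_t | chord_t), 2 chords
                   [0.1, 0.2, 0.7]])

def interp_prob(prev_note, chord, lam=0.5):
    """Linear interpolation of the horizontal and vertical conditionals."""
    return lam * p_horiz[prev_note] + (1 - lam) * p_vert[chord]

def perplexity(notes, chords, lam=0.5):
    """Per-note perplexity of a note sequence under the interpolated model;
    lower means better prediction."""
    logp = 0.0
    for t in range(1, len(notes)):
        logp += np.log2(interp_prob(notes[t - 1], chords[t], lam)[notes[t]])
    return 2.0 ** (-logp / (len(notes) - 1))
```

In practice the interpolation weight would be tuned on held-out data to minimize perplexity.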
Performance modeling (J. Wu, E. Vincent, K. Suzuki, N. Ono, S. Sagayama)
Instrumental timbre is one of the features characterizing a performance. We designed a new model of the short-term power spectrum of musical audio that represents both the attack part and the harmonic part of each note, derived instrumental timbre features by PCA over the model parameters, and evaluated them on the task of polyphonic instrument identification. The proposed features outperformed state-of-the-art polyphonic instrument identification algorithms based on the modeling of harmonic sounds only.
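The feature-extraction step can be sketched generically. The parameter matrix below is a random stand-in for the fitted attack-plus-harmonic model parameters of each note; the sketch only shows PCA by SVD over such parameter vectors, not the spectral model itself:

```python
import numpy as np

# Stand-in data: 100 notes, each described by a 20-dimensional vector of
# fitted model parameters (randomly generated here for illustration).
rng = np.random.default_rng(0)
params = rng.normal(size=(100, 20))

def pca_features(X, k=3):
    """Project each parameter vector onto the top-k principal components,
    yielding a low-dimensional timbre feature per note."""
    Xc = X - X.mean(axis=0)                      # center the parameters
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T                         # shape (n_notes, k)

feats = pca_features(params)
```

The resulting features would then feed a standard classifier for instrument identification.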
Acoustic modeling and segregation (N. Ito, N. Duong, H. Tachibana, E. Vincent, N. Ono)
We conducted a survey (tutorial) on the use of source separation in music information retrieval. We designed a new algorithm for harmonic vs. percussive source separation based on joint modeling of spectral and spatial continuity and showed that it significantly reduces "separation artifacts".
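For intuition about the spectral-continuity part, the classic single-channel baseline (Fitzgerald-style median filtering, NOT the joint spectral/spatial model described above) separates harmonic and percussive energy by smoothing a magnitude spectrogram along time and along frequency respectively:

```python
import numpy as np

def med1d(x, w):
    """Running median of a 1-D array with odd window length w."""
    pad = w // 2
    xp = np.pad(x, pad, mode="edge")
    return np.median(np.lib.stride_tricks.sliding_window_view(xp, w), axis=-1)

def hpss_masks(S, w=9):
    """Soft harmonic/percussive masks from a magnitude spectrogram S
    (frequency x time): harmonic energy is smooth along time, percussive
    energy is smooth along frequency."""
    H = np.apply_along_axis(med1d, 1, S, w)   # median filter along time
    P = np.apply_along_axis(med1d, 0, S, w)   # median filter along frequency
    denom = H + P + 1e-12
    return H / denom, P / denom

# Toy spectrogram: one steady tone (horizontal line) + one broadband hit
# (vertical line); the masks should assign each to the right component.
S = np.zeros((32, 32))
S[10, :] += 1.0   # harmonic: constant frequency bin over time
S[:, 20] += 1.0   # percussive: all frequencies at one frame
mask_h, mask_p = hpss_masks(S)
```

The algorithm developed in the project additionally exploits spatial continuity across channels, which this single-channel sketch does not capture.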
In parallel, we conducted more general work on the localization and separation of sources in the presence of diffuse noise (paper, paper, paper) by modeling the contribution of diffuse noise to the spatial covariance matrix.
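The modeling idea can be illustrated at a single frequency bin. The steering vector, noise coherence matrix and power values below are invented for illustration; the point is that the observed spatial covariance decomposes into a rank-1 source term plus a diffuse-noise term, so subtracting the modeled noise contribution recovers the source part:

```python
import numpy as np

# Two-microphone toy example at one frequency bin (all values invented).
d = np.array([1.0, 0.7 + 0.3j])              # source steering vector
R_src = np.outer(d, d.conj())                # rank-1 source spatial covariance
R_diff = np.array([[1.0, 0.4],               # diffuse-noise coherence matrix
                   [0.4, 1.0]], dtype=complex)
sigma_s, sigma_n = 2.0, 0.5                  # source and noise powers

# Observed covariance = source term + diffuse-noise term.
R_obs = sigma_s * R_src + sigma_n * R_diff

# Removing the modeled noise contribution leaves the rank-1 source part,
# from which direction-of-arrival or separation filters can be derived.
R_hat = R_obs - sigma_n * R_diff
```

In practice the noise power and coherence are unknown and must be estimated jointly with the source parameters.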
Events and funding (E. Vincent, N. Ono)
We organized a two-day workshop in Rennes.
We obtained funding from the Franco-Japanese Doctoral College and from JSPS for the 1-year visit of Nobutaka Ito and from the Department of Information Science and Technology of the University of Tokyo for the 1-month visit of Kosuke Suzuki.
While in Japan, E. Vincent met with researchers from NTT CS Labs (Atsugi branch) who could take part in a future collaborative project.