2011 results
Software and evaluation infrastructure (S. Raczynski, E. Vincent, H. Tachibana, S. Sagayama, S. Fukayama, F. Bimbot)
We released a corpus of quantized MIDI scores and functional harmony annotations for 63 classical music audio files (RWC Classical Music Database) (corpus). We also created a set of melody annotations for 10 stereo songs, in order to evaluate our stereo drum removal algorithm as a preprocessing step for melody estimation.
In parallel, we outlined a sustainable procedure for the collection of large-scale music corpora by exploiting the wealth of music data available on the web (audio, MIDI, leadsheets, lyrics, etc.) together with algorithms for the automatic detection and alignment of matching data. We believe that this procedure will eventually provide a breakthrough in the field of symbolic music processing, which today remains dominated by small handcrafted datasets.
Temporal organization modeling (G. Sargent, F. Bimbot, S. Raczynski, E. Vincent, S. Sagayama)
We designed a model of the long-term (e.g. repetition of choruses and verses) and mid-term (e.g. repetition of chord patterns within a verse) structure of a musical piece based on left-right language models. This model was applied to symbolic chord sequences and shown to outperform the structural segmentation algorithms evaluated within the MIREX 2010 evaluation campaign (paper). The resulting software was submitted to the MIREX 2011 evaluation campaign and released (paper, software).
Symbolic modeling (S. Raczynski, S. Fukayama, E. Vincent, S. Sagayama)
We pursued the work on the joint modeling of "horizontal" (sequential) and "vertical" (simultaneous) dependencies between notes by log-linear interpolation of the corresponding conditional distributions. We identified the normalization of the resulting distribution as a crucial problem for the performance of the model and proposed an exact solution to this problem. A journal article is under preparation.
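As an illustration of the log-linear interpolation mentioned above, the following is a minimal sketch, assuming a small finite set of candidate notes and two already-computed conditional distributions. The function name, the note labels, the probabilities and the weights are all hypothetical, and the normalization shown is a plain sum over every candidate, which is not necessarily the exact solution referred to in this report.

```python
import math


def log_linear_interpolation(models, weights):
    """Combine distributions over the same outcomes as prod_i p_i(x)**w_i, renormalized.

    models  -- list of dicts mapping outcome -> probability (hypothetical inputs)
    weights -- list of non-negative interpolation weights, one per model
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    outcomes = list(models[0])
    if any(set(m) != set(outcomes) for m in models):
        raise ValueError("all models must cover the same outcomes")

    # Weighted sum of log-probabilities for each outcome.
    log_scores = {}
    for x in outcomes:
        score = 0.0
        for model, weight in zip(models, weights):
            if weight == 0:
                continue  # a zero weight switches the model off
            if model[x] == 0:
                score = -math.inf  # any active model can veto an outcome
                break
            score += weight * math.log(model[x])
        log_scores[x] = score

    # Normalization: without it the weighted product does not sum to one.
    top = max(log_scores.values())
    if top == -math.inf:
        raise ValueError("no outcome has non-zero probability")
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {x: math.exp(s - top) / z for x, s in log_scores.items()}


# Hypothetical distributions for the next note of one voice.
horizontal = {"C": 0.5, "E": 0.3, "G": 0.2}  # given the previous note in the voice
vertical = {"C": 0.2, "E": 0.2, "G": 0.6}    # given the notes sounding simultaneously
combined = log_linear_interpolation([horizontal, vertical], [0.5, 0.5])
```

The brute-force sum is only practical here because the outcome set is tiny; for joint distributions over many simultaneous notes, the cost of normalization grows quickly.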
We also applied the log-linear interpolation paradigm to the joint modeling of melody, key, chords and meter, which evolve according to different timelines. In order to synchronize these feature sequences, we explored the use of beat-long templates consisting of several notes as opposed to short time frames containing a fragment of a single note. We are planning to finalize the evaluation of this model by the end of the year.
Performance modeling (J. Wu, E. Vincent, S. Raczynski, N. Ono, S. Sagayama)
We finalized and published our work on polyphonic pitch estimation and instrument identification by joint modeling of sustained and attack sounds (journal, paper). The proposed model outperformed state-of-the-art algorithms based on the modeling of harmonic sounds only. We also conducted a study on advanced classification algorithms for polyphonic instrument identification (paper).
Acoustic modeling and segregation (N. Ito, N. Duong, H. Tachibana, E. Vincent, N. Ono)
We published our work on stereo harmonic vs. percussive source separation and released the corresponding software (paper, software). We are currently evaluating the impact of drum removal via this approach for subsequent music information retrieval tasks such as melody estimation and chord detection.
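To give an idea of what harmonic vs. percussive separation does, here is a minimal sketch of a different and simpler technique, namely median filtering of a magnitude spectrogram (Fitzgerald, 2010). It is not the stereo algorithm published and released by the project: it handles a single channel, and the function names, filter width and toy spectrogram are assumptions made for illustration only. It relies on the observation that harmonic sounds form horizontal ridges (stable over time) and percussive sounds form vertical ridges (spread over frequency).

```python
from statistics import median


def median_filter_1d(values, width):
    """Median-filter a list, truncating the window at both ends."""
    half = width // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def harmonic_mask(spec, width=3):
    """Return a binary mask: 1 where a bin looks harmonic, 0 where percussive.

    spec -- list of rows, one per frequency bin, each a list of magnitudes over time
    """
    n_freq, n_time = len(spec), len(spec[0])
    # Filtering along time keeps horizontal ridges (harmonic content).
    harm = [median_filter_1d(row, width) for row in spec]
    # Filtering along frequency keeps vertical ridges (percussive content).
    cols = [median_filter_1d([spec[f][t] for f in range(n_freq)], width)
            for t in range(n_time)]
    # Ties (including silent bins) are assigned to the harmonic side.
    return [[1 if harm[f][t] >= cols[t][f] else 0 for t in range(n_time)]
            for f in range(n_freq)]


# Toy spectrogram: a sustained tone in bin 2 and a drum hit in frame 2.
toy = [[1 if (f == 2 or t == 2) else 0 for t in range(5)] for f in range(5)]
mask = harmonic_mask(toy)
```

On this toy input, the bins of the sustained tone away from the drum hit are marked harmonic, and the bins of the drum hit away from the tone are marked percussive. Multiplying a real spectrogram by the mask (or by its complement) and inverting the transform would yield the two separated signals.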
In parallel, we pursued our work on the localization of multiple sources in the presence of diffuse noise. We uncovered a general statistical framework for the modeling of diffuse noise in the covariance domain and derived a general denoising algorithm (paper). This work was awarded the Best Student Presentation Award of the 2011 ASJ Spring Meeting. N. Ito is currently writing his PhD thesis and a journal article is under preparation.
Events and funding (E. Vincent, N. Ono)
We organized a one-day workshop in Tokyo. Besides the presenters and other students from the University of Tokyo, the workshop attracted 9 external attendees from Japanese companies and universities.
Following S. Raczynski's PhD defense at the University of Tokyo in March, we obtained a scholarship from INRIA to recruit him as a post-doctoral researcher in METISS from December 1, 2011, to March 31, 2013.
During their visit to Japan, E. Vincent and N.Q.K. Duong met researchers from NTT CS Labs (Keihanna branch) and NAIST who could possibly take part in a future collaborative project.
N. Ono was appointed as an Associate Professor at the National Institute of Informatics (NII) in Tokyo. Given the strong links remaining between him and the University of Tokyo, the project partners agreed that the collaboration with him would continue.
E. Vincent and N. Ono made plans for a collaborative project that would replace VERSAMUS after its end in December 2012. The topic of ad-hoc microphone arrays, which emerged during the jointly supervised PhD thesis of N. Ito, was selected as it is one of the hottest topics for funding agencies. Since there was no bi-national ANR+JST call in the field of ICT this year, it was decided that N. Ono would submit a project to the Grant-in-Aid for Scientific Research call of JSPS in October 2011. The project proposal will identify E. Vincent as an international collaborator, so as to enable visits and co-supervision of the main project researcher, and will involve a second Japanese lab. If accepted, this project will make it possible to pursue collaborative research in this field until a new ANR+JST call opens.