
2012 workshop

When 2012-03-31
from 10:00 to 17:30
Where University of Tokyo


 Co-sponsored by IEEE Signal Processing Society, Japan Chapter


On the occasion of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP, to take place in Kyoto from 25/03 to 30/03), we will be holding a 1-day workshop on Advances in Music Signal and Information Processing on Saturday, March 31, at the University of Tokyo. 


Preliminary program


9:30--10:00 Welcome
10:00--12:00 Oral presentations 1
12:00--12:30 Tour of Sagayama lab

12:30--14:00 Lunch

14:00--15:00 Oral presentations 2
15:00--17:30 Poster presentations

19:00--21:00 Dinner


Tentative list of presentations:

  1. Akira Maezawa / Yamaha Corporation
    "An Audio-to-Score Alignment Method with Bayesian Model of Timbre, Volume, Score, Tempo, and Reverberation"
    We present MAHL-R, a Bayesian model for jointly estimating the audio-score alignment, timbre, dynamics and tempo, with a Bayesian dereverberation frontend. Difficulties in score alignment arise from several factors, including late reverberation, fluctuation of tempo, variety of timbre, dynamics, and relative onset timing of different instruments. To cope with reverberation, the method first attenuates late reverberation by modeling the spectrogram as a mixture of the dry signal and a non-parametric mixture of past observed signals. We distribute the observed spectrogram into the dry source and past observations weighted by the reverberation filter. Next, our method aligns the dereverberated audio signal and the music score. To realize this, we model the audio signal as a mixture of different musical parts. Each part is modeled as a Hidden Semi-Markov Model (HSMM) whose state transitions are governed by the music score and whose state durations are conditioned on a Bayesian autoregressive process (AR(1)) shared among the different parts. Each state emits a Bayesian power spectrum modeled using Latent Harmonic Allocation (LHA). Variational Bayes (VB) is used to infer the score alignment, where we initially tie the HSMM state sequence of each part. To infer the reverberation model, we optimize the lower bound of the objective function used in VB using an EM-like algorithm.
  5. Chris Cannam, Luis A. Figueira and Mark D. Plumbley / Queen Mary University of London
    "SoundSoftware: Software Sustainability for Audio and Music Researchers"
    Sustainable and reusable software and data are becoming increasingly important in today's research environment. Methods for processing audio and music have become so complex that they cannot be fully described in a research paper. Even if really useful research is being done in one research group, other researchers may find it hard to build on this research, or even to know it exists. Researchers are becoming increasingly aware of the need to publish and maintain software code alongside their results, but practical barriers often prevent this from happening. We will describe the Sound Software project, an effort to support software development practice in the UK audio and music research community. We examine some of the barriers to software reuse, and suggest an incremental approach to overcoming some of them. Finally, we make some recommendations for research groups seeking to improve their own researchers' software practice.
  6. Slim Essid / Télécom ParisTech
    "The 3DLife Multimodal Dance Corpus and Applications"
  7. Satoru Fukayama, Emmanuel Vincent and Shigeki Sagayama / The University of Tokyo
    "High dimensional dependency modeling on musical components with log-linear interpolation"
    The aim of this research is to provide a general framework that combines high dimensional dependencies between music components such as melody, harmony and key. We examined log-linear interpolation for combining dependencies in music. Evaluation with cross-entropy indicated that combining the dependencies with our method improved the ability to predict chord sequences.
  8. Masato Tsuchiya, Kazuki Ochiai, Hirokazu Kameoka, Masahiro Nakano and Shigeki Sagayama / The University of Tokyo
    "Stochastic Grammatical Rhythm/Harmonic Modeling of the Generating Process of Polyphonic Music Signals for Automatic Transcription"
  9. Tomohiko Nakamura, Daisuke Saito, Hirokazu Kameoka and Shigeki Sagayama / The University of Tokyo
    "Structure analysis based on music modeling with repeated segments"
    Music structure analysis refers to techniques that make computers understand the entire structure of a piece of music, and it can be used to create a meaningful summary of a music audio signal. Music structure analysis usually consists of two major tasks: dividing an audio signal into temporal segments, and grouping the segments into musically meaningful clusters. While extensive attempts have been made to tackle these tasks, they have met with only limited success. In this work, we choose to take a generative model approach. Here, a sequence of audio features is assumed to be a concatenation of subsequences, each of which is associated with one of the repeating segments. The problem is then formulated as finding the most likely segmentation that best explains the observed sequence of audio features with as few repeating segments as possible. We chose to use the chroma vector as the audio feature, as the chromagram is known to be effective when searching for similar harmonic sequences. The current progress of this work will be presented.
  10. Ki Yang Lee, Daisuke Saito, Hirokazu Kameoka and Shigeki Sagayama / The University of Tokyo
    "Cover Song Identification using Vocal-Enhanced Chroma"
    In music information retrieval, methods of measuring music similarity have been studied in recent years. Similarity of music, however, is a vague concept despite how naturally humans recognize it. A cover song is a re-performance or an arrangement of the original song, usually produced for easy commercial success, and is recognized as musically similar by listeners in many cases. For this reason, cover song identification is considered a good starting point for studying music similarity. We propose a new method of measuring similarity between two songs that focuses on vocal melody and harmonic sequences, based on top-down modeling of song arrangement and generation.
  11. Gabriel Sargent, Frédéric Bimbot and Emmanuel Vincent / IRISA-INRIA
    "A regularity-constrained Viterbi algorithm and its applications to the structural segmentation of songs"
    This paper presents a general approach for the structural segmentation of songs. It is formalized as a cost optimization problem that combines properties of the musical content and a prior regularity assumption on the segment length. A versatile implementation of this approach is proposed by means of a Viterbi algorithm, and the design of the costs is discussed. We then present two systems derived from this approach, based on acoustic and symbolic features respectively. The advantages of the regularity constraint are evaluated on a database of 100 popular songs by showing a significant improvement of the segmentation performance in terms of F-measure.
  12. Frédéric Bimbot, Emmanuel Deruty, Gabriel Sargent and Emmanuel Vincent / IRISA-INRIA
    "Methodology and resources for the structural segmentation of music pieces into autonomous and comparable blocks"
    The approach called decomposition into autonomous and comparable blocks specifies a methodology for producing music structure annotation by human listeners based on a set of criteria relying on the listening experience of the human annotator. The present article develops further a number of fundamental notions and practical issues, so as to facilitate the usability and the reproducibility of the approach. We formalize the general methodology as an iterative process which aims at estimating both a structural metric pattern and its realization, by searching empirically for an optimal compromise describing the organization of the content of the music piece in the most economical way, around a typical timescale. Based on experimental observations, we detail some practical considerations and we illustrate the method by an extensive case study. We introduce a set of 500 songs for which we are releasing freely the structural annotations to the research community, for examination, discussion and utilization.
  13. Aggelos Gkiokas / Institute for Language and Speech Processing
    “Music Tempo Estimation and Beat Tracking by Applying Source Separation and Metrical Relations”
    In this paper, we present tempo estimation and beat tracking algorithms that use percussive/harmonic separation of the audio signal in order to extract filterbank energies and chroma features from the respective components. Periodicity analysis is carried out by convolving the feature sequences with a bank of resonators. The target tempo is estimated from the resulting periodicity vector by incorporating knowledge of metrical relations. Tempo estimation is followed by a local tempo refinement method to enhance the beat-tracking algorithm. Beat tracking involves the computation of beat saliencies derived from the resonators' responses, together with a proposed distance measure between candidate beat locations. A dynamic programming algorithm is adopted to find the optimal “path” of beats. Both the tempo estimation and beat tracking methods were submitted to MIREX 2011, while the tempo estimation algorithm was also evaluated on the ISMIR 2004 Tempo Induction Evaluation Exchange dataset.
  14. Jan Larsen / Technical University of Denmark
    “Eliciting Preferences in Music”
  15. Satoru Fukayama, Daisuke Saito and Shigeki Sagayama / The University of Tokyo
    "Orpheus Version 3: Integrated system for assisting novice users to generate original songs from their Japanese lyrics"
    We discuss a system design that helps novice users create original songs from Japanese lyrics through three approaches: designing a system with direction functionality for generating songs, formulating composition as an optimization problem, and integrating synthesis and analysis engines for vocals and lyrics. The system was evaluated through the operation of our web-based implementation. The results indicated that our method was able to assist users in generating original songs from Japanese lyrics.
  16. Hideyuki Tachibana, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama / The University of Tokyo
    "Comparative evaluations of harmonic/percussive sound separation algorithms based on anisotropic continuity of spectrogram"
    In this presentation, we compare and evaluate harmonic and percussive sound separation techniques based on the anisotropic continuity of the spectrogram. Such separation techniques are very useful as preprocessors for many music information retrieval (MIR) tasks, such as chord estimation. So far, we have introduced a method called Harmonic/Percussive Sound Separation (HPSS), which decomposes a music signal into two components by separating the spectrogram into a horizontally continuous component and a vertically continuous component, which roughly correspond to the harmonic and percussive components respectively. There are many ways to formulate HPSS algorithms based on this concept, and each algorithm has its own parameters that need to be tuned. We have not yet verified which of these algorithms performs best. As HPSS can be used as an effective preprocessor for many MIR tasks, it is important to investigate which HPSS variant performs best. This paper describes the details of each HPSS algorithm and provides a comparative evaluation of the devised algorithms using real music signals.
  17. Ken O'Hanlon, Hidehisa Nagano, Mark D. Plumbley / Queen Mary University of London, United Kingdom
    “Structured Sparsity For Automatic Music Transcription”
    Sparse representations have previously been applied to the automatic music transcription (AMT) problem. Structured sparsity, such as group and molecular sparsity, allows the introduction of prior knowledge to sparse representations. Molecular sparsity has previously been proposed for AMT; however, the use of greedy group sparsity has not previously been proposed for this problem. We propose a greedy sparse pursuit based on nearest subspace classification for groups with coherent blocks, based in a non-negative framework, and apply this to AMT. Further to this, we propose an enhanced molecular variant of this group sparse algorithm and demonstrate the effectiveness of this approach.
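
As an illustration of the log-linear interpolation mentioned in the abstract by Fukayama, Vincent and Sagayama above, the following is a minimal sketch and not the authors' implementation. It assumes that each dependency is given as a probability distribution over the same set of outcomes, with strictly positive probabilities, and combines them as a normalized weighted product. The function name, the chord distributions and the weights are hypothetical and chosen only for illustration.

```python
import math


def log_linear_interpolate(models, weights):
    """Combine distributions over the same outcomes: p(x) ∝ prod_i p_i(x) ** w_i.

    Assumption: every model is a dict mapping the same outcomes to
    strictly positive probabilities (math.log fails on zero).
    """
    outcomes = models[0].keys()
    # Weighted sum of log-probabilities gives an unnormalized log score.
    scores = {
        x: sum(w * math.log(m[x]) for m, w in zip(models, weights))
        for x in outcomes
    }
    # Normalize so that the combined scores sum to one.
    z = sum(math.exp(s) for s in scores.values())
    return {x: math.exp(s) / z for x, s in scores.items()}


# Hypothetical next-chord distributions, one conditioned on the melody
# and one conditioned on the key (illustrative numbers only).
p_given_melody = {"C": 0.6, "F": 0.3, "G": 0.1}
p_given_key = {"C": 0.5, "F": 0.1, "G": 0.4}

combined = log_linear_interpolate([p_given_melody, p_given_key], [0.5, 0.5])
```

With a weight of 1 on one model and 0 on the others, the function returns that model unchanged. In practice the weights would be tuned on held-out data, for example by minimizing cross-entropy.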




 The University of Tokyo, Engineering Building #6,
7-3-1, Hongo, Bunkyo-ku, Tokyo, Japan
Access maps:
Google maps link:,139.761575&spn=0.001115,0.001725&sll=35.713652,139.762208&sspn=0.003711,0.006899&t=m&z=19


To register, please send an email to stating:
 - your name
 - your affiliation
 - whether you wish to attend lunch and/or dinner
 - (if appropriate) the topic of your presentation.
We will confirm your registration within 2 days, subject to space constraints. Lunch and dinner will not be free, but they will be reasonably priced.



Stanisław Raczyński and Emmanuel Vincent, INRIA, France

Hirokazu Kameoka and Shigeki Sagayama, the University of Tokyo, Japan

Nobutaka Ono, National Institute of Informatics, Japan




Emmanuel Vincent
INRIA Rennes - Bretagne Atlantique
Campus de Beaulieu
F-35042 Rennes Cedex, France.

PHONE: +33 2 99 84 22 69
FAX: +33 2 99 84 71 71