Music-Image Matching

Music-Image Semantic Similarity Estimation


Human perceptions of music and images are closely related. Both can evoke similar sensations, such as emotion, motion, and power. The main objective of this paper is to investigate whether and how music and images can be bridged by machine. The contributions are threefold. First, we construct a dataset of more than 25,000 music-image pairs obtained from music videos and collect human annotations comparing the matching degree of these pairs. The results show that the human labelers largely agree with each other on the matching degree of music-image pairs. Second, we propose semantic representations of music and images that are suitable for the cross-modal matching task. Specifically, we adopt lyrics as an intermediate medium to connect music and images, and extract a set of attributes from lyrics for image representation. Third, we propose a new method, cross-modal kernel analysis (CMKA), to learn the semantic similarity between music and images with side information. CMKA aims to find the optimal embedding spaces for both music and images in the sense of maximizing the ordinal margin between the music-image pairs annotated by the labelers and random ones. The proposed method is able to learn the non-linear relationship between music and images and, more importantly, can efficiently integrate heterogeneous data from different modalities into a unified space. Experimental results demonstrate that the proposed method performs best in the music-image matching task.
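To make the "ordinal margin" idea concrete, the following is a minimal sketch of a hinge-style ranking penalty on one triplet (music, matched image, random image). It is not the CMKA formulation itself: the dot-product similarity, the function names, the `margin` value, and the toy vectors are all assumptions chosen for illustration, and the actual method learns kernel-based embeddings described in the paper.

```python
def dot(u, v):
    """Plain dot product, standing in for a learned cross-modal similarity."""
    return sum(a * b for a, b in zip(u, v))


def ordinal_margin_loss(music, matched_image, random_image, margin=1.0):
    """Hinge penalty on one triplet (hypothetical helper).

    The penalty is zero once the matched pair scores at least `margin`
    higher than the random pair, and grows linearly otherwise.
    """
    gap = dot(music, matched_image) - dot(music, random_image)
    return max(0.0, margin - gap)


# Toy embeddings, made up for illustration only.
music = [1.0, 0.0]
matched = [2.0, 0.0]
random_img = [0.5, 1.0]

print(ordinal_margin_loss(music, matched, random_img))   # margin satisfied
print(ordinal_margin_loss(music, random_img, matched))   # margin violated
```

Summing such a penalty over all annotated triplets gives one common way to express "matched pairs should outrank random pairs"; the objective actually optimized by CMKA may differ in form.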

Human Labelling

Source Data

  • Music features: MFCC, delta-MFCC, and delta-delta-MFCC (details can be found in our supplementary material). MFCC
  • Image features: hog2x2. HOG
  • Matching information: Matching Result
    The first column is the music ID.
    The second column is the ID of the matched image.
    The third column is the ID of a randomly chosen image.
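The three-column layout above can be read into (music, matched image, random image) triplets with a few lines of Python. This is a sketch under assumptions: the page does not state the file's delimiter or ID format, so the parser below assumes whitespace- or comma-separated columns and keeps IDs as strings. The names `Triplet` and `parse_matching_lines` and the sample rows are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Triplet(NamedTuple):
    music_id: str
    matched_image_id: str
    random_image_id: str


def parse_matching_lines(lines: Iterable[str]) -> List[Triplet]:
    """Parse three-column rows into triplets, skipping blank lines."""
    triplets = []
    for line_no, line in enumerate(lines, start=1):
        # Assumption: columns are separated by whitespace and/or commas.
        fields = line.replace(",", " ").split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, got {len(fields)}"
            )
        triplets.append(Triplet(*fields))
    return triplets


# Made-up rows for illustration; real IDs come from the Matching Result file.
sample = ["12 345 678", "", "13,346,901"]
for t in parse_matching_lines(sample):
    print(t.music_id, t.matched_image_id, t.random_image_id)
```

To read the actual file, pass an open file object in place of `sample`; each returned triplet pairs one music clip with its matched image and a random image for comparison.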

Supplementary Material

Supplementary material with further details of this project.

  • Supplementary Material PDF

Video Source

A video illustrating the framework.

  • Framework Introduction MP4

Examples of Music-Image Matching Results

  • Example cases produced by our algorithm. demo