Current Research Topics

  • Action Recognition and Detection in Videos
  • Scene Understanding and Classification
  • Scene Text Detection and Recognition
  • Facial Analysis and Recognition
  • Video and Image Enhancement for Surveillance
  • Music Image Matching
  • Voice Conversion

Previous Research Topics

  • Invariant structural representation for speech recognition
  • Phase Singularities for Image Representation and Object Matching
  • Unsupervised Phoneme Segmentation
  • Signature Verification
  • Recover drawing order from static handwritten images
  • Other topics

    Action Recognition and Detection in Videos

    Human action understanding has attracted extensive research interest in computer vision owing to its wide applications in surveillance, human-computer interfaces, sports video analysis, and content-based video retrieval. The challenges of action understanding come from background clutter, viewpoint changes, and motion and appearance variations. Our group has made continuous efforts to address these challenges, ranging from mining mid-level parts (CVPR13, ICCV13), multi-view encoding of local descriptors (CVPR14), hierarchical models (TIP14), and dictionary learning (ECCV14) to sequential modeling (ECCV14) for action recognition and detection. Recently, we have become interested in using deep learning techniques (ECCV14, CVPR15) for action and video modeling. Experimental results on large public datasets (e.g., UCF101, HMDB51) demonstrate the effectiveness of the proposed methods.

  • Limin Wang, Y. Qiao, Xiaoou Tang, "Motionlets: Mid-Level 3D Parts for Human Motion Recognition," Proc. Int. Conf. Computer Vision and Pattern Recognition (CVPR), 2013
  • In addition to Motionlets, we have developed several other methods for video-based action recognition.
  • X. Peng, Y. Qiao, Q. Peng, and X. Qi, "Exploring Motion Boundary based Sampling and Spatial-Temporal Context Descriptors for Action Recognition", Proc. BMVC, 2013
  • Xingxing Wang, Limin Wang, Y. Qiao, "A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition," Proc. Asian Conference on Computer Vision (ACCV), 2012.

    Scene Understanding and Classification

    We propose several local descriptors for scene classification. These descriptors can also be extended to texture and material classification. We are also interested in using deep learning for scene classification. Our deep models ranked second (Google ranked first) in the scene classification task of the LSUN Challenge.

  • X. Qi, R. Xiao, G. Li, Yu Qiao, J. Guo, X. Tang, "Pairwise Rotation Invariant Co-occurrence Local Binary Pattern", IEEE Trans. on Pattern Analysis and Machine Intelligence (T-PAMI), Vol. 36, No. 11, pp. 2199-2213, Nov. 2014
  • X. Qi, Y. Qiao, and J. Guo, "Multi-scale Joint Encoding of Local Binary Patterns for Texture and Material Classification", Proc. BMVC, 2013
  • X. Qi, Y. Qiao, and J. Guo, "Exploring Cross-Channel Texture Correlation for Color Texture Classification", Proc. BMVC, 2013

    Scene Text Detection and Recognition

    We develop a robust scene text detector based on convolutional neural network induced MSER trees (ECCV14). We also develop a Deep-Text Recurrent Network (DTRN) that treats scene text reading as a sequence labelling problem. We leverage recent advances in deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. The proposed model achieves strong performance on several benchmarks, including ICDAR03, SVT-50, and IIIT5k-1000, and substantially improves on the state-of-the-art PhotoOCR.
  • W. Huang, Yu Qiao, X. Tang, "Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees," Proc. European Conference Computer Vision (ECCV), 2014
  • Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, and Xiaoou Tang, "Reading Scene Text in Deep Convolutional Sequences," arXiv:1506.04395v1, 2015
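
    The sequence-labelling view of text reading can be illustrated with a minimal sketch. It assumes CTC-style decoding, which the summary above does not state: each image column receives one label, repeated labels are merged, and a blank symbol is dropped, so no explicit character segmentation is required. The function name `collapse_labels` and the blank symbol `-` are hypothetical.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol; not named in the source


def collapse_labels(frame_labels, blank=BLANK):
    """Turn per-frame labels into a word: merge repeats, then drop blanks.

    This is the usual CTC-style collapse, used here only as an assumption
    about how a frame-level label sequence becomes a character string.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# Seventeen frames of a word image, one predicted label per frame.
word = collapse_labels(list("--hh-e-ll-ll-oo--"))
print(word)
```

    A blank between the two runs of `l` is what keeps the double letter; without it the repeats would merge into one character.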

    Facial Analysis and Recognition

    Face analysis and recognition have been studied intensively in recent years. Despite significant progress in this area, they remain challenging problems. We are particularly interested in remote face recognition, age-invariant face recognition, and cross-modal face recognition. This project is a collaboration with Prof. Zhifeng Li.
  • Zhifeng Li, Dihong Gong, Yu Qiao, Dacheng Tao, "Common Feature Discriminant Analysis for Matching Infrared Face Images to Optical Face Images", IEEE Transactions on Image Processing (TIP), vol. 23, no. 6, pp. 2436-2445, 2014
  • Baochang Zhang, Yu Qiao, "Face Recognition based on Gradient Gabor feature and Efficient Kernel Fisher Analysis," Neural Computing and Applications, p.941-643, Nov., 2009.
  • Zhifeng Li, Dihong Gong, Jianzhuang Liu, Yu Qiao, "Multi-feature Canonical Correlation Analysis for Face Photo-Sketch Image Retrieval," Proc. ACM Multimedia (ACM-MM), 2013.
  • Yu Qiao, Huang Xiyue and Chai Yi, "Face Recognition based on Weighted PCA", Journal of Chongqing University, Vol. 27(3), pp.28-31, 2004

    Music and Image Retrieval




    Human perceptions of music and images are closely related. Both music and images can evoke similar sensations, such as emotion, motion, and power. The main objective of this work is to investigate whether and how music and images can be bridged by machine. As a preparation, we asked six labelers to compare more than 25,000 music-image pairs obtained from music videos against random music-image pairs. The results show that the human labelers largely agree with each other on the matching degree of music-image pairs, and all prefer the pairs from music videos to random ones. We use a semantic vector composed of the posterior probabilities of descriptive concepts to represent each music segment. We adopt lyrics as an intermediate medium to connect music and images, and extract a set of attributes from lyrics for image representation. We then propose a new cross-modal kernel analysis (CMKA) method to learn the semantic similarity between music and images with side information. Experimental results demonstrate that the proposed method significantly outperforms previous methods and achieves a high consistency rate with human labelers.

  • X. Wu, Y. Qiao, X. Wang, X. Tang, "Bridging Music and Image: A Preliminary Study with Multiple Rank-CCA Learning," Proc. ACM Multimedia (ACM-MM), 2012.
  • X. Wu, B. Xu, Y. Qiao, X. Tang, "Automatic Music Video Generation: Cross Matching of Music and Image," Proc. ACM Multimedia (ACM-MM), 2012.
  • We developed a demo system, MiSearch, which uses an image query to search for music: http://210.75.252.74/musicSys/View/Demo/image2music.php


    Voice Conversion







    Voice conversion, the task of transforming one speaker's voice into another's, can be regarded as the problem of finding a mapping function between the speech sequences of two speakers. We have proposed a series of methods for voice conversion.

  • Na Li, Yu Qiao, "Bayesian Mixture of Probabilistic Linear Regressions for Voice Conversion," Proc. INTERSPEECH, 2012
  • Yu Qiao, T. Tong, N. Minematsu, "A Study on Bag of Gaussian Model with Application to Voice Conversion," Proc. INTERSPEECH, 2011
  • Y. Qiao and N. Minematsu, "HMM-based sequence-to-frame mapping for voice conversion," Proc. Int. Conf. Acoustics, Speech, & Signal Processing (ICASSP), pp.4830-4833, 2010
  • Y. Qiao and N. Minematsu, "Mixture of probabilistic linear regressions: a unified view of GMM-based mapping techniques," Proc. Int. Conf. Acoustics, Speech, & Signal Processing (ICASSP), pp.3913-3916, 2009

    Video and Image Enhancement for Surveillance

    Video surveillance systems have been widely deployed in various conditions. Many of these systems suffer from poor video quality, which limits their application. We are developing techniques to improve image and video quality under haze, fog, defocus, and low-illumination conditions.
  • X. Zhu, Y. Li, Yu Qiao, "Fast Single Image Dehazing Through Edge-Guided Interpolated Filter," Proc. Machine Vision Application (MVA), 2015

    Previous Research Projects

    Phase Singularities for Image Representation and Object Matching




    Phases have been widely used in signal and image processing due to their stability under transformation, deformation, and added noise. However, phase singularities, where the complex signals vanish, are generally regarded as harmful and unreliable. In this work, on the contrary, we show that phase singularities calculated with the Laguerre-Gauss filter contain important information and can provide a reliable representation for images. We prove that phase singularities are invariant to translation and rotation, and show how to reconstruct an image up to a scale from the positions of phase singularities alone. We develop two applications of phase singularities: object tracking and image matching. In object tracking, we use the iterative closest point algorithm to determine the correspondences between phase singularities in two adjacent frames.

  • Y. Qiao, W. Wang, N. Minematsu, J. Liu, X. Tang, "Phase singularities for image representation and matching," Proc. Int. Conf. Acoustics, Speech, & Signal Processing (ICASSP), 2008
  • Yu Qiao, W. Wang, N. Minematsu, J. Liu, M. Takeda, X. Tang, "A Theory of Phase Singularities for Image Representation and its Applications to Object Tracking and Image Matching," IEEE Trans. on Image Processing, vol.18, no.10, pp.2153-2166, 2009
  • Demo video on tracking fugu (6 M)
  • We developed a demo system, iGAPSearch, which searches for buildings using query images: http://mmlab.siat.ac.cn/gbir/

  • Jiemin Wang, Yuanhai He, Yujie Zhou, Yu Qiao, "iGAPSearch: Using Phone Cameras to Search Around the World", Proc. IEEE Int. Conf. Information and Automation (ICIA), 2011

    Signature Verification


    We propose a novel framework for offline signature verification. Unlike previous methods, our approach uses online handwriting rather than 2D signature images for registration signatures. The online registrations enable a robust recovery of the writing trajectory from an input offline signature and thus allow effective shape matching between registration and verification signatures. For the first time, we formulate and solve the recovery of the writing trajectory within the framework of Conditional Random Fields. We use online context to align signatures and develop a verification criterion that combines the duration and amplitude variances of handwriting. Experiments on benchmark databases show that the proposed method significantly outperforms the compared offline signature verification methods.



  • Yu Qiao, Jianzhuang Liu and Xiaoou Tang, "Offline Signature Verification using Online Registration", International Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
  • We have developed a method for online signature verification.
  • Yu Qiao, Xingxing Wang, Chunjing Xu, "Learning Mahalanobis Distance for DTW based Online Signature Verification", Proc. IEEE Int. Conf. Information and Automation (ICIA), 2011
  • N. Houmani, S. Garcia-Salicetti, B. Dorizzi, J. Montalv, J. C. Canuto, M. V. Andrade, Y. Qiao, X. Wang, et al., "BioSecure Signature Evaluation Campaign (ESRA2011): Evaluating Systems on Quality-based categories of Skilled Forgeries," International Joint Conference on Biometrics (IJCB), 2011
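
    For readers unfamiliar with the dynamic time warping (DTW) named in the online verification paper above, the following is a minimal sketch of the textbook DTW distance between two sequences. It is not the authors' method: the cited paper learns a Mahalanobis distance, whereas this sketch defaults to an absolute difference on scalar samples and leaves the local distance as a pluggable `dist` argument. The names `dtw_distance` and `dist` are hypothetical.

```python
import math


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Textbook DTW: minimum accumulated local distance over all alignments.

    `dist` is the local distance between two samples. A learned metric
    could be plugged in here; the default is only a placeholder.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a sample
                cost[i][j - 1],      # b advances, a repeats a sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same pen trace written at two speeds aligns with zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

    A verification decision would then compare such a distance between a test signature and the enrolled references against a threshold; the threshold and features are outside the scope of this sketch.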

    Invariant structural representation for automatic speech recognition

    Speech recognition has to deal with inevitable acoustic variations caused by non-linguistic factors. Recently, an invariant structural representation of speech was proposed by N. Minematsu, in which the non-linguistic variations are effectively removed through modeling the dynamic aspects of speech signals. I work on both the theoretical and practical aspects of the invariant structural representation.

    Theoretically, we prove that f-divergence yields a general family of invariant measures, and that every invariant measure must be expressible in the form of an f-divergence.
  • Y. Qiao and N. Minematsu, "f-divergence is a generalized invariant measure between distributions," Proc. INTERSPEECH, 2008
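
    The first half of this claim follows from a change of variables, sketched below under the assumption that the transformation is invertible and differentiable. The notation (h for the transformation, J_h for its Jacobian determinant, primes for transformed densities) is illustrative and not taken from the cited paper.

```latex
% f-divergence between two densities p and q
D_f(p \,\|\, q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx

% Under y = h(x), both densities pick up the same Jacobian factor:
p'(y) = \frac{p(x)}{|J_h(x)|}, \qquad
q'(y) = \frac{q(x)}{|J_h(x)|}, \qquad
dy = |J_h(x)|\, dx

% The ratio p'/q' equals p/q and q'(y)\,dy equals q(x)\,dx, so
D_f(p' \,\|\, q')
  = \int q'(y)\, f\!\left(\frac{p'(y)}{q'(y)}\right) dy
  = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx
  = D_f(p \,\|\, q)
```

    The converse direction, that every invariant measure has this form, is the part established in the paper and is not reproduced here.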


  • I also develop practical techniques to address two problems of structural representations: high dimensionality and excessively strong invariance.
  • Yu Qiao, Satoshi Asakawa and Nobuaki Minematsu, "Random Discriminant Structure Analysis for Automatic Recognition of Connected Vowels", IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2007.
  • Yu Qiao, S. Asakawa, N. Minematsu, and K. Hirose, "Dimension reduction and discriminant analysis for Japanese connected vowel recognition," Proc. Autumn Meeting of Acoust. Soc. Jpn., 2-P-2, (2008-9)

    Unsupervised Phoneme Segmentation

    Phoneme segmentation is a fundamental problem in many speech recognition and synthesis studies. Unsupervised phoneme segmentation assumes no knowledge of linguistic content or acoustic models, and thus poses a challenging problem. In this work, we formulate the optimal segmentation problem within a probabilistic framework. Using statistical and information-theoretic analysis, we develop three different objective functions, namely Summation of Square Error (SSE), Log Determinant (LD), and Rate Distortion (RD). We introduce a time-constrained agglomerative clustering algorithm to find the optimal segmentations. The proposed method outperforms recently published unsupervised segmentation methods.
  • Y. Qiao, N. Shimomura, N. Minematsu, "Unsupervised optimal phoneme segmentation: objectives, algorithm and comparisons," Proc. Int. Conf. Acoustics, Speech, & Signal Processing (ICASSP), 2008
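
    The idea of time-constrained agglomerative clustering with the SSE objective can be sketched as follows. This is a simplified illustration, not the published algorithm: it assumes scalar frames instead of acoustic feature vectors, merges greedily, and stops at a target segment count supplied by the caller. The names `sse`, `merge_cost`, and `segment` are hypothetical.

```python
def sse(frames):
    """Sum of squared errors of scalar frames around their mean."""
    mean = sum(frames) / len(frames)
    return sum((x - mean) ** 2 for x in frames)


def merge_cost(left, right):
    """Increase in total SSE caused by merging two adjacent segments."""
    return sse(left + right) - sse(left) - sse(right)


def segment(frames, num_segments):
    """Greedy agglomerative clustering restricted to adjacent segments.

    Only neighbours in time may merge, so every cluster stays contiguous.
    Returns the start index of each remaining segment.
    """
    segments = [[x] for x in frames]
    while len(segments) > num_segments:
        # Pick the adjacent pair whose merge raises the SSE the least.
        best = min(
            range(len(segments) - 1),
            key=lambda i: merge_cost(segments[i], segments[i + 1]),
        )
        segments[best:best + 2] = [segments[best] + segments[best + 1]]
    starts, position = [], 0
    for seg in segments:
        starts.append(position)
        position += len(seg)
    return starts


# Three flat regions of a toy one-dimensional "feature" track.
track = [0.0, 0.1, 0.0, 5.0, 5.1, 4.9, 9.0, 9.2]
print(segment(track, 3))
```

    The adjacency restriction is what distinguishes this from ordinary agglomerative clustering: frames that sound alike but are far apart in time are never grouped together.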


  • We further show that metrics learned by Minimum of Summation Variance (MSV) and Maximum of Discrimination Variance (MDV) can significantly improve the segmentation results.
  • Y. Qiao and N. Minematsu, "Metric learning for unsupervised phoneme segmentation," Proc. INTERSPEECH, 2008

    Recover drawing order from static handwritten images

    The objective of this research is to recover the temporal (online) information from static handwriting images, which is generally regarded as an important yet hard problem in the handwriting recognition field. We formulate the recovery problem as finding the smoothest path in the graph representation of the image. A three-phase approach to recovering the writing order is proposed within the framework of Edge Continuity Relation (ECR). Experiments on 708,988 static images show that our method achieves a restoration rate of 96.0%. To the best of our knowledge, this is the highest result reported on a large database.

  • Yu Qiao, M. Nishiara and M. Yasuhara, "A Framework toward Restoration of Writing Order from Single-Stroked Handwriting Image", (Proof of Theorems) IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.1724-1737, Vol. 28, No. 11, November 2006

  • Some experimental results can be found here.
  • Yu Qiao, Mikihiko Nishiara and M. Yasuhara, "A Novel Approach to Recover Writing Order From Single Stroke Offline Handwritten Images", Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR), pp. 227-231, 2005, Seoul, Korea
  • Yu Qiao and M. Yasuhara, "Recover Writing Trajectory from Multiple Stroked Image Using Bidirectional Dynamic Search", International Conference on Pattern Recognition (ICPR), 2006, Hong Kong, China

    Other topics

    Optimal Euler Circuit/Path

    This research introduces and solves a new graph problem: finding an Optimal Euler Circuit (OEC) in an Euler graph. I prove that the OEC problem is NP-complete. I develop a polynomial-time algorithm to find an OEC in an Euler graph whose vertices all have degree 4, and propose a 1/4-approximation algorithm for general Euler graphs.
    The source code of some of the proposed algorithms for finding an optimal Euler circuit can be found here.

  • Yu Qiao and M. Yasuhara, "Optimal Euler Circuit of Maximum Contiguous Cost", IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E90-A, No. 1, pp. 274-280, Jan. 2007
  • Yu Qiao, M. Yasuhara, "Recovering Drawing Order From Offline Handwritten Image Using Direction Context and Optimal Euler Path," 31st International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006, Toulouse, France
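
    As background, the sketch below finds an arbitrary Euler circuit with Hierholzer's algorithm. It is not the OEC algorithm described above: it ignores any cost between consecutive edges and only shows the object being optimized, a closed walk that uses every edge exactly once. It assumes a connected undirected graph in which every vertex has even degree; the name `euler_circuit` is hypothetical.

```python
def euler_circuit(edges, start):
    """Return a closed walk using every undirected edge once (Hierholzer).

    Assumes the graph is connected and every vertex has even degree.
    No cost is optimized; any valid circuit may be returned.
    """
    adjacency = {}
    for index, (u, v) in enumerate(edges):
        adjacency.setdefault(u, []).append((v, index))
        adjacency.setdefault(v, []).append((u, index))
    used = [False] * len(edges)
    stack, circuit = [start], []
    while stack:
        vertex = stack[-1]
        neighbours = adjacency.get(vertex, [])
        # Discard edges already traversed from their other endpoint.
        while neighbours and used[neighbours[-1][1]]:
            neighbours.pop()
        if neighbours:
            next_vertex, index = neighbours.pop()
            used[index] = True
            stack.append(next_vertex)
        else:
            # Dead end: this vertex is final in the circuit.
            circuit.append(stack.pop())
    return circuit[::-1]


# A figure-eight: two triangles sharing vertex 0, which has degree 4.
figure_eight = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
print(euler_circuit(figure_eight, 0))
```

    At a degree-4 vertex such as vertex 0 there is more than one way to pair incoming and outgoing edges, which is where a choice among Euler circuits arises.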

    Safety Control based on Expert Network

  • Y. Li, G. Zhao, X.Y. Huang, Yu Qiao and W. Li, "Object-oriented Design of the Intelligent Decision-making System of Safety Controlling", Journal of Chongqing University, Vol.27(12), pp.102-106, 2004
  • W. Li, X.Y. Huang, H.B. Wei, Yu Qiao, Y. Li, "Knowledge-based Intelligent Decision-making System for the Safety Controlling", Journal of Chongqing University, Vol.29(1), pp.77-80, 2006
  • H.B. Wei, X.Y. Huang, Yu Qiao, Y. Li, W. Li, "Orbit Selection by Knowledge Based Neural Network", Journal of Chongqing University, Vol.28(11), pp.27-30, 2005

    Corner detection

  • Yu Qiao, Huang X.Y. and Chai Y, "Corner Point Detection Based on Adaptive Line Approximation", Journal of Chongqing University, Vol. 26(2), pp.29-31, 2003