Local Multi-Grouped Binary Descriptor with Ring-based Pooling Configuration and Optimization

Yongqiang Gao, Weilin Huang, Yu Qiao

Shenzhen Key lab of Computer Vision and Pattern Recognition

Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China

Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences

Chinese University of Hong Kong, Hong Kong SAR

Abstract

Local binary descriptors are attracting increasing attention due to their great advantage in computational speed, which enables real-time performance in numerous image/vision applications. Various methods have been proposed to learn data-dependent binary descriptors. However, most existing binary descriptors aim overly at computational simplicity at the expense of significant information loss, which causes ambiguity when measuring similarity with the Hamming distance. In this paper, observing that multiple features may share complementary information that boosts performance, we present a novel local binary descriptor, referred to as the Ring-based Multi-Grouped Descriptor (RMGD), to bridge the performance gap between current binary and floating-point descriptors. Our contributions are two-fold. First, we introduce a new pooling configuration based on spatial ring-region sampling, which allows binary tests over the full set of pairwise regions with different shapes, scales and distances. This leads to a more meaningful description than existing methods, which normally apply a limited set of pooling configurations. An extended AdaBoost is then proposed for efficient bit selection by emphasizing high variance and low correlation, achieving a highly compact representation. Second, the RMGD is computed from multiple image properties, from each of which binary strings are extracted. We cast the integration of multi-grouped features as a rankSVM or sparse-SVM weight-learning problem, so that different features can compensate strongly for each other, which is the key to discriminativeness and robustness. The performance of the RMGD was evaluated on a number of publicly available benchmarks, where it significantly outperforms state-of-the-art binary descriptors.

Paper

Coming soon...

Software

Download: coming soon, code (Matlab)

Discussion and Analysis
  • Visualization
  • Spatial arrangement of the eight selected ring-regions learned from one group on the gradient-magnitude map of the "Liberty" dataset.

    Spatial arrangement of the eight selected ring-regions learned from all eight groups on the gradient magnitude map of the "Liberty" dataset.

    Note: 1. Two paired regions are shown in the same color.

             2. The central part of the patch carries the most meaningful information, consistent with BRIEF [1].

             3. The paired regions mostly lie in the same or nearby fans. This arrangement resembles the underlying structure of the dataset, where matched patches agree to within 0.25 octaves in scale and pi/8 radians in angle.
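For illustration, a ring-region binary test of the kind visualized above can be sketched as follows. This is a simplified sketch only: the radii, angles and the region pair are hypothetical placeholders, not the learned configuration from the paper.

```python
import numpy as np

# Hypothetical sketch of a ring-based binary test: the patch is divided into
# concentric rings, each split into angular fans (8-division => pi/4 fans),
# and one descriptor bit compares the mean responses of a pair of regions.

def ring_region_mean(patch, r_in, r_out, a_start, a_end):
    """Mean of patch values inside one ring sector (fan)."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot(ys - cy, xs - cx)
    theta = np.arctan2(ys - cy, xs - cx) % (2 * np.pi)
    mask = (r >= r_in) & (r < r_out) & (theta >= a_start) & (theta < a_end)
    return patch[mask].mean()

def binary_test(patch, region_a, region_b):
    """One descriptor bit: 1 if region_a's mean response exceeds region_b's."""
    return int(ring_region_mean(patch, *region_a) > ring_region_mean(patch, *region_b))

patch = np.random.default_rng(1).random((32, 32))   # stand-in for a gradient map
# Two fans in the same ring, pi/4 wide (8-division); illustrative values only.
a = (4.0, 10.0, 0.0, np.pi / 4)
b = (4.0, 10.0, np.pi / 4, np.pi / 2)
print(binary_test(patch, a, b))
```

In the full descriptor, such tests are taken over all pairwise regions of different shapes, scales and distances, and the useful bits are then selected.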

  • Time Cost
  • Average time costs of different descriptors
    Descriptor    Extraction time (ns)    Matching time (us)
    SIFT[9]       235                     940
    BinBoost[7]   6.48                    36.45
    BRIEF[1]      12.54                   35.46
    RMGD          10.46                   32.46
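The fast matching times of the binary descriptors come from the fact that comparing two binary strings reduces to an XOR followed by a popcount. A minimal NumPy sketch with hypothetical 256-bit descriptors (not the RMGD implementation):

```python
import numpy as np

# Binary descriptors are packed into uint8 arrays, so a comparison is one
# XOR plus a bit count -- far cheaper than a floating-point L2 distance.

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed binary descriptors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two hypothetical 256-bit descriptors (32 bytes each).
rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, size=32, dtype=np.uint8)
d2 = d1.copy()
d2[0] ^= 0b00000111          # flip exactly three bits

print(hamming_distance(d1, d1))  # 0
print(hamming_distance(d1, d2))  # 3
```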
Experimental Results
  • Comparison Results: Ring-Region Sampling VS BRIEF
  • Four types of pooling regions: All, 4-division, 8-division and 16-division.

    False positive rate at 95% recall (FPR@95%) for ring-region features with different divisions on the 100k Notre Dame dataset.

    Note: 1. Results are averaged over 5 runs; all bits are selected uniformly at random, and error bars indicate the variance for each division at the given number of bits.

             2. The 8-division configuration achieves the best performance among the four cases compared.

  • Comparison Results: BBSCC VS BSB
  • Note: 1. "BBSCC" stands for Boosted Bit Selection with Correlation Constraints and "BSB" for Bit Selection with Boosting.

            2. The experiments are trained on the "Liberty" dataset at two scales: 4k pairs (1k matching, 3k non-matching) and 40k pairs (10k matching, 30k non-matching), and tested on "Notre Dame" with 100k pairs (50k matching, 50k non-matching).

            3. BBSCC achieves an FPR@95% of 38.46% and 28.27% for the 4k and 40k training sets, respectively, selecting only 256 bits out of 591,328. With the same 40k training data and 256 selected bits, BSB reaches 29.45%, at roughly four times the time cost of BBSCC-40k.
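The idea behind BBSCC, preferring high-variance bits that are weakly correlated with those already chosen, can be illustrated with a simplified greedy selector. This sketch omits the boosting weights of the actual algorithm; the correlation threshold and data are illustrative only.

```python
import numpy as np

# Simplified bit selection: rank candidate bits by variance over training
# patches, then greedily accept a bit only if its absolute correlation with
# every already-selected bit stays below a threshold.

def select_bits(B, k, max_corr=0.5):
    """B: (n_patches, n_bits) 0/1 matrix. Returns indices of k selected bits."""
    order = np.argsort(-B.var(axis=0))            # high variance first
    selected = []
    for j in order:
        if len(selected) == k:
            break
        if all(abs(np.corrcoef(B[:, j], B[:, s])[0, 1]) <= max_corr
               for s in selected):
            selected.append(int(j))
    return selected

rng = np.random.default_rng(2)
B = rng.integers(0, 2, size=(200, 50))            # toy binary responses
B[:, 1] = B[:, 0]                                 # a fully correlated duplicate
picked = select_bits(B, k=8)
print(picked)                                     # bits 0 and 1 never co-selected
```

The correlation constraint is what removes redundant duplicates such as columns 0 and 1 above, which pure accuracy- or variance-based ranking would keep.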

  • Comparison Results of Different Integration Learners
  • Comparison results of L1 and L2 regularization

    Comparison of different regularizers: directly concatenating the groups (No-opt), and using the L1 and L2 norms, respectively. The vertical axis shows FPR@95% recall and the horizontal axis gives the number of bits per group, from 8 to 1024. "Lib-NoD" denotes training on "Liberty" (Lib) and testing on "Notre Dame" (NoD), and so on.
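The group-integration step being compared here can be illustrated with a toy weight learner: each group contributes a per-pair distance, and non-negative group weights are fit with a ranking hinge loss plus an L2 penalty. This is a simplified NumPy stand-in for the rankSVM / sparse-SVM formulations in the paper, run on made-up data.

```python
import numpy as np

# Final distance = sum_g w_g * d_g(p, q) over groups g. Weights are fit by
# projected gradient descent on a ranking hinge loss (non-matching pairs
# should be farther than matching pairs by a margin) with an L2 penalty.

def fit_group_weights(D_match, D_nonmatch, lam=0.1, lr=0.01, iters=500):
    """D_match/D_nonmatch: (n_pairs, n_groups) per-group distances."""
    n_groups = D_match.shape[1]
    w = np.ones(n_groups) / n_groups
    diff = D_nonmatch - D_match               # should be positive for good bits
    for _ in range(iters):
        margin = 1.0 - diff @ w               # hinge margin per pair
        active = margin > 0
        grad = -diff[active].sum(axis=0) + lam * w
        w = np.maximum(w - lr * grad / max(active.sum(), 1), 0.0)  # keep w >= 0
    return w

rng = np.random.default_rng(3)
# Group 0 is discriminative (non-matching pairs clearly farther); group 1 is noise.
D_match = rng.random((100, 2))
D_nonmatch = D_match + np.c_[rng.random(100) + 0.5, rng.normal(0, 0.1, 100)]
w = fit_group_weights(D_match, D_nonmatch)
print(w)  # larger weight on the discriminative group
```

An L1 penalty in place of `lam * w`'s L2 term would drive uninformative group weights exactly to zero, which is the sparse-SVM variant compared above.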

  • Comparison Results of State-of-the-art
  • Brown et al.'s Patch Dataset

    Comparative results on Brown et al.'s dataset, reported as the false positive rate at 95% recall. For learning-based descriptors, we give results for both training datasets per testing dataset, while for descriptors that do not depend on training data, we report one result per testing dataset. In parentheses, we give the number of bits used to encode each descriptor. We assume 1 byte (8 bits) per dimension for floating-point descriptors, since this quantization was reported as sufficient for SIFT in the literature. RMGD significantly outperforms its binary competitors and performs comparably to the best floating-point descriptors.

    Train                    Yosemite        Notre Dame      Yosemite        Liberty         Notre Dame      Liberty
    Test                     Liberty         Liberty         Notre Dame      Notre Dame      Yosemite        Yosemite
    Binary Descriptors
    BRIEF[1]                 54.01(512b)                     48.64(512b)                     52.69(512b)
    BRISK[2]                 79.36(1024b)                    74.88(1024b)                    73.21(1024b)
    FREAK[3]                 58.14(512b)                     50.62(512b)                     52.95(512b)
    D-BRIEF[4]               53.39(32b)      51.30(32b)      43.96(32b)      43.10(32b)      46.22(32b)      47.29(32b)
    ITQ-SIFT[5]              37.11(64b)      36.95(64b)      30.56(64b)      31.07(64b)      34.34(64b)      34.43(64b)
    BGM[6]                   22.18(256b)     21.62(256b)     14.69(256b)     15.99(256b)     18.42(256b)     21.11(256b)
    BinBoost[7]              21.67(64b)      20.49(64b)      14.54(64b)      16.90(64b)      18.97(64b)      22.88(64b)
    RFDG[8]                  19.03(563b)     17.77(542b)     11.37(563b)     12.49(406b)     15.14(542b)     17.62(406b)
    RFDR[8]                  19.40(598b)     19.35(446b)     11.68(598b)     13.23(293b)     14.50(446b)     16.99(293b)
    Floating-point Descriptors
    SIFT[9]                  32.46(128f)                     26.44(128f)                     30.84(128f)
    Brown et al.[10]         18.27(29f)      16.85(36f)      11.98(29f)      -               13.5(36f)       -
    Simonyan et al.[11]      16.7(32f)       14.26(32f)      9.99(32f)       9.07(32f)       13.4(32f)       14.32(32f)
    Our Combined-binary Descriptors
    RMGD104                  17.42(50x32b)   15.09(44x32b)   10.86(45x32b)   10.15(50x32b)   13.82(44x32b)   14.64(43x32b)

    Note: In Simonyan et al.'s TPAMI paper, the results differ slightly from those listed here, which are taken from their ECCV paper.
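For reference, the FPR at 95% recall reported throughout these tables can be computed as below. The distance distributions in the example are synthetic, for illustration only.

```python
import numpy as np

# FPR @ 95% recall: find the distance threshold that accepts 95% of the
# matching pairs, then report the fraction of non-matching pairs that the
# same threshold also (wrongly) accepts.

def fpr_at_95_recall(dist_match, dist_nonmatch):
    threshold = np.percentile(dist_match, 95)        # accepts 95% of matches
    return float((dist_nonmatch <= threshold).mean())

rng = np.random.default_rng(4)
dist_match = rng.normal(20, 5, size=1000)            # matching pairs: closer
dist_nonmatch = rng.normal(40, 5, size=1000)         # non-matching: farther
print(f"FPR@95%: {fpr_at_95_recall(dist_match, dist_nonmatch):.3f}")
```

Lower is better: a descriptor with well-separated match/non-match distance distributions admits few false positives at the 95%-recall threshold.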

  • Application
    1. Image Matching

    Matching accuracy obtained by ORB-32, BRISK-64 and RMGD-104 for the six image sequences of the "Mikolajczyk" dataset.

    Note: For a fair comparison, all descriptors are built on the same interest regions (including rotation), detected by the Hessian-Affine detector.

    2. Object Recognition
    Object recognition accuracy on the ZuBuD and Kentucky datasets with different local descriptors.

    Descriptor        ZuBuD    Kentucky
    SIFT[9]           75.5%    48.2%
    BGM[6]            67.3%    36.3%
    BinBoost-256[7]   62.3%    19.2%
    BRIEF[1]          70.5%    41.6%
    FREAK[3]          48.8%    21.9%
    SIFT-KSH[12]      64.6%    29.8%
    RFDG[8]           82.5%    65.1%
    RFDR[8]           80.7%    62.5%
    RMGD104           85.4%    67.3%
References

    BRIEF[1]: M. Calonder, V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha and P. Fua, "BRIEF: Computing a Local Binary Descriptor Very Fast", IEEE TPAMI, 2012

    BRISK[2]: S. Leutenegger, M. Chli and R. Siegwart, "BRISK: Binary Robust Invariant Scalable Keypoints", ICCV, 2011

    FREAK[3]: A. Alahi, R. Ortiz and P. Vandergheynst, "FREAK: Fast Retina Keypoint", CVPR, 2012

    D-BRIEF[4]: T. Trzcinski and V. Lepetit, "Efficient Discriminative Projections for Compact Binary Descriptors", ECCV, 2012

    ITQ-SIFT[5]: Y. Gong, S. Lazebnik, A. Gordo and F. Perronnin, "Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-scale Image Retrieval", IEEE TPAMI, 2012

    BGM[6]: T. Trzcinski and M. Christoudias and P. Fua and V. Lepetit, "Learning image descriptors with the boosting-trick", NIPS, 2012

    BinBoost[7]: T. Trzcinski, M. Christoudias, V. Lepetit and P. Fua, "Boosting Binary Keypoint Descriptors", CVPR, 2013

    RFD[8]: B. Fan, Q. Kong, T. Trzcinski, Z. Wang, C. Pan and P. Fua, "Receptive Fields Selection for Binary Feature Description", IEEE TIP, 2014

    SIFT[9]: D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", IJCV, 2004

    Brown et al.[10]: M. Brown, G. Hua and S. Winder, "Discriminative Learning of Local Image Descriptors", IEEE TPAMI, 2010

    Simonyan et al.[11]: K. Simonyan, A. Vedaldi and A. Zisserman, "Learning Local Feature Descriptors Using Convex Optimisation", IEEE TPAMI, 2013

    SIFT-KSH[12]: W. Liu, J. Wang, R. Ji, Y.-G. Jiang and S.-F. Chang, "Supervised Hashing with Kernels", CVPR, 2012

Contact

    Yongqiang Gao,  yq.gao@siat.ac.cn