2010 International Conference on Electrical Engineering and Automatic Control (ICEEAC2010)
Product-image classification with support vector machine
Shijie Jia 1,2, Jianying Zhao 1, Yanping Yang 1, Nan Xiao 1
1 School of Electronic and Information, Dalian Jiaotong University, Dalian, China. Email: jsj@djtu.edu.cn
2 Faculty of Electronic Information & Electrical Engineering, Dalian University of Technology, Dalian, China

Abstract: Kernel-based SVMs are known to generalize well. This paper proposes a supervised product-image classification method based on SVM and the pyramid histogram of words (PHOW). We tested several kernel functions on PI100 (the Microsoft product-image dataset): the linear, radial basis, chi-square, histogram intersection and spatial pyramid kernels. Experimental results show the effectiveness of our algorithm.

Keywords: support vector machines; pyramid histogram of words; kernel function

I. INTRODUCTION

E-commerce is becoming more and more popular. On most e-commerce sites, product search is currently based on text: online products are tagged so that customers can search for them. However, it is difficult and time-consuming to tag all the potentially interesting information for an ever-growing number of products. Automatic online classification of products according to the content features of their images is therefore greatly needed in e-commerce.

The purpose of content-based image classification is to predict the unknown object type from the image features. Commonly used methods include K nearest neighbor, Bayesian classifiers, boosting and the support vector machine (SVM). Kernel-based SVM is built on statistical learning theory and has clear advantages for pattern recognition problems with small samples, nonlinearity and high dimensionality. SVM is now widely used for the classification of natural images [1-4], medical images [5] and remote sensing images [6]. In this paper, we adopt SVM as the classifier for product-image classification.

II. OUR METHOD

The process of product-image classification is shown in Figure 1. It is divided into a training part and a testing part. In the training part, features of all the images in the training dataset are extracted and clustered into a bag of words, and a histogram of visual words is then formed for each image. The histograms and their corresponding class labels are used to train the SVM, which yields a model. For testing, features of the test image are extracted to form its visual-word histogram. Combining this histogram with the trained model, the SVM predicts the class label of the test image.

Figure 1 Schematic diagram of the product-image classification process

A. Feature Extraction

We adopt the pyramid histogram of words (PHOW) [7], proposed by Bosch and based on bag of words, as the image descriptor. PHOW describes each image as a series of visual-word histograms. The traditional bag-of-words process is as follows:
(1) Automatically detect regions/points of interest (local patches).
(2) Compute local descriptors (such as SIFT) over these regions/points.
(3) Form the visual vocabulary through k-means clustering of these descriptors.
(4) Count how often each visual word occurs in an image to form that image's visual-word histogram.
In addition, PHOW takes the spatial layout of the image into account and forms visual-word histograms at a series of resolutions, from low to high.
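The histogram-forming step of Section II-A can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the local descriptors and their image positions have already been extracted and that the vocabulary (the k-means cluster centres) is given. The function names are hypothetical, and the grid of 2^l x 2^l cells at pyramid level l is an assumption borrowed from common spatial-pyramid practice, since the text does not specify the grid.

```python
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def nearest_word(desc: Descriptor, vocabulary: List[Descriptor]) -> int:
    """Index of the visual word (cluster centre) closest to desc."""
    def sq_dist(a: Descriptor, b: Descriptor) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(vocabulary)), key=lambda k: sq_dist(desc, vocabulary[k]))


def word_histogram(descs: List[Descriptor], vocabulary: List[Descriptor]) -> List[int]:
    """Plain bag-of-words histogram: occurrences of each visual word."""
    hist = [0] * len(vocabulary)
    for d in descs:
        hist[nearest_word(d, vocabulary)] += 1
    return hist


def pyramid_histograms(points: List[Tuple[float, float]],
                       descs: List[Descriptor],
                       vocabulary: List[Descriptor],
                       width: float, height: float,
                       levels: int) -> List[List[int]]:
    """One concatenated histogram per level (assumed 2**l x 2**l cells)."""
    pyramid = []
    for level in range(levels):
        cells = 2 ** level
        hists = [[0] * len(vocabulary) for _ in range(cells * cells)]
        for (x, y), d in zip(points, descs):
            # Clamp so that points on the right/bottom border stay in range.
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hists[cy * cells + cx][nearest_word(d, vocabulary)] += 1
        pyramid.append([count for h in hists for count in h])
    return pyramid


# Toy data: a 2-word vocabulary and three descriptors in a 100x100 image.
vocabulary = [[0.0, 0.0], [1.0, 1.0]]
points = [(10, 10), (90, 10), (90, 90)]
descs = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]
pyramid = pyramid_histograms(points, descs, vocabulary, 100, 100, levels=2)
```

Level 0 of the pyramid is the ordinary bag-of-words histogram of the whole image; higher levels record where in the image each visual word occurs.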
978-1-4244-8109-5/10/$26.00 ©2010 IEEE
B. Kernel based SVM Classifier

1) Linear SVM

SVM develops from the optimal separating surface in the linearly separable case. Its basic idea is illustrated in Figure 2 for the two-dimensional case. The solid points and hollow points represent the two classes of samples. H is the separating line, and H1 and H2 are the lines parallel to H that pass through the samples of each class nearest to H. The distance between H1 and H2 is called the margin. The optimal separating line not only separates the two classes correctly but also maximizes the margin. The equation of H is (w·x) + b = 0. Maximizing the margin is equivalent to minimizing

$$\phi(w) = \frac{1}{2}\|w\|^{2} = \frac{1}{2}(w \cdot w) \tag{1}$$

subject to the constraints

$$y_i\,[(w \cdot x_i) + b] - 1 \ge 0, \quad i = 1, \dots, n. \tag{2}$$

The training samples for which equality holds in (2) are called support vectors. They are the samples nearest to the optimal separating surface H and lie on H1 and H2. Solving the optimization problem gives the optimal classification function

$$f(x) = \operatorname{sgn}\left\{ \sum_{i=1}^{n} a_i^{*}\, y_i\, (x_i \cdot x) + b^{*} \right\}, \tag{3}$$

where $a_i^{*}$ and $b^{*}$ are the optimal multipliers and bias.

Figure 2 The optimal separating line in the linearly separable case

2) Non-linear SVM and Kernel Functions

Non-linear SVM uses a kernel function to map the features of the data into a new feature space in which the sought relations can be represented in linear form. There are four common types of kernel function: linear, polynomial, radial basis and sigmoid [8]. Some pre-computed kernels commonly used in image classification are as follows.

a) Chi-square kernel function

$$K(X_i, X_j) = \exp\left(-\frac{d}{\bar{d}}\right) \tag{4}$$

where $d$ is the chi-square distance and $\bar{d}$ is the average chi-square distance:

$$d(X_i, X_j) = \sum_{m=0}^{M-1} \frac{\left(h_{X_i}[m] - o[m]\right)^{2}}{o[m]} \tag{5}$$

$$o[m] = \frac{h_{X_i}[m] + h_{X_j}[m]}{2} \tag{6}$$

Here $h_{X}[m]$ is the m-th bin of the histogram of image X and M is the number of bins.

b) Histogram intersection kernel function

$$d(X, Y) = \sum_{i=1}^{N} \min(X_i, Y_i) \tag{7}$$

N is the dimension of X and Y.

c) Spatial pyramid kernel function

$$K(X, Y) = \exp\left(-\frac{d}{\bar{d}}\right) \tag{8}$$

$$d(X, Y) = \sum_{L=1}^{3} \alpha_L\, d_L(X_L, Y_L) \tag{9}$$

L is the pyramid level, $\alpha_L$ is the weight of level L, and $d_L$ is the distance between the level-L histograms $X_L$ and $Y_L$.

III. EXPERIMENTAL RESULTS AND ANALYSIS

We adopted PI100 [9] from Microsoft as our test set. The images in PI100 were mainly collected from the MSN shopping web site. The dataset contains ten thousand 100*100 JPEG images, divided into 100 classes by the product type of the main object. Figure 3 shows some samples from PI100.

Figure 3 Samples from the PI100 data set
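Because the chi-square, histogram intersection and spatial pyramid kernels in (4)-(9) are handed to the SVM as pre-computed values, it may help to see them written out. The following Python sketch is one possible reading of those equations, not the authors' code. Several points are assumptions the paper does not state: empty bins are skipped to avoid 0/0, the average distance is taken over all distinct pairs of the given histograms, and the per-level distance $d_L$ in (9) is taken to be the chi-square distance. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Histogram = Sequence[float]


def chi_square_distance(hx: Histogram, hy: Histogram) -> float:
    """Eqs. (5)-(6): sum_m (hx[m] - o[m])**2 / o[m], o[m] = (hx[m] + hy[m]) / 2."""
    total = 0.0
    for a, b in zip(hx, hy):
        o = (a + b) / 2.0
        if o > 0:  # assumption: bins empty in both histograms contribute 0
            total += (a - o) ** 2 / o
    return total


def histogram_intersection(x: Histogram, y: Histogram) -> float:
    """Eq. (7): sum over bins of min(x[i], y[i])."""
    return sum(min(a, b) for a, b in zip(x, y))


def pyramid_distance(xs: List[Histogram], ys: List[Histogram],
                     alphas: Sequence[float]) -> float:
    """Eq. (9): weighted sum of per-level distances (chi-square assumed)."""
    return sum(alpha * chi_square_distance(x, y)
               for alpha, x, y in zip(alphas, xs, ys))


def exp_kernel_matrix(items: list, distance: Callable = chi_square_distance
                      ) -> List[List[float]]:
    """Eqs. (4)/(8): K[i][j] = exp(-d(i, j) / mean_d)."""
    n = len(items)
    dist = [[distance(items[i], items[j]) for j in range(n)] for i in range(n)]
    # Assumption: the average distance is the mean over distinct pairs.
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_d = sum(pairs) / len(pairs) if pairs else 0.0
    if mean_d == 0.0:
        mean_d = 1.0  # all items identical: every kernel value is 1 anyway
    return [[math.exp(-dist[i][j] / mean_d) for j in range(n)] for i in range(n)]


# Toy usage with two 2-bin histograms.
K = exp_kernel_matrix([[2, 0], [0, 2]])
```

For pyramid descriptors, the same `exp_kernel_matrix` helper can be reused by passing `distance=lambda x, y: pyramid_distance(x, y, alphas)` with the chosen level weights.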
We tested on PI100 using the LIBSVM package [10]. Figure 4 shows how the classification accuracy varies with the number of training images per category under the RBF kernel and the chi-square kernel. We randomly selected 5, 15, 25, 35 and 45 images per category as training samples, and then selected another 5 images per category (i.e., a total of 500 images) as test samples. Figure 4 indicates that accuracy improves as the number of training samples grows, although the gain is small once the number of training images per category reaches 35. Compared with the RBF kernel, the chi-square kernel achieved 4~15% higher accuracy. Table I lists the classification accuracy for each kernel type. The spatial pyramid kernel performed best, followed by the histogram intersection kernel and the chi-square kernel. The linear kernel and the RBF kernel performed worst.
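The sampling protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names and the file names are hypothetical, the test images are drawn so that they never overlap the training images of the same run, and the split is re-drawn for every training-set size (the paper does not say whether the test set was kept fixed).

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (image identifier, class label)


def split_per_category(images_by_class: Dict[int, List[str]],
                       n_train: int, n_test: int = 5,
                       seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Randomly pick n_train training and n_test disjoint test images per class."""
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    for label, images in sorted(images_by_class.items()):
        if len(images) < n_train + n_test:
            raise ValueError(f"class {label} has too few images")
        chosen = rng.sample(images, n_train + n_test)  # sampling without replacement
        train += [(img, label) for img in chosen[:n_train]]
        test += [(img, label) for img in chosen[n_train:]]
    return train, test


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Synthetic stand-in for PI100: 100 classes with 100 (hypothetical) file names each.
dataset = {c: [f"class{c:03d}_img{i:03d}.jpg" for i in range(100)] for c in range(100)}
for n_train in (5, 15, 25, 35, 45):
    train, test = split_per_category(dataset, n_train)
    # A classifier would be trained on `train` and scored on `test` here.
```

With 100 classes and 5 test images per class, every run yields the 500 test images mentioned in the text, while the training set grows from 500 to 4500 images.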
Figure 4 Classification accuracy versus the number of training images per category under the RBF kernel and the chi-square kernel (horizontal axis: number of images in the training samples, 5 to 45; vertical axis: classification accuracy, 0 to 100%)

TABLE I. THE CLASSIFICATION ACCURACY OF DIFFERENT KERNEL TYPES
Kernel function                    Classification accuracy
Linear kernel (C=1500)             64%
RBF kernel (C=1500, g=0.07)        62.4%
Chi-square kernel                  71.2%
Histogram intersection kernel      74.2%
Spatial pyramid kernel             78.6%

IV. CONCLUSIONS

We adopted PHOW descriptors combined with SVM to implement product-image classification. Experimental results showed the effectiveness of our algorithm. However, as an appearance descriptor, PHOW is not universal. Combining PHOW with other complementary descriptors is a promising direction for further research. In addition, more effective kernel functions need to be constructed.

ACKNOWLEDGMENT

This work was sponsored by the Project of National Innovation Fund for Technology Based Firms (No. 09c26222123243).

REFERENCES

[1] Li F F and Perona P. A Bayesian hierarchical model for learning natural scene categories[C]. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 2005, vol. 2, 524-531.
[2] Griffin G, Holub A and Perona P. The Caltech-256. Technical report, Caltech, 2007.
[3] Lazebnik S, Schmid C and Ponce J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories[C]. Proceedings of the IEEE Computer Society Conference of Computer Vision and Pattern Recognition (CVPR'06), New York, USA, 17-22 June 2006, vol. 2, 2169-2178.
[4] Fu Yan, Wang Yao-wei, Wang Wei-qiang, Gao Wen. Content-based natural image classification and retrieval using SVM[J]. Chinese Journal of Computers, 2003, 26(10).
[5] Sun Lei, Geng Guohua, Zhou Mingquan, Li Bingchun. Algorithm research of support vector machine for medical image classification[J]. Computer Applications and Software, 2004, 21(11).
[6] Hui Wen-hua. TM image classification based on support vector machine[J]. Journal of Earth Sciences and Environment, 2006, 28(2).
[7] Anna Bosch Rue. Image classification for large number of object categories[D]. Department of Electronics, Informatics and Automation, University of Girona, 2007.
[8] Steve R. Gunn. Support Vector Machines for Classification and Regression. Technical report, School of Electronics and Computer Science, Faculty of Engineering and Applied Science and Department of Electronics and Computer Science, 1998: 20~22.
[9] http://research.microsoft.com/en-us/default.aspx
[10] LIBSVM -- A Library for Support Vector Machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm/