
Face Recognition from a Single Image per Person: A Survey
Xiaoyang Tan 1,2, Songcan Chen 1,3,*, Zhi-Hua Zhou 2, Fuyan Zhang 2

1 Department of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, China
2 National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
3 State Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China

Abstract One of the main challenges faced by current face recognition techniques lies in the difficulty of collecting samples. Fewer samples per person mean less effort in collecting them and lower costs in storing and processing them. Unfortunately, many reported face recognition techniques rely heavily on the size and representativeness of the training set, and most of them suffer a serious performance drop or even fail to work if only one training sample per person is available. This situation is called the "one sample per person" problem: given a stored database of faces, the goal is to identify a person from the database later in time, under different and unpredictable poses, lighting, etc., from just one image. Such a task is very challenging for most current algorithms because a single training sample has extremely limited representativeness. Numerous techniques have been developed to attack this problem, and the purpose of this paper is to categorize and evaluate them. The prominent algorithms are described and critically analyzed. Relevant issues such as data collection, the influence of small sample size, and system evaluation are discussed, and several promising directions for future research are also proposed.

Keywords: Face recognition; Single training image per person

1. Introduction

As one of the few biometric methods that possess the merits of both high accuracy and low intrusiveness, Face Recognition Technology (FRT) has a variety of potential applications in information security, law enforcement and surveillance, smart cards, access control, and so on [1-3]. For this reason, FRT has received significantly increased attention from both the academic and industrial communities during the past twenty years. Several authors have recently surveyed and evaluated current FRTs from different aspects. For example, Samal et al. [4] and Valentin et al. [5] surveyed the feature-based and the neural-network-based techniques, respectively; Yang et al. reviewed face detection techniques [6]; Pantic and Rothkrantz [7] surveyed automatic facial expression analysis; Daugman [3] pointed out several critical issues involved in an effective face recognition system; and the most recent and comprehensive survey is possibly that of Zhao et al. [1], in which many of the latest techniques are reviewed.

* Corresponding author. Tel.: +86-25-8489-2805; fax: +86-25-8489-3777. E-mail addresses: s.chen@nuaa.edu.cn (S. Chen), zhouzh@nju.edu.cn (Z.-H. Zhou), x.tan@nuaa.edu.cn (X. Tan), fyzhang@nju.edu.cn (F.Y. Zhang).

The aim of face recognition is to identify or verify one or more persons from still images or video images of a scene using a stored database of faces. Many research efforts [1] have focused on how to improve the accuracy of recognition systems; however, most of them seem to ignore a potential problem stemming from the face database at hand, where there may be only one sample image per person, possibly due to the difficulty of collecting samples or the storage limitations of the system. Under this condition, most traditional methods, such as eigenface [9-10] and fisherface [12-15], suffer a serious performance drop or even fail to work (see more details in Section 2). This problem, called the one sample per person problem (or the one sample problem for short), is defined as follows: given a stored database of faces with only one image per person, the goal is to identify a person from the database later in time, under different and unpredictable poses, lighting, etc., from that single image. Due to its challenge and significance for real-world applications, this problem has rapidly emerged as an active research sub-area of FRT in recent years, and many ad hoc techniques have been developed to attack it, such as synthesizing virtual samples [33-35], localizing the single training image [47,48], probabilistic matching [30-32] and neural network methods [42,43].

The main contribution of this paper is a comprehensive and critical survey of those ad hoc methods that recognize a face from one image per person. We believe this work is a useful complement to [1]-[7], where most of the techniques surveyed do not consider the one sample problem. Indeed, through a more focused and detailed study of the techniques addressing this problem, we hope this survey can provide more insight into the underlying principles, interrelations, advantages, limitations, and design tradeoffs of these techniques. Relevant issues such as data collection, the influence of small sample size, and system evaluation are also discussed.

In the following section we first try to establish common ground regarding what the one sample per person problem is and why and when it should be considered. Specifically, we also discuss what the problem is not. In Section 3 we review state-of-the-art techniques addressing this problem, from which we hope some useful lessons can be learned. In Section 4, we discuss a few issues concerning performance evaluation. Finally, we conclude with a discussion of several promising directions for the one sample problem in Section 5.

2. The One Sample per Person Problem
In this section, we discuss what the problem of one sample per person really is. First, we give some background on how the one sample problem arises. Then we describe how the problem affects existing FR algorithms and the challenges it raises for algorithm design. Finally, we discuss why and when this problem should be considered.

2.1 Background

The origin of the one sample problem can be traced back to the early period when geometric-based methods were popular, in which various configural features, such as the distance between the two eyes, were manually extracted from the single face image and stored as templates for later recognition [8]. One image per person is not a problem at all for these methods. However, in some application scenarios where a large number of face images are available (e.g., in law enforcement), a more intelligent and less laborious way to process faces may be needed. This directly led to the birth of the so-called appearance-based techniques. Armed with modern tools from diverse disciplines such as computer vision, pattern recognition, machine learning and neural networks, appearance-based techniques circumvent the laborious procedure of geometrical feature extraction by using a vectorized representation of the face image, and they greatly improve the effectiveness and efficiency of face recognition systems. Consequently, these methods have become one of the dominant techniques in the field of face recognition since the 1990s.

However, a key component of appearance-based methods is their learning mechanism, whose performance is heavily affected by the number of training samples for each face [11]. Most current FR techniques assume that several (at least two) samples of the same person are always available for training. Unfortunately, in many real-world applications the number of training samples actually available is far smaller than that. More specifically, in many application scenarios, especially large-scale identification applications such as law enforcement, driver license or passport card identification, there is usually only one training sample per person in the database. In addition, we seldom have the opportunity to add more samples of the same person to the underlying database, because collecting samples is costly; and even if we could, there remain the questions of how many samples to add and in what way. These situations have been little studied in the field so far.

Therefore, it makes sense to distinguish face recognition techniques that use only one training sample per person from those that use multiple (at least two) training samples of the same person. In this paper, these two categories are named the one sample problem and the multiple samples problem, respectively. At first sight, the difference between them seems to lie only in how many training images are available for each person; in this sense, the one sample problem appears to be a special case of the multiple samples problem. Is that true? Can algorithms handling the multiple samples problem simply be used to deal with the one sample problem as well? We discuss these questions in the following section.

2.2 The Challenges of the One Sample Problem

In this section, we discuss the influence of and challenges brought by the one sample per person problem. Broadly speaking, the one sample problem is directly related to the small sample size problem in statistics and pattern recognition (see [11, 107, 108] for a general discussion of this topic). As mentioned before, the basis of appearance-based methods is their learning mechanisms, and the classic families of learning mechanisms (or classifiers) basically need a sufficiently large training set for good generalization performance [109], partly due to the high-dimensional representation of face images (recall that in the appearance-based domain, face images are vectorized directly from the gray value of each image pixel). For example [30], for a 100x100 face image vectorized into a 10,000-dimensional feature space, the number of training images for each person should theoretically be at least ten times the dimensionality [11], that is, 100,000 images per person.
Intuitively, it is hardly conceivable that we human beings would need so many photos of a person to develop a good model of his or her appearance. To address this problem, dimensionality reduction techniques can be employed. One of the most successful techniques used in face recognition is Principal Component Analysis (PCA). The method based on the PCA technique is known as eigenface in the literature [9-10]. Formally, each $n$-dimensional face image $x$ can be represented as a linearly weighted sum of a set of orthogonal bases $u_i$ ($i = 1, \ldots, n$): $x = \sum_{i=1}^{n} \alpha_i u_i \approx \sum_{i=1}^{m} \alpha_i u_i$ (typically $m \ll n$), obtained by solving the eigenproblem $CU = U\Lambda$, where $C$ is the covariance matrix of the $N$ training samples, which can be rewritten as follows [16]:
$$C = \frac{1}{N}\sum_{i=1}^{N}(X_i-\mu)(X_i-\mu)^T = \frac{1}{2N}\sum_{l(X_i)=l(X_j)}(X_i-X_j)(X_i-X_j)^T + \frac{1}{2N}\sum_{l(X_i)\neq l(X_j)}(X_i-X_j)(X_i-X_j)^T \triangleq C_I + C_E \qquad (1)$$

That is, the total scatter matrix $C$ equals the sum of the intra-person scatter matrix $C_I$ and the inter-person scatter matrix $C_E$. With only one training sample per person, $C_I = 0$, so Eq. (1) reduces to $C_E$ (see the sketch below). The eigenspace estimated using only $C_E$ is not reliable, however, because it cannot effectively separate the major identification differences from transformation errors and random noise [16].

To illustrate how the performance of eigenface is influenced by the number of training samples per person, we take the ORL dataset [17] as a test bed. The ORL dataset contains images of 40 individuals, each providing 10 different images; see Fig. 1 for the 10 sample images of one person. In the experiment, we fix the testing face but vary the number of training faces for each person. More specifically, we use the last face image of each person (Fig. 1) for testing and randomly choose n of the remaining images of each person (n ≤ 9) for training, repeating this procedure 20 times. Fig. 2 shows the average top 1 recognition rate as a function of the number of training samples per person. The performance of eigenface drops as the number of training samples per person decreases; in the extreme case of only one training sample per person, the average recognition rate falls below 65%, a 30% drop from the 95% achieved with 9 training samples per person.
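As an illustration of Eq. (1), the following sketch (a toy example on randomly generated data, not taken from any of the surveyed systems) computes $C_I$ and $C_E$ by the pairwise sums above and confirms that $C_I$ vanishes when every person contributes a single sample:

```python
import numpy as np

def scatter_decomposition(X, labels):
    """Pairwise form of Eq. (1): split the total scatter into the
    intra-person part C_I (same-label pairs) and the inter-person
    part C_E (different-label pairs)."""
    N, d = X.shape
    C_I, C_E = np.zeros((d, d)), np.zeros((d, d))
    for i in range(N):
        for j in range(N):
            diff = (X[i] - X[j])[:, None]        # column vector X_i - X_j
            pair = (diff @ diff.T) / (2 * N)
            if labels[i] == labels[j]:
                C_I += pair
            else:
                C_E += pair
    return C_I, C_E

# One sample per person: every distinct pair has different labels,
# so the intra-person scatter collapses to the zero matrix.
X = np.random.rand(5, 16)                        # 5 people, one 4x4 image each
C_I, C_E = scatter_decomposition(X, labels=np.arange(5))
print(np.allclose(C_I, 0))                       # True
```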

Fig.1 Some samples from one subject in the ORL dataset.


Fig.2. The average top 1 recognition performance as a function of the number of training samples per person.

Based on the standard eigenface technique, researchers have developed various extended algorithms during the last decades, including probabilistic eigenface [18], Linear Discriminant Analysis (LDA) based subspace algorithms [12-15], a Support Vector Machine (SVM) based method [19], the feature line method [20], evolutionary pursuit [21], and Laplacianfaces [22], among others. All of these approaches are claimed to be superior to eigenface. However, this may not be the case if only one training sample per person is available, because most of them either reduce to the basic eigenface approach or simply fail to work in that case. Detailed explanations are given below:

1) The goal of most LDA-based subspace algorithms is to find the most discriminative projection directions in eigenspace, by maximizing the ratio of inter-person variation to intra-person variation. However, LDA-based methods work well only when many representative training samples per person are given; otherwise, their performance may be even worse than eigenface [23]. When there is only one training sample per person in the database, LDA-based subspace methods fail to work because the needed intra-person variation cannot be obtained. As a remedial measure, [15] proposed replacing the intra-person scatter matrix with a constant matrix, and by doing so, the LDA-based method actually reduces to the eigenface method.

2) The probabilistic eigenface method [18] turns face recognition into a two-class problem by estimating the probability that the difference between a test image and a prototype image belongs to intra-person or inter-person variation. For the same reason given above, the intra-person distribution cannot be estimated in the situation of one sample per person, and thus the method also reduces to eigenface.

3) Both evolutionary pursuit [21] and the more recent Laplacianfaces [22] depend on a large number of training samples per person to reliably estimate the needed low-dimensional manifold. In the situation of one sample per person, both methods also reduce to their starting point, i.e., eigenface.

4) The SVM-based method [19] and the feature line method [20] are actually classification methods in eigenspace. If only one sample per person is given, neither of them works.

In summary, most state-of-the-art face recognition methods suffer greatly from the one sample per person problem, and some of them even fail to work. In other words, this problem has become a blind spot for most current face recognition techniques. However, one may ask: is investigating this problem really worthwhile?

2.3 The Significance of the One Sample Problem

We have shown that the performance of most face recognition algorithms can be seriously affected by the limited number of training samples per person. However, one might still question whether the problem deserves further investigation. In this section, we discuss it from two aspects. On one hand, as mentioned above, the extreme case of one sample per person commonly occurs in real scenarios, so the problem needs to be carefully addressed in order to make face recognition techniques applicable in those situations. On the other hand, despite being bad news for most FR techniques, storing only one sample per person in the database has several advantages that are desired by most real-world applications:

1) Samples are easy to collect, either directly or indirectly. One common component of face recognition systems is the face database, where the "template" face images are stored. Constructing such a face database is very laborious and time-consuming work, a burden that is effectively alleviated if only one image per person needs to be sampled. For the same reason, the deployment of a face recognition system becomes much easier. Furthermore, in application scenarios where direct image sampling is very difficult (if not impossible), one sample per person has a distinctive advantage. Consider the surveillance of a public place such as an airport or a train station, where a large number of people need to be identified. In this case, the needed face database can be constructed efficiently by scanning the photographs attached to certificates such as passports, identification cards, student IDs, driver licenses and so on, rather than actually taking photos of each person.

2) Storage costs are reduced, since only one image per person needs to be stored in the database.

3) Computational costs are reduced. The computational expense of large-scale applications can drop significantly, because the number of training samples per person directly affects the cost of the operations involved in face recognition, such as preprocessing, feature extraction and recognition.

In summary, the above observations reveal that the one sample problem is unavoidable in real-world scenarios and that it also brings impressive advantages. In addition, developing a clear insight into this particular problem will have broad implications, not only for face recognition but also for more general small sample problems. Therefore, recognition from one sample per person is an important problem for both practice and research: it provides new challenges and new opportunities to the face recognition community, and by addressing it, the application areas of FRT could be much extended and the underlying techniques enhanced. Meanwhile, it should be noted that the essence of the one sample problem is not how many training samples each person has, but how to improve robustness against different variations under this extremely small sample size condition. Fortunately, the problem has drawn attention from more and more researchers, and numerous techniques have been developed to deal with it. We categorize and review these techniques in the next section.

3. Recognizing From One Sample per Person

In this section, we review existing methods for robust face recognition from a single intensity image. We have broadly classified these methods into three categories according to the type of features they use; some methods clearly overlap category boundaries and are discussed at the end of this section.

1. Holistic methods. These methods identify a face using the whole face image as input. The main challenge they face is how to address the extremely small sample problem.

2. Local methods. These methods use local facial features for recognition. Care should be taken in deciding how to incorporate global configurational information into the local face model.

3. Hybrid methods. These methods use both local and holistic features to recognize a face. They have the potential to offer better performance than purely holistic or local methods, since more comprehensive information can be utilized.

Table 1 summarizes the algorithms and representative works for face recognition from a single image. Below, we first discuss the motivation and general approach of each category, and then review each method, discussing its advantages and disadvantages.


Table 1 Categorization of Methods for Face Recognition from a Single Image
(Each entry lists an approach, followed by its representative works.)

Holistic methods
  Extensions of Principal Component Analysis (PCA)
    (PC)2A: enrich the face image with its projections [24]
    2DPCA: two-dimensional PCA [27]
    Noise model: use a noise model to synthesize new faces [28]
    Discriminant eigenface: select discriminant eigenfaces for face recognition [29]
  Enlarge the size of the training set
    Construct new representations: ROCA [36], imprecise-localization method [30], E(PC)2A [25]
    Generate novel views: view synthesis using prior class-specific information [33-35]
Local methods
  Local feature-based
    DCP: directional corner point (DCP) features for recognition [37]
    Feature graph: graph matching methods [38-41]
  Local appearance-based
    Local probabilistic subspace: local probabilistic subspace method [30]
    Neural network method: SOM learning based recognition [42,43]
    Hidden Markov Model: HMM method [44]
    Subpattern-based FLDA: modified LDA method [47,48]
    Analytic-to-holistic approach: hybrid local features [49]
    Local binary patterns: face recognition with local binary patterns [45]
    Fractal features: fractal-based face recognition [46]
Hybrid methods
  Virtual samples + local features: local probabilistic subspace method [30]

3.1 Holistic Methods

In these methods, each face image is represented as a single high-dimensional vector by concatenating the gray values of all pixels in the face. The advantages of this representation are twofold. First, it implicitly preserves all the detailed texture and shape information that is useful for distinguishing faces. Second, it can capture more global aspects of faces than local feature-based descriptions [50]. On the other hand, this representation has two unfavorable consequences under the condition of one sample per person: 1) it makes the dilemma between the high dimensionality of image data and the small sample size more serious; 2) since only one vector exists for each class, the within-class variation needed by many pattern recognition techniques can no longer be estimated directly. Accordingly, these problems can be addressed in roughly two ways under the holistic representation framework. The first is to squeeze as much information as possible out of the single face image, either in the high-dimensional face space or, more commonly, in the dimensionality-reduced eigenspace, as an extension of the standard PCA technique. The second is to incorporate prior information by constructing novel views or different representations for each image, so that the actual training set is effectively enlarged.

3.1.1 Extensions of Principal Component Analysis (PCA)

As illustrated above, one generally cannot expect good generalization performance from the standard eigenface technique if only one sample per person is given. From a computational viewpoint, however, the method itself can be used with any sample size, so extending it for higher robustness becomes a natural choice.

Wu and Zhou presented a method named (PC)2A to enrich the information of the face space [24], motivated in part by the projection method from the face detection field [51]. Let $I(x,y)$ be the intensity value of an $m \times n$ image at position $(x,y)$; the horizontal and vertical projections of the image are defined as $HI(x) = \sum_{y=1}^{n} I(x,y)$ and $VI(y) = \sum_{x=1}^{m} I(x,y)$, respectively. As can be seen from Fig. 3a, these two projections reflect the distribution of the salient facial features that are useful for face recognition. The projections are then used to synthesize a new image (Fig. 3b), defined as $M_p(x,y) = HI(x)VI(y)/J$, where $J$ is the average intensity of the image. This projection image is then combined with the original face image to complete the information-enriching procedure (see Fig. 3c, and the sketch after Fig. 3). As a result, features unimportant for face recognition are faded out while the important features become more salient. After this preprocessing, the traditional eigenface technique is used for recognition. The method was tested on a subset of the FERET database [52] with one training image per person, and 3%-5% higher accuracy than the standard eigenface technique was reported while using 10%-15% fewer eigenfaces.

The (PC)2A method was later extended by Chen et al. with a method named E(PC)2A (Enhanced (PC)2A) [25]. Their idea is to generalize and further enhance (PC)2A based on n-order images, defined as $I(x,y)^n$; a second-order projected image is shown in Fig. 3d. The enhanced version is reported to be more effective and more efficient than its counterpart.

The essence of both (PC)2A and E(PC)2A is to enrich the information of the eigenspace by spatially perturbing the single training sample. Another similar method is the SVD (Singular Value Decomposition) perturbation introduced by Zhang et al. [26]. According to the SVD theorem [54, 69], an intensity face image $I$ can be written as $I = USV^T$, where $U^TU = V^TV = E$ ($U$ and $V$ are orthogonal matrices and $E$ is the identity matrix) and $S$ is a diagonal matrix consisting of the singular values of $I$. A perturbed image can then be defined as $I_\alpha = US^{\alpha}V^T$ or $I_\beta = U(S + \beta E)V^T$, where $\alpha$ and $\beta$ are perturbing factors. The derived image is then combined with the original image for later recognition, as sketched below.
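A minimal sketch of the SVD perturbation using NumPy's SVD routine; the perturbation factors shown are arbitrary illustrative values:

```python
import numpy as np

def svd_perturb(I, alpha=0.9, beta=None):
    """Derive a perturbed face image from I = U S V^T:
    I_alpha = U S^alpha V^T, or I_beta = U (S + beta*E) V^T."""
    U, s, Vt = np.linalg.svd(I.astype(float), full_matrices=False)
    if beta is not None:
        return (U * (s + beta)) @ Vt             # I_beta
    return (U * (s ** alpha)) @ Vt               # I_alpha

I = np.random.rand(64, 64) * 255                 # stand-in face image
derived = svd_perturb(I, alpha=0.9)              # combined with I for recognition
```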

Fig.3 Some sample images in the (PC)2A method. (a) Original face image and its horizontal and vertical profiles; (b) first-order projection map; (c) first-order projection-combined image; (d) second-order combined image.
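The projection-combination step of (PC)2A, sketched below; the mixing weight `alpha` and the rescaling of the projection map are implementation choices of this illustration, not specified in the text above:

```python
import numpy as np

def pc2a(I, alpha=0.25):
    """First-order (PC)^2A: build the projection map
    M_p(x, y) = HI(x) * VI(y) / J and blend it with the image."""
    I = I.astype(float)
    HI = I.sum(axis=1)                  # horizontal projection HI(x)
    VI = I.sum(axis=0)                  # vertical projection VI(y)
    J = I.mean()                        # average intensity
    Mp = np.outer(HI, VI) / J
    Mp *= I.mean() / Mp.mean()          # rescale to the image's intensity range
    return (I + alpha * Mp) / (1 + alpha)

I = np.random.rand(112, 92) * 255       # stand-in face image
enriched = pc2a(I)                      # then fed to the eigenface technique
```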

In the traditional eigenface technique, how to reliably estimate the covariance matrix under the small sample size condition remains unsolved. A potential solution known as 2DPCA (two-dimensional PCA) has recently been proposed by Yang et al. [27]. This method uses 2D image matrices directly, rather than 1D vectors, for covariance matrix estimation, and is therefore claimed to be computationally cheaper and more suitable for the small sample size problem. More specifically, let each face image $A_j$ ($j = 1, \ldots, N$, where $N$ is the number of training samples) be an $m \times n$ random matrix, and let the average image of all training samples be $\bar{A}$. The image covariance (scatter) matrix $G_t$ can then be evaluated as

$$G_t = \frac{1}{N}\sum_{j=1}^{N}(A_j - \bar{A})^T (A_j - \bar{A}).$$

By maximizing the criterion function $J(X) = X^T G_t X$, one obtains a set of optimal projection axes $\{X_1, X_2, \ldots, X_d\}$, which are then used for feature extraction (see the sketch below). The effectiveness and robustness of 2DPCA have been demonstrated on several well-known face image databases.
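A compact 2DPCA sketch: build $G_t$ and take its $d$ leading eigenvectors as the projection axes maximizing $J(X) = X^T G_t X$ (a toy example on random image matrices):

```python
import numpy as np

def twodpca_axes(A, d):
    """A: (N, m, n) stack of image matrices.  Returns the d leading
    eigenvectors of G_t = (1/N) sum_j (A_j - Abar)^T (A_j - Abar)."""
    Abar = A.mean(axis=0)
    Gt = sum((Aj - Abar).T @ (Aj - Abar) for Aj in A) / len(A)
    _, V = np.linalg.eigh(Gt)            # eigenvalues in ascending order
    return V[:, ::-1][:, :d]             # top-d axes maximize X^T G_t X

A = np.random.rand(8, 32, 28)            # 8 training images of size 32x28
X = twodpca_axes(A, d=5)
features = A[0] @ X                      # each image yields a 32x5 feature matrix
```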
The methods reviewed above handle the one sample problem in an indirect way; that is, variations of expression, illumination or pose are not explicitly addressed, so their robustness is somewhat predictable. In real-world scenarios, however, certificate photographs are often corrupted by scratches, blur or discoloration. Jung et al. [28] developed an authentication system to handle these problems with one training image per person. The basic idea is to synthesize multiple new face images that imitate the corrupted images for recognition. The imitation is done by a noise model with three noise parameters controlling the degree of contrast, brightness and Gaussian blur, respectively. The synthesizing procedure is shown in Fig. 4: by changing the values of the noise parameters, several corrupted images corresponding to the one sample are imitated, which essentially improves the representativeness of the given sample (a minimal sketch follows Fig. 4). The authors scanned 137 face images from identification cards at 300 dpi to test their method. Experimental results show an error rate of only 1.32%, indicating that the method can significantly improve the similarity between the corrupted images and the training images.

Fig.4 Synthesizing new samples with the noise model [53]
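A minimal sketch of such a noise model; the linear contrast/brightness form and the parameter values are assumptions of this illustration, since the exact parameterization of [28] is not given in the text:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def imitate_corruption(I, contrast=1.0, brightness=0.0, blur_sigma=0.0):
    """Imitate a degraded certificate photograph with three noise
    parameters: contrast, brightness and Gaussian blur."""
    J = contrast * I.astype(float) + brightness
    if blur_sigma > 0:
        J = gaussian_filter(J, sigma=blur_sigma)
    return np.clip(J, 0, 255)

# Sweeping the parameters turns the single sample into a virtual set:
I = np.random.rand(64, 64) * 255          # stand-in scanned face image
virtual = [imitate_corruption(I, c, b, s)
           for c in (0.8, 1.0, 1.2)
           for b in (-20, 0, 20)
           for s in (0.0, 1.0)]            # 18 synthesized variants
```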

The fisherface approach [12] can also be regarded as an extension of eigenface, in that it tries to find the most discriminative subspace within the eigenspace. However, this method fails when each person has only one training sample, because the within-class scatter then does not exist. Wang et al. [29] presented a method to solve this problem by incorporating prior information on within-class scatter from other people, based on the assumption that human beings exhibit similar intra-class variation. For this purpose, a generic training set with multiple samples per person is collected and used to provide the needed intra-person variation. Once the intra-person variation has been obtained, a method similar to the Fisher criterion is employed to select the most discriminative eigenvectors for recognition. The method was tested on a subset of the FERET database with 256 subjects, and a 10% performance margin over the traditional eigenface method was obtained. The authors also investigated the influence of the generic training set on recognition performance and suggested that a larger sample size is preferable. However, a generic set can provide either a useful or a harmful bias when learning a new task, so how to collect a suitable generic set for optimal performance should be carefully considered. It is worth mentioning that the idea of incorporating prior information is widely used to enlarge the actual training set when needed, as discussed in the following section.

3.1.2 Enlarge the Training Set

Another mainstream way to solve the one sample problem with holistic features is to enlarge the size of the training set. This can be done by synthesizing virtual samples. Here, virtual samples include both new representations of the single training sample and new visual samples that did not previously exist in the database. For virtual sample generation, then, one can either construct new representations or create novel visual images from the single training image. As we will see, the fundamental difference between the two lies in whether prior information is used: representation construction focuses on mining more information from the face image at hand, while visual sample generation concentrates on learning extra information from the domain beyond the given training set. Note, however, that the boundary between the two is not crisp, since a novel visual image can also be regarded as a new representation of the given face. Both types of methods are discussed in more detail below.

3.1.2.1 Construct new representations

Representation is crucial to the success of a recognition system. Different representations are robust against different types of noise and appearance changes, but it is generally difficult to choose the optimal representation that maximizes recognition performance. Instead, one can combine various representations of the same image so as to exploit their specific merits. Motivated by this, Torre et al. [36] presented a method named Representational Oriented Component Analysis (ROCA), based on one sample per person. Each gallery image is first preprocessed to mitigate the effects of changes in light direction; then several linear and non-linear filters are applied to each image to produce 150 representations in total. Next, an OCA classifier is built on each representation, and all the OCA classifiers are finally combined by weighted linear summing to give the final decision. Several numerical methods have been introduced to improve the OCA classifiers' generalization [36]. Experimental results on 760 images from the FRGC v1.0 dataset (http://www.beebiometrics.org) show that an over 20% performance improvement over the best individual classifier can be achieved.

Image perturbation is another convenient way to construct new representations (samples) for each training image.
Martinez introduced a perturbation-based approach for generating new samples [30, 31], actually a byproduct of handling the imprecise localization problem in face image preprocessing. More specifically, given a face image x, an error range is set for the horizontal and the vertical localization error, respectively. Each time, by changing the horizontal or vertical coordinate value (i.e., spatially perturbing the given image), a new sample is generated that accounts for the localization error at that position. As a result, a large number of new samples are created for each person. Subsequently, the standard eigenspace technique is used for feature extraction and recognition. Experimental results on a test set of 200 images of 50 persons show that this method is superior to the classical PCA approach. Other algorithms belonging to this class include the E(PC)2A and SVD perturbation methods described above. Theoretically, one can generate any number of imitated face images for each face by perturbing its n-order images, so face recognition with one training image per person becomes a common face recognition problem. In addition, these methods can potentially counter the curse of dimensionality, thanks to the large number of additional virtual samples. However, as pointed out by Martinez [30], one drawback that cannot be ignored is that the generated virtual images may be highly correlated, and therefore the new samples should not be considered independent training images.

3.1.2.2 Generate novel views

Ideally, the needed virtual samples should be sufficiently diverse and representative, i.e., they should occupy different locations in the face space and represent specific variations of face images. One possible way to achieve this is to exploit prior information that can be learned from prototypical examples in the domain. As a simple way to create new samples, one can apply geometric transformations to the original face images, such as rotation, scaling, and bilaterally symmetric transformation. Yang et al. [55] described an approach known as symmetrical PCA: two virtual image sets, the even and the odd symmetrical image sets, are first constructed and input to a standard PCA algorithm for feature extraction. The different energy ratios of the obtained even and odd symmetrical principal components are then employed as a criterion for feature selection, owing to their different sensitivities to pattern variations. On a large database of 1005 persons with a single sample per person, they reported 90.0% recognition accuracy using only 20 eigenfaces. Another work in this direction is reported by Gutta et al. [56], who used the mirror image of a face to handle the half-face occlusion problem.

More complicated ways of creating virtual samples with prior knowledge mainly aim at face recognition in uncontrolled conditions (e.g., outdoor environments), where pose and illumination may change significantly. We can reasonably expect that, given a training face image, faithfully synthesizing multiple new images of it under different poses and illumination conditions would greatly improve the generalization of a face recognizer. However, this is a very challenging task if only one training sample per person is given. One way to accomplish it is to exploit prior information about how face images transform, which can be learned through extensive experience with other faces. Poggio and Vetter considered a computational model known as linear classes in order to synthesize virtual views that can be used as additional samples [58].
Their idea is to learn class-specific image-plane transformations from examples of objects of the same class (which can be collected beforehand as a generic training set), and then apply them to the real image of the new object to generate virtual samples. In particular, a 3D face can be represented by a vector $X = (x_1, y_1, z_1, x_2, \ldots, y_n, z_n)^T$, where $n$ is the number of feature points and $x_i, y_i, z_i$ are the coordinates of each point. Further, assume $X \in \mathbb{R}^{3n}$ is a linear combination of $q$ 3D face images of other persons of the same dimensionality, such that $X = \sum_{i=1}^{q} \alpha_i X_i$, where the $\alpha_i$ are linear coefficients. The coefficients are learned for each face, and with the same coefficients a rotated view of $X$ can be generated by regarding it as the same linear combination of the rotated views of the other objects.

The above idea was successfully extended to 2D images by Beymer et al. [33], who presented a method called parallel deformation to generate novel views of a single face image under different poses. The transformation operator still needs to be learned from other prototypical face images. Let the difference image of an original face $X$ with respect to a reference face be $\Delta X$, and the difference images of the other prototypical faces $X_i$ with respect to the same reference face be $\Delta X_i$. Applying the linear class assumption, we have $\Delta X = \sum_{i=1}^{q} \alpha_i \Delta X_i$, and the transformation coefficients $\alpha_i$ can be obtained by minimizing the criterion function

$$J(\alpha) = \left\| \Delta X - \sum_{i=1}^{q} \alpha_i \, \Delta X_i \right\|.$$

After the transformation has been learned, one can use it to generate novel images for each face. In the face recognition system implemented by Beymer et al., 14 virtual images are generated for each real training image, as shown in Fig. 5. With both the real and virtual samples, their experiments achieved an 85% recognition rate on a test set of 620 images of 60 persons, compared to 32.1% with only the real samples. These results clearly demonstrate the effectiveness of the method; a least-squares sketch of the coefficient fitting is given below. Indeed, Niyogi et al. [35] have shown that incorporating prior knowledge is mathematically equivalent to introducing a regularizer in function learning, thus implicitly improving the generalization of the recognition system. Nevertheless, several disadvantages of this method need to be mentioned. First, the method does not distinguish transformation coefficients according to different transformations, so one cannot generate a virtual image of a given pose at will. Second, since the number of virtual images to be generated depends on the number of modes of variation to be modeled, the approach has difficulty as the number of modes grows large. Third, to enable texture rendering for a novel image, one needs to know the correspondence between the real image and the reference image. Although such correspondence can be learned using the so-called optical flow technique [59], that technique may fail under partial occlusion or too large a variation. A possible solution that reconstructs partially damaged face images from other faces was presented in [60].
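The coefficient fitting and view synthesis can be sketched as an ordinary least-squares problem; the toy data and the name `protos_rot` (prototypes observed in the target pose) are assumptions of this illustration:

```python
import numpy as np

def parallel_deformation(x, ref, protos, ref_rot, protos_rot):
    """Fit alpha by minimizing J(alpha) = ||dX - sum_i alpha_i dX_i||,
    then reuse the coefficients to synthesize a rotated view of x.
    All images are vectorized; protos are q prototype faces of other
    people available in both the original and the rotated pose."""
    D = np.stack([p - ref for p in protos], axis=1)      # columns: dX_i
    alpha, *_ = np.linalg.lstsq(D, x - ref, rcond=None)  # minimize J(alpha)
    D_rot = np.stack([p - ref_rot for p in protos_rot], axis=1)
    return ref_rot + D_rot @ alpha                       # virtual rotated view

rng = np.random.default_rng(0)                # toy data: q=10, 40x40 images
protos = rng.random((10, 1600))
ref, x = rng.random(1600), rng.random(1600)
virtual = parallel_deformation(x, ref, protos, ref + 0.01, protos + 0.01)
```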


Fig.5. A real view (center) surrounded by virtual views generated using the parallel deformation method

Besides novel images compensating for pose variation, illumination-varied images can also be synthesized for robust face recognition using such methods as the active shape model (ASM) [63] and the illumination cones method [64-66]. However, these methods generally need more than one sample of the same person for training, and are thus beyond the scope of this paper. Readers are referred to the excellent survey of Zhao et al. [1] for a detailed description of this topic. In addition, the generated virtual samples can be used to construct class-specific face subspaces, as done in [61] and [62].

3.1.3 Discussion

Although much success has been achieved by holistic methods, only one single feature vector is used to represent each face image. Such a representation is known to be sensitive to large appearance changes due to expression, illumination, pose and partial occlusion, especially when a Euclidean structure on the face space is assumed, as is usually the case. One possible way to handle this problem is to adopt a more flexible, non-metric similarity measure for image matching, such as the $\ell_p$ distance with $0 < p < 1$, as suggested by [67]. Non-metric distance measures have been found to be less affected by extreme differences than the Euclidean distance, and thus more robust to outliers [68] (see the sketch at the end of this subsection). Another way to attack this problem is to use local facial representations, based on the observation that local features are generally not as sensitive as global features to appearance changes. In the following section, we turn to local methods.
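A minimal illustration of the non-metric $\ell_p$ point above: a single extreme outlier pixel dominates the Euclidean distance far more than it does an $\ell_{0.5}$ comparison:

```python
import numpy as np

def lp_distance(a, b, p=0.5):
    """l_p 'distance' with 0 < p < 1; non-metric, since the triangle
    inequality no longer holds."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.zeros(100)
b = np.zeros(100); b[:10] = 1.0      # genuine, spread-out appearance change
c = np.zeros(100); c[0] = 10.0       # one extreme outlier pixel (occlusion)
print(np.linalg.norm(a - b), np.linalg.norm(a - c))  # 3.16 vs 10.0: outlier wins
print(lp_distance(a, b), lp_distance(a, c))          # 100.0 vs 10.0: outlier discounted
```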

3.2 Local Methods

Local methods, which use local facial features for recognition, are a relatively mature approach in the field with a long history [71, 80-82, 6, 8, 38-42]. Compared with holistic methods, local methods may be more suitable for handling the one sample problem, for the following reasons. First, in local methods the original face is represented by a set of low-dimensional local feature vectors rather than one full high-dimensional vector, so the "curse of dimensionality" is alleviated from the start. Second, local methods provide additional flexibility to recognize a face based on its parts, so common and class-specific features can be easily identified [77]. Third, different facial features can increase the diversity [76] of the classifiers, which is helpful for face identification.

Despite these advantages, incorporating global configurational information is extremely critical to the performance of a local method. Generally, there are two ways to do so. First, the global information can be embedded explicitly into the algorithm using a data structure such as a graph, in which each node represents a local feature and an edge connecting two nodes accounts for the spatial relationship between them; face recognition is then formulated as a graph matching problem. The alternative is the classifier combination technique [70]: a separate classifier is employed on each local feature to calculate a similarity score, and the similarity scores of the different features are then integrated into a global score for the final decision.

Similar to the taxonomy successfully used in [1], we classify local methods into two categories: local feature-based methods and local appearance-based methods. The former detect local features first and then extract features at the located feature points; the latter simply partition the face image into sub-regions, from which local features are directly extracted.

3.2.1 Local feature-based methods

Most of the earlier face recognition methods [71, 80-82] belong to this category. In these methods, usually only a single image per person is used to extract geometrical measures such as the width of the head, the distances between the eyes, and so on. The extracted features are then stored in the database as templates for later matching. In the early 1990s, Brunelli and Poggio described a face recognition system that automatically extracts 35 geometrical features to form a 35-dimensional vector for face representation, with similarity matching performed by a Bayes classifier [8]. A good recognition rate of 90% on a database of 47 subjects was reported. The storage cost of such systems is very low compared to appearance-based methods. However, these methods are usually criticized for two reasons: 1) geometrical features are hard to extract in some complicated cases; 2) geometrical features alone are not enough to fully represent a face, and other useful information, such as the gray-level values of the image, is discarded.

These two problems suggest two research directions. The first focuses on how to detect facial features more robustly and accurately, and has been the subject of many studies [6, 8, 72-75]. Brunelli and Poggio [8] presented a method that uses a set of templates to detect the eye position in a new image, by looking for the maximum absolute value of the normalized correlation coefficient of these templates at each point in the test image; Rowley et al. [72] trained several specific feature detectors, one for each facial part (e.g., eyes, nose, mouth and profile); Wu et al. [73] introduced a method to automatically locate the region of eyeglasses, using an offline-trained eye region detector; Lanitis et al. [74] and Jain et al. proposed constructing statistical models of face shape. Despite all these efforts, there is still a long way to go before such methods become really mature.

In the second direction, local feature representations more powerful than the purely geometrical ones are pursued. Manjunath et al. [38] proposed a method for facial feature detection and representation based on the Gabor wavelet decomposition [78] of the face. For each detected feature point, two kinds of information are stored: location information $S$ and feature information $q$. The feature information of each feature point is defined as a vector $q_i = [Q_i(x, y, \theta_1), \ldots, Q_i(x, y, \theta_N)]$, where $N$ is the number of the point's predefined nearest neighbors and $Q_i(x, y, \theta_j)$ represents the spatial and angular distance from the $i$-th feature point to its $j$-th nearest neighbor. To model the relationships among the feature points, a topological graph is constructed according to the following rule: two feature points within some spatial range with minimal distance are connected by an edge. After the topological graph has been constructed, face recognition is formulated as a graph matching problem. In particular, the total similarity between two graphs is estimated by decomposing it into two parts: one focusing on the similarity of local features, the other on global topology similarity.
The effectiveness of this method was validated on a face dataset of 86 subjects containing variations of expression and pose; 86% and 94% recognition accuracies were reported for the top one and top three candidate matches, respectively, showing good robustness. However, one drawback of this method is that once the topology graph is constructed, no further modification is allowed. In fact, face images change easily under different variations, so a fixed topology graph scheme is not adequate. Based on this observation, Lades et al. [39] proposed a deformable topology graph matching method, now known as Elastic Bunch Graph Matching (EBGM) [40]. As in [38], a topology graph is first constructed for each face, with one or several Gabor jets attached to each node. Each component of a jet is the filter response of a specific Gabor wavelet extracted at a predefined critical feature point. These locally estimated Gabor features are known to be robust against illumination change, distortion and scaling [78] (a toy sketch of such jets follows Fig. 6); this is the first key factor in the EBGM method. The second key point lies in the graph matching, whose first step is similar to that in [38], i.e., both local and global similarities are considered. The novelty lies in the second step, where a deformable matching mechanism is employed: each node of the template graph is allowed to vary its scale and position (hence the name bunch graph; see Fig. 6) according to the appearance variations of a specific face. For these two reasons, the elastic matching method exhibits high robustness against appearance changes and became one of the most successful algorithms in the 1996 FERET competition [52]. A good comparative study of eigenface, neural network-based methods and elastic matching has recently been conducted by Zhang et al. [79]. However, the method has two obvious disadvantages. First, it may require more computational effort than other methods such as eigenface, making it more difficult to deploy in practice. Second, only information at key positions of the image (e.g., eyes, nose, mouth) is used for recognition. Although this is a crucial factor contributing to the robustness of the method, it is not clear how the method can effectively handle situations where the key positions are occluded.

Fig.6. Illustration of the bunch graph from (a) an artistic point of view and (b) a scientific point of view; (c) a bunch graph matched to a face [40]
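A toy sketch of the Gabor jets attached to graph nodes: filter response magnitudes at several orientations, extracted at a single feature point. This uses a generic Gabor formulation with arbitrary parameters, not EBGM's exact kernel family, which also spans multiple frequencies:

```python
import numpy as np

def gabor_kernel(theta, freq=0.25, sigma=3.0, size=15):
    """Complex Gabor kernel at orientation theta (generic formulation)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))  # Gaussian envelope
    return envelope * np.exp(2j * np.pi * freq * xr)    # complex carrier

def gabor_jet(img, cx, cy, n_orient=8, size=15):
    """Jet = response magnitudes of n_orient oriented Gabor filters
    at feature point (cx, cy)."""
    half = size // 2
    patch = img[cy - half:cy + half + 1, cx - half:cx + half + 1]
    return np.array([abs(np.sum(patch * gabor_kernel(t)))
                     for t in np.linspace(0, np.pi, n_orient, endpoint=False)])

img = np.random.rand(64, 64)              # stand-in face image
jet = gabor_jet(img, cx=30, cy=24)        # 8-dimensional jet at one node
```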

Both of the above problems of EBGM are closely connected with its graph-based representation of the human face, and relaxing the rigid graph constraints should help to alleviate them. Regarding each face as a bag of local features extracted at different feature points, instead of a list of graph nodes, may be an acceptable compromise between robustness and efficiency. Kepenekci et al. [41] presented an implementation of this idea based on Gabor features. Instead of predefining a fixed number of feature points in the given face image as EBGM does, they used a set of Gabor filter matrices to scan local facial regions, and the feature points with higher Gabor filter responses are automatically chosen as candidates for face representation. Since the resulting feature points differ from face to face, the possibility of finding class-specific features is increased. For each feature point, besides the Gabor response values, its location is also recorded, implicitly accounting for the spatial structure of the face. This data structure makes the matching procedure very simple: the similarity scores of each pair of corresponding local features are summed, and the test image is assigned the label of the training image with the largest total similarity score. Experimental results on the ORL dataset show a 95.25% top 1 recognition accuracy with only one training image per person. In the standard FERET test [52], the performance of this method is reported to be comparable to that of EBGM, while its computational complexity is much lower. However, its very flexible way of detecting feature points carries some risk, owing to the possible nonexistence of corresponding features in a local area.

Unlike the methods mentioned above, where local feature points are isolated and an extra mechanism (e.g., a topology graph) is needed to describe neighboring relationships, Gao et al. [37] presented a method that integrates such a mechanism directly into the representation of the local feature points, yielding a novel geometrical feature descriptor named the directional corner point (DCP). More specifically, a DCP is a feature point accompanied by two parameters that provide the isolated point with additional structural information about its connectivity to its neighbors. As shown in Fig. 7, where P is the DCP at hand, the two additional attributes $\delta_1$ and $\delta_2$ model its local spatial relationship to its anterior neighboring corner point M and posterior neighboring corner point N, respectively. The DCP descriptor is expected to be both economical to store and less sensitive to illumination changes. This claim was validated by experimental results on the AR database [105], with a satisfactory top 5 recognition rate of 96.43% for both the images with the left light on and those with the right light on, under the condition that only one single image per person was used as a template.

Fig.7 An illustration of DCPs [37]

In summary, local feature-based methods have proven effective for the one sample problem. However, their performance depends critically on accurate localization of the feature points, which is not a trivial task in practice, especially in situations where the shape or appearance of a face can change substantially. For example, over-illumination causes strong specular reflection on the facial skin, suppressing or destroying the shape information of the face; large expression changes such as "screaming" affect both the upper and the lower face appearance. Such situations cause much trouble for local feature-based methods: the performance of the DCP method, for example, drops to 61.61% and 27.68% under these two conditions, respectively. One possible way to circumvent this problem is to use local appearance-based methods, where accurate localization of feature points is generally not needed.

3.2.2 Local appearance-based methods

Before reviewing the local appearance-based methods, we give a high-level block diagram of them for the reader's convenience. As shown in Fig. 8, four steps are generally involved: local region partition, feature extraction, feature selection and classification. Details of these steps are given below.

Fig.8 Overall framework of local appearance-based methods

Step 1: Local region partition

First, the local regions must be defined. This involves two factors: the shape and the size of the local regions. The simplest and most widely used region shape is the rectangular window [42, 43, 45, 47], as shown in Fig. 9a; the windows may either overlap each other [42, 44, 45] or not [42, 43, 47]. Besides rectangles, other shapes such as ellipses (Fig. 9b) [30] and strips (Fig. 9c) [83] can also be used (a partition sketch is given after Fig. 9). The size of the local areas directly influences the number of local features and the robustness of the underlying method [43].

Fig.9 Typical local region shapes used by local appearance-based methods: (a) rectangular, (b) elliptical, (c) strip.
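A minimal sketch of the rectangular-window partition of Step 1; `win` and `step` are illustrative values, with `step < win` producing overlapping windows:

```python
import numpy as np

def partition_rectangles(img, win=16, step=8):
    """Cut a face image into rectangular local regions (Step 1).
    step == win gives non-overlapping windows; step < win overlapping."""
    h, w = img.shape
    return [img[r:r + win, c:c + win]
            for r in range(0, h - win + 1, step)
            for c in range(0, w - win + 1, step)]

img = np.random.rand(64, 64)                       # stand-in face image
blocks = partition_rectangles(img)                 # 49 overlapping 16x16 blocks
```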

Step 2: Local feature extraction

Once the local regions are defined, one has to decide how to represent their information, which is critical for the performance of a recognition system. Commonly used features include gray-value features [30, 42, 43, 47, 48] and a variety of derived features, such as Gabor wavelets [38, 39, 40, 41], Haar wavelets [44], fractal features, and so on. It is hard to name a single "winner" feature descriptor suitable for all applications. In general, the gray-value feature is the simplest and loses no texture information, while Gabor and other derived features are more robust against illumination changes and some geometric transformations [38, 39].

Step 3: Feature selection

Step 3: Feature selection

If many features are generated in the previous step, an additional feature selection stage is usually needed for both effectiveness and efficiency. PCA [9] is a commonly used selection method that guarantees minimal loss of information; LDA [47, 48] can be used to select the features with the most discriminative power; and some local statistics, such as the degree of texture variation [84, 85], are also used for feature selection.

Step 4: Classification

The final step is face identification. Combining classifiers is the most common approach for this purpose: each component classifier is applied to one local feature, and the final decision is made by majority voting or by a linearly weighted sum. Note that the above four steps are not mandatory for every method; some steps, such as feature selection, may be omitted or merged with other steps in specific situations. We now give a detailed review of the local appearance-based methods.

Martinez presented a local probabilistic approach to recognize partially occluded and expression-variant faces from a single sample per class [30]. As mentioned in the previous section, a large number of virtual samples accounting for localization errors are first generated using an image-perturbation method; then each face (including the generated face images) is divided into six ellipse-shaped local areas. Next, all the local patches at the same position of each face are grouped into a separate face subspace (thus six subspaces in total). For a more compact and efficient representation, each face subspace is further transformed into an eigenspace, where the distribution is estimated by a mixture of Gaussians using the EM algorithm. In the identification stage, the test images are also divided into six local areas and projected onto the computed eigenspaces. A probabilistic rather than a voting approach is used to measure the similarity of a given match. Experiments on a set of 2,600 images show that the local probabilistic approach does not reduce accuracy even when 1/6 of the face is occluded (on average). However, the mixture of Gaussians used in this method is parametric in nature, and depends heavily on the assumption that the underlying distribution can be faithfully represented with the given samples. Moreover, although many samples are synthetically generated as described above, the computational and storage costs of generating the virtual samples may be very high (e.g., 6,615 samples per individual in [30]) when the face database is large.

Based on the above observations, Tan et al. [43] extended the local probabilistic approach by proposing an alternative way of representing the face subspace with Self-Organizing Maps (SOM, [86]). More specifically, each face image I is first partitioned into M different local sub-blocks R_1, ..., R_M; then a SOM network is trained using all the sub-blocks obtained from all the available training images, irrespective of class. After the SOM has been trained, each sub-block R_i of a face image I is mapped to its Best Matching Unit (BMU) by a nearest-neighbor strategy, whose location in the 2D SOM topological space is denoted as a location vector l_i = (x_i, y_i). All the location vectors from the same face are grouped into a set I = {l_i} = {(x_i, y_i)}, i = 1, ..., M, which is called the face's "SOM-face" representation (Fig.10b). This representation has several advantages. Firstly, since possible faults such as noise in the original face image can be eliminated in the process of SOM training, the representation is robust to noise. Secondly, it is a compact way to represent a face. Finally, and most importantly, unlike methods such as PCA, this representation is intuitively comprehensible: each element of a SOM-face has a real physical meaning through the weight vector stored in the corresponding node of the SOM map, which can be interpreted as a local facial patch in the input space. A similar, earlier SOM-based method was proposed by Lawrence et al. [42]; however, their main objective was not to provide a general representation for the one sample per person problem, but rather to improve the robustness of the recognition system using a five-layered convolutional neural network (CNN).
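A minimal sketch of the SOM-face idea is given below (pure NumPy; the grid size, block size, training schedule and the simple BMU-overlap matcher are our illustrative simplifications — [43] combines per-block decisions with a soft kNN ensemble instead):

```python
import numpy as np

def train_som(patches, grid=(10, 10), n_iter=5000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 2-D SOM on flattened sub-blocks pooled from ALL training
    faces, irrespective of class, as described above."""
    rng = np.random.default_rng(seed)
    gh, gw = grid
    weights = rng.standard_normal((gh, gw, patches.shape[1])) * 0.1
    ys, xs = np.mgrid[0:gh, 0:gw]                    # node grid coordinates
    for t in range(n_iter):
        lr = lr0 * np.exp(-t / n_iter)               # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)         # shrinking neighborhood
        p = patches[rng.integers(len(patches))]
        d = np.linalg.norm(weights - p, axis=2)      # distance to every node
        by, bx = np.unravel_index(np.argmin(d), d.shape)   # BMU of p
        h = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2 * sigma ** 2))
        weights += lr * h[:, :, None] * (p - weights)
    return weights

def som_face(patches, weights):
    """Map each sub-block of one face to the (x, y) grid location of its
    BMU; the resulting (M, 2) array is the 'SOM-face' of that image."""
    locs = []
    for p in patches:
        d = np.linalg.norm(weights - p, axis=2)
        by, bx = np.unravel_index(np.argmin(d), d.shape)
        locs.append((bx, by))
    return np.array(locs)

def som_face_similarity(a, b):
    """Crude matcher: count sub-blocks that fall on exactly the same node."""
    return int(np.sum(np.all(a == b, axis=1)))
```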

Fig.10. Example of (a) an original face image, (b) its projection (SOM-face) and (c) the reconstructed image.

As explained before, LDA-based subspace methods may fail to work under the condition of only one sample per class. Chen et al. [47] proposed a method to make LDA applicable to this extreme small-sample-size condition. By first partitioning each face image into a set of corresponding sub-images with the same dimensionality (Fig.9a), multiple training samples (all the partitioned sub-images) are produced for each class, so that FLDA becomes applicable to the set of newly produced samples. This method has been tested on a subset of the FERET dataset containing 200 persons with one training image per person, achieving 86.5% recognition accuracy. Huang et al. [48] also proposed an LDA-based method to handle the one sample problem. Their method is similar to the spatial perturbation method developed by Martinez [30], except that only some facial components, such as the mouth and eyes, are perturbed instead of the whole face. In this way, the number of training patterns per class is enlarged and LDA can thus be performed.

Pose change is one of the most important and difficult issues for the application of automatic face recognition. To deal with it under the condition of one sample per person, Kanade et al. proposed a multi-subregion based probabilistic approach [101]. Similarly to the Bayesian method proposed by Moghaddam et al. [18], the face recognition problem at hand is formulated as a two-class pattern classification problem, i.e., whether or not two faces come from the same identity. In particular, they divided each face into a set of subregions and then constructed a probabilistic model of how each local subregion of a face changes its appearance as the pose changes. Such a model is difficult to estimate reliably when adequate training information is absent; therefore, an accessorial face dataset covering many viewing angles is collected for training. After the needed probabilistic model is obtained under a Gaussian assumption, the utility of each subregion for the task of face recognition can be calculated and combined for the final decision. Experiments on the CMU PIE database [102] show that the recognition performance stays within 10% of the frontal-face performance until the probe pose differs by more than 45 degrees from the frontal face.
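Returning to the modular FLDA idea of Chen et al. [47] above, here is a minimal sketch (scikit-learn assumed available; the grid size and the synthetic data are purely illustrative) of how partitioning turns one image per person into multiple LDA training samples:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def make_subimage_samples(faces, labels, rows=4, cols=4):
    """Partition each face into rows x cols sub-images of identical size;
    every sub-image becomes one training sample of its person, so LDA
    sees multiple samples per class despite one face image per person."""
    X, y = [], []
    for img, lab in zip(faces, labels):
        h, w = img.shape
        bh, bw = h // rows, w // cols
        for i in range(rows):
            for j in range(cols):
                X.append(img[i*bh:(i+1)*bh, j*bw:(j+1)*bw].ravel())
                y.append(lab)
    return np.array(X), np.array(y)

# hypothetical data: one 32x32 face per person for 5 persons
faces = [np.random.rand(32, 32) for _ in range(5)]
labels = list(range(5))
X, y = make_subimage_samples(faces, labels)
lda = LinearDiscriminantAnalysis(n_components=4).fit(X, y)   # <= n_classes-1
X_discriminative = lda.transform(X)
```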
Lam and Yan also presented a method for pose-invariant recognition from one sample per person [49]. Their idea is to combine analytic-based and appearance-based features in a two-stage strategy (see also [103] for more discussion). In the first stage, 15 feature points are manually located on a face and used to estimate a head model that characterizes the rotation of the face. In the second stage, the correlations of the local appearance features of the eyes, nose and mouth are computed for face identification. Note that before the correlation computation, the previously estimated head model is employed to give the local components a suitable compensation for possible geometrical distortion caused by head rotation. Experiments on the ORL face dataset [17] show that, under different perspective variations, the overall recognition rates of the method are over 84% and 96% for the first and the first three most likely matched faces, respectively.

The methods mentioned above do not explicitly consider the relationship between local features, although it is conceivable that using this information would benefit a recognition system. One possible way to do so is to construct a flexible geometrical model over the local features, as is done in EBGM [38,39,40]. Motivated by this, Huang et al. [89] and Heisele et al. [88] proposed component-based detection/recognition methods with gray-scale local features. Unfortunately, their methods need a large number of training samples per person, taken from different poses and lighting directions, and are thus not suitable for the problem considered in this paper.

Another interesting way to incorporate global information is the Hidden Markov Model (HMM)-based method. Rather than treating a face image as a static topology graph with local features as nodes, HMM-based methods characterize the face pattern as a dynamic random process with a set of parameters. Samaria et al. [83] illustrated the usefulness of HMM techniques in face recognition. In their method, a face pattern is divided into five overlapping regions — the forehead, eyes, nose, mouth and chin, as shown in Fig.9c — and each region is regarded as one hidden state of an HMM. A face pattern is then treated as an observation sequence over these five states, each of which is modeled by a multivariate Gaussian distribution, while the probabilistic transitions between states are learned from the boundaries between regions. After the HMM has been trained, a face can be recognized by calculating the output probability of its observation sequence. One drawback of this method lies in its heavy demand for training samples to ensure reliable parameter estimation. Le et al. [44] presented a method to make the HMM technique applicable under the one-sample condition. Two factors contribute to the feasibility and effectiveness of their method. First, they generate a large collection of observation vectors from each image, in both the vertical and horizontal directions, thus enlarging the training set. Second, the Haar wavelet transform is applied to the image to reduce the dimensionality of the observation vectors and improve robustness. Experimental results on the frontal-view AR face database show that the proposed method outperforms the PCA, LDA, and Local Feature Analysis (LFA [91]) approaches.
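A minimal sketch of the strip-based HMM idea follows (the third-party hmmlearn package is an assumption — neither [83] nor [44] used it; the strip size, state count and raw gray-value observations are illustrative simplifications, whereas Le et al. [44] compress the observations with Haar wavelets):

```python
import numpy as np
from hmmlearn import hmm   # assumed available: pip install hmmlearn

def strip_observations(image, strip_h=8, step=2):
    """Slice a face top-to-bottom into overlapping horizontal strips; the
    strip sequence plays the role of the HMM observation sequence
    (forehead -> eyes -> nose -> mouth -> chin)."""
    h, _ = image.shape
    return np.array([image[r:r + strip_h].ravel()
                     for r in range(0, h - strip_h + 1, step)])

def train_person_model(image, n_states=5):
    """One HMM per person. With a single image, the strip sequence is the
    only training data -- exactly why [44] enlarge it with observation
    vectors taken in additional directions."""
    obs = strip_observations(image)
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
    model.fit(obs)
    return model

def identify(probe_image, models):
    """Label = index of the model giving the highest log-likelihood."""
    obs = strip_observations(probe_image)
    return int(np.argmax([m.score(obs) for m in models]))
```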
Ahonen et al. [45] described a local appearance-based approach with a single, spatially enhanced feature histogram for global information representation. In their method, three different levels of locality are designed, i.e., the pixel level, the regional level and the holistic level. The first two levels are realized by dividing the face image into small regions (Fig.9a), from which Local Binary Pattern (LBP) features [97] are extracted for efficient texture representation. The holistic level of locality, i.e., the global description of the face, is obtained by concatenating the extracted regional LBP features. Recognition is performed with a nearest-neighbor classifier in the computed feature space, using the chi-square statistic as the dissimilarity measure. Their experiments on the FERET dataset show robust performance using five samples per person for training; due to its local representation and simple non-parametric classifier, this method can naturally support the one sample problem as well.

Besides LBP, other features widely used in the computer vision field can also be applied to face recognition, such as fractal features. For example, Komleh et al. [46] presented a method based on fractal features for expression-invariant face recognition. Their method was tested on the MIT face database with 100 subjects, using one image per subject for training and 10 images per subject with different expressions for testing. The experimental results show that fractal features are robust against expression variation.

Relevant studies in psychophysics and neuroscience have revealed that different facial features have different degrees of significance for face recognition [1]. For example, it has been found that hair, face outline, eyes and mouth are more important for perceiving and remembering faces, while the nose plays a relatively unimportant role [92]. Inspired by this finding, Brunelli and Poggio used four masks to extract the regions of the eyes, nose, mouth and the whole face for recognition [8]. Subsequently, Pentland et al. [93] extended Brunelli and Poggio's work by projecting each local feature onto its corresponding eigenspace and using the obtained eigenfeatures for recognition. The experimental results of both studies indicate that these facial features are indeed important for face identification, consistent with the findings of Shepherd et al. [92] in the field of neuroscience. However, defining the same regions in advance for all classes seems inconsistent with the intuition that each class should have its own class-specific features that are truly meaningful for recognition. In this sense, automatic feature selection, as a specific case of feature weighting [95], may be more appropriate. Automatic feature selection, however, is generally very difficult for high-dimensional, unstructured face images. Tan et al. [96] proposed a possible solution to this problem: transform the high-dimensional face image data into a lower-dimensional space first, and then select features in the latter space, where the task is much easier due to the simplification of the feature space. The method is a direct extension of their SOM-face algorithm [43]. More specifically, after each face image is represented as a set of nodes in the low-dimensional SOM topological space, simple statistics based on the class distribution of the face image data (e.g., the number of faces absorbed by each node) are computed and used to identify important local features of the face images.
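As a concrete illustration of the regional-LBP pipeline of Ahonen et al. [45] reviewed above, here is a minimal sketch (scikit-image assumed available; the grid size and LBP parameters follow common practice rather than the paper's exact settings):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_face_descriptor(image, grid=(7, 7), P=8, R=1):
    """Pixel level: uniform LBP codes. Regional level: one histogram per
    grid region. Holistic level: concatenation of the regional histograms
    into a single spatially enhanced feature histogram."""
    lbp = local_binary_pattern(image, P, R, method="uniform")
    n_bins = P + 2                        # uniform patterns + 'other' bin
    h, w = image.shape
    gh, gw = h // grid[0], w // grid[1]
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            region = lbp[i*gh:(i+1)*gh, j*gw:(j+1)*gw]
            hist, _ = np.histogram(region, bins=n_bins,
                                   range=(0, n_bins), density=True)
            hists.append(hist)
    return np.concatenate(hists)

def chi_square(a, b, eps=1e-10):
    """Chi-square dissimilarity between two histograms."""
    return np.sum((a - b) ** 2 / (a + b + eps))

def identify(probe_desc, gallery_descs):
    """Nearest-neighbor identification in the LBP feature space."""
    return int(np.argmin([chi_square(probe_desc, g) for g in gallery_descs]))
```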
Tan et al.'s experimental results on the AR database reveal that up to 80% of the sub-blocks of a face can be safely removed from the probe set without loss of classification accuracy. This can be particularly useful when a compact representation of the face is needed, as in smart-card applications, where storage capacity is very limited. Geng and Zhou [94] also proposed a method, named SEME, to automatically select appropriate regions for face recognition, based on the idea from the ensemble learning field that facial regions which are both accurate and diverse should be chosen as candidates.
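A minimal sketch of selecting informative SOM nodes by a class-distribution statistic, in the spirit of Tan et al. [96] (the statistic used here — how many gallery faces share a node — is our simplification of the paper's statistics):

```python
from collections import Counter
import numpy as np

def select_informative_nodes(gallery_som_faces, keep_ratio=0.2):
    """Rank SOM nodes by how many distinct gallery faces are absorbed by
    each node; nodes shared by many faces carry little identity
    information, so only the rarest (most selective) nodes are kept.
    gallery_som_faces: one (M, 2) array of node coordinates per face."""
    counts = Counter()
    for locs in gallery_som_faces:
        for xy in set(map(tuple, locs.tolist())):
            counts[xy] += 1
    ranked = sorted(counts, key=counts.get)            # rarest first
    n_keep = max(1, int(keep_ratio * len(ranked)))
    return set(ranked[:n_keep])

def prune_som_face(som_face, kept_nodes):
    """Drop sub-blocks whose BMU was not selected, shrinking the probe
    representation in the spirit of the reduction reported in [96]."""
    mask = np.array([tuple(xy) in kept_nodes for xy in som_face.tolist()])
    return som_face[mask]
```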
and diverse should be chosen as candidates. Martinez described a local area weighting based method for expression-invariant face recognition [3032]. His work was motivated in part by the fact that different face expressions influence different parts of the face more than others. This information can be incorporated into the classifier as prior knowledge. To do that, the author built a learning mechanism to search for those areas that are less affected by a given emotion within a group of people, and an accessorial dataset were employed for this purpose. Once the soneeded weights have been obtained, they can be used for recognition within the same group of individuals. Their experiments on the AR database with 50 persons show a significant increase on the recognition rate than the un-weighted version, even when the expression is largely changed on the probe image. However, one drawback of this method is that the facial expression of the testing data must be given, which suggests that additional expression detection may be performed before the identification could be really carried out. 3.2.3 Discussion We have reviewed and classified local methods dealing with the one sample problem into two major categories, i.e., feature-based and appearance-based method. However, in some specific situation, these two categories are not so distinct, due to the fact that local regions are actually consisted of a series of pixels, among which interesting feature points could be detected. Furthermore, other categorical standards may be applicable, such as the types of local features, the recognition method, or the way to model global information. Nevertheless, we believe that since the manner to extract local feature is the start point of any local method and has global influence on the followed processes, using it as the categorical guideline is both sufficient and appropriate. Before ending this section, it should be noted that although local methods are proven to be an effective way to handle the one sample problem, some common problems are still unsolved in these method. For example, it is not clear which kind of local features and in which way to incorporate global information is more appropriate for a given application scenario. In addition, most methods mentioned above are only robust against some variations while not against others. For instance, EGBM is known robust against changes of expression, illumination and pose, but not against occlusion; Local probabilistic method and SOM-face are robust against large expression variation and occlusion, but not against pose changes. A possible way to further improve the robustness performance of a recognition system, as pointed out by Zhao et al. [1], may lie in the combination of different methods, called hybrid methods here. 3.3 Hybrid methods

Hybrid methods are approaches that use both holistic and local features. The key factors influencing the performance of a hybrid method are how to determine which features should be combined and how to combine them, so as to preserve their advantages while averting their disadvantages. These problems are closely related to multiple classifier systems (MCS [70]) and ensemble learning [100] in the field of machine learning, where, unfortunately, they also remain open. In spite of this, the numerous efforts made in those fields do provide useful insights, and the lessons learned can serve as guidelines for designing a hybrid face recognition system. For example, the components of a hybrid system, whether features or classifiers, should be both accurate and diverse, so that their advantages can complement one another. In fact, local features and global features have quite different properties and can hopefully offer complementary information about the classification task. Table 2 qualitatively summarizes the differences between the two types of features. As the table shows, local and holistic features are sensitive to different variation factors; for instance, according to the table, illumination changes have more influence on holistic features, while large expression variations have more impact on local features. Given these observations, hybrid methods that use both holistic and local information for recognition may be an effective way to reduce the complexity of classifiers and improve their generalization capability.

Table 2 Comparison of the local and holistic features' sensitivity to variations

Variation factors                  Local features    Holistic features
Illumination [98]                  not sensitive     sensitive
Expression [30,43]
  small variations                 sensitive         very sensitive
  large variations                 very sensitive    sensitive
Pose [88]
  small variations                 not sensitive     sensitive
  large variations                 sensitive         very sensitive
Noise [99]                         very sensitive    sensitive
Occlusion [30,43]                  not sensitive     very sensitive

Despite the potential advantages, work in this category is still relatively scarce, possibly due to the difficulties mentioned above; moreover, typical hybrid methods in the traditional (multiple samples per person) sense, such as flexible appearance models [104] and hybrid LFA [91], are generally not suitable for the one sample problem. We hope that more research effort will be devoted to this direction, in which case we believe the potential power of hybrid methods will emerge sooner or later. Before ending the discussion of this topic, we note that, from some points of view, local methods can be regarded as a specific kind of hybrid method, since global information is usually incorporated in some way into the algorithm. In the local probabilistic method [30], for example, novel training samples are first generated for each person with a holistic method, and then a local method is utilized for recognition.
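A minimal sketch of the simplest hybrid scheme — fusing a holistic and a local distance by a weighted sum (the normalization and the weight alpha are generic choices of ours, not prescribed by any surveyed method):

```python
import numpy as np

def hybrid_identify(probe, gallery, holistic_dist, local_dist, alpha=0.5):
    """Weighted-sum fusion of a holistic distance (e.g., in an eigenface
    space) and a local distance (e.g., regional LBP with chi-square).
    alpha trades the two off and would normally be tuned on held-out data."""
    d_h = np.array([holistic_dist(probe, g) for g in gallery], float)
    d_l = np.array([local_dist(probe, g) for g in gallery], float)
    # bring both distance lists to a comparable [0, 1] scale before mixing
    d_h = (d_h - d_h.min()) / (np.ptp(d_h) + 1e-10)
    d_l = (d_l - d_l.min()) / (np.ptp(d_l) + 1e-10)
    return int(np.argmin(alpha * d_h + (1.0 - alpha) * d_l))
```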

4. Performance Evaluation

To quantitatively assess and fairly compare methods that address the one sample per person problem, algorithms should be tested on the same benchmark dataset according to a standard testing procedure. Unfortunately, this requirement is seldom satisfied in practice: although numerous algorithms have been developed, most of them have been tested on different datasets in different manners. In this section, we review the recognition performance reported for these algorithms and discuss a few issues that need careful consideration when evaluating algorithms under the condition of only one sample per person.

Table 3 summarizes the reported performance of several reviewed methods dealing with the one sample problem. Following the suggestion in [53], several statistics directly related to the experimental results are selected to describe the performance of each algorithm: the name of the dataset used, the primary variations contained in the probe set, the total numbers of testing images and persons, whether an accessorial dataset is used to help training, and the top-one rank score. To convey the difficulty of each setting, the performance of the standard eigenface algorithm is also given as a benchmark. Information about the training set, other than the accessorial dataset, is not listed, since for most of the algorithms considered here the number of persons in the gallery set equals that in the probe set and the number of training images per person is one. It is also worth mentioning that, from the perspective of performance evaluation, the statistics listed here may not fully characterize the behavior of these methods; it is therefore difficult to declare a "winner" algorithm from this table.

Table 3 Summary of reported experimental results of various algorithms
(The benchmark score is the top-one match rate of the standard eigenface method under the same setting.)

Method                               Face     No. of   No. of   Accessorial    Top match   Benchmark   Primary variations in
                                     dataset  persons  images   dataset used?  score       score       the testing dataset
parallel deformation [33]            N/A      62       620      Y              85.0%       32.1%       Pose
local probabilistic subspace [30]    AR       100      600      Y              82.3%       70.2%       Expression, Time
SOM-face [43]                        AR       100      400      N              71.0%       33.0%       Occlusion
SOM-face [43]                        AR       100      600      N              93.7%       70.2%       Expression, Time
2DPCA [27]                           AR       100      400      N              76.0%       33.0%       Occlusion
2DPCA [27]                           AR       100      600      N              74.8%       70.2%       Expression, Time
1D-DHMM [44]                         AR       120      1440     N              89.8%       67.2%       Expression, Illumination, Time
(PC)2A [24]                          FERET    200      200      N              83.5%       83.0%       N/A
E(PC)2A [25]                         FERET    200      200      N              85.5%       83.0%       N/A
SVD perturbation [26]                FERET    200      200      N              85.0%       83.0%       N/A
Modular FLDA [47]                    FERET    200      200      N              86.5%       83.0%       N/A
Component LDA [48]                   FERET    70       350      N              78.6%       32.0%       Expression, Illumination
EBGM [40]                            FERET    1196     1196     N              95.0%       79.7%       Expression
LBP [45]                             FERET    1196     1196     N              97.0%       79.7%       Expression
Discriminant PCA [29]                FERET    256      914      Y              72.0%       74.0%       Expression, Time
Analytic-to-holistic approach [49]   ORL      40       160      N              84.0%       74.7%       Pose
Face-specific subspace [61]          Yale     15       150      N              95.3%       74.7%       Expression, Illumination

Firstly, following FERET, face recognition methods can be classified into two basic types: gallery insensitive and gallery sensitive. The gallery set consists of the known individuals, while the probe set contains the input images to be labeled; note that the gallery set is not necessarily the training set. If an algorithm's training procedure is completed prior to the start of the test, the algorithm is called gallery insensitive; otherwise it is gallery sensitive. The FERET design principles force each algorithm to be gallery insensitive, to ensure that each algorithm has a general representation for faces rather than one tuned to a specific gallery. This requirement is desirable for commercial systems, since users usually do not care how the system at hand was trained. However, if one cannot adjust system parameters according to the gallery set, some problems may arise. From the perspective of pattern recognition, it is always assumed that a close relationship exists between the training set and the probe set; intuitively, for example, a recognition system trained on a dataset of American people may not be directly usable for identifying Chinese faces. In other words, gallery insensitivity is only really achievable when a large amount of representative training samples has been pre-collected so that the gallery is fully covered. This seems to shift the goal of algorithm evaluation toward a pursuit of high-quality training sets, giving algorithms with good training sets an obvious advantage over those without. We therefore urge members of the community to give a detailed description of the training set (including any accessorial set) if they want their algorithms to be evaluated fairly as learning mechanisms rather than as commercial systems. In addition, the tuning parameters involved should not be ignored either, since they may have a direct effect on classification performance.

Secondly, most current algorithms do not report how their performance is influenced by the number of training images per person. This is particularly important for the one sample problem, where the efficiency of learning from a single face image is critical. Indeed, we should not ignore the following question: how many training images, at the minimum, are required to achieve a given performance on a particular task? Fortunately, this question has recently begun to draw attention from researchers [114].

Thirdly, for some applications, especially real-time ones, the time taken is a very important factor. In [115], Shan built a real-time automatic person identification system using a gallery of still images containing only one frontal-view image per person.

Finally, the evaluation criteria should also take the specific application scenario into account, since different applications impose different constraints and processing requirements. For example, when recognizing national ID cards, credit cards, passports or driver's licenses, the shooting conditions are usually controllable and the quality of the photographs is relatively good; in video surveillance, by contrast, the obtained images may be both small and blurred, and the background can be very complicated as well. The probe set should therefore be statistically as similar as possible to the images encountered in the real-world application.
In summary, for a fair and effective performance evaluation of methods for the one sample per person problem, great attention should be paid to the protocols and to the training and testing data sets. If an existing evaluation protocol such as FERET is applied, special care should also be taken as to whether it suits the one sample problem. Needless to say, some suggestions in FERET are appropriate for the problem considered here. For example, in its closed-universe model, not only "is the top match correct?" but also "is the correct answer in the top n matches?" is considered, which indeed agrees with the typical application scenarios of one sample per person.
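The closed-universe "top n" criterion can be computed as a cumulative match score; a minimal sketch:

```python
import numpy as np

def cumulative_match_scores(dist_matrix, true_ids, max_rank=5):
    """Closed-universe evaluation: fraction of probes whose correct
    gallery identity appears within the top-n matches, for n = 1..max_rank.
    dist_matrix[i, j] = distance from probe i to gallery identity j."""
    order = np.argsort(dist_matrix, axis=1)      # gallery ids, best first
    scores = []
    for n in range(1, max_rank + 1):
        hits = [true_ids[i] in order[i, :n] for i in range(len(true_ids))]
        scores.append(float(np.mean(hits)))
    return scores                                # scores[0] = top-one rank score
```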

5. Discussion and Conclusion

Face recognition from one sample per person is an important yet challenging problem, both in theory and for real-world applications. In this paper, we have attempted to provide a comprehensive survey of current research on this problem. Firstly, we discussed what the one sample problem is, how it affects current face recognition algorithms, and its important implications for both face recognition and artificial intelligence research. Secondly, we described the current state of the art in dealing with this problem using a three-category taxonomy, i.e., holistic, local and hybrid methods; this, we believe, helps to develop a clear understanding of this particular problem. Thirdly, we reviewed the relative recognition performance of current algorithms and discussed several issues that need careful consideration when deciding how to evaluate a proposed method. In particular, we urge the community to report the effect of training-set size on the performance of the underlying algorithm. Finally, it is worth mentioning that some closely related problems, such as face detection and normalization of the images, are deliberately left out of this paper; for these topics, we refer the reader to [6] and [1] for detailed discussions.

Obviously, the one sample per person problem and the multiple samples per person problem are two aspects of the same problem. Considering the two separately is motivated in part by the observation that the major part of current research still focuses on the latter, while the former has not yet received the attention it deserves from the face recognition community. We have shown in this paper that face recognition algorithms developed under the assumption of multiple training samples per person may not be directly applicable to the one training sample per person problem. The relationship between the two problems goes beyond the number of training samples per person; it also involves application scenarios, algorithm design methodology and performance evaluation. Table 4 briefly summarizes the two problems, showing both their commonness and their diversity: the diversity indicates the necessity of studying the one sample problem, while the commonness suggests the potential value of multiple-samples techniques in dealing with the one sample problem, and vice versa. We believe that bridging these two aspects of the face recognition problem could help us design more efficient algorithms. For example, one can generate additional samples from a single sample per person as in [33], and use them to facilitate learning the manifold structure of face images in a low-dimensional space, where the demand for training samples is usually high.

Table 4 Comparison of recognition from one sample per person and from multiple samples per person

                     One sample per person problem               Multiple samples per person problem
Major applications   Smart cards (national ID, driver's          Information security, entertainment,
                     licenses, passports, credit cards, etc.),   human-machine interaction, etc.
                     law enforcement, surveillance
Advantages           Lower costs of collecting, storing,         More training samples available, higher
                     and processing samples                      robustness, plenty of statistical tools
                                                                 for use
Disadvantages        Small size of training set, lower           Higher collecting, training and
                     robustness, fewer available methods         storing costs
                     and tools

Although the one sample problem is by no means solved, some promising research directions can be suggested. Firstly, its solution may benefit from other related fields, especially biometrics, where other biometric features, such as iris, fingerprint and speech, can be used along with the single training sample to construct a more robust recognition system [106]. Secondly, one can seek methods capable of providing more information to the recognizer, so as to compensate for the limited representativeness of a single image. Among these, we believe at least two are likely to influence the future development of the one sample problem: 3D recognition techniques and techniques that exploit information from other faces. 3D-enhanced approaches can help overcome sensitivity to geometric and lighting changes of faces; in [113], a 3D Spherical Harmonic Basis Morphable Model (SHBMM) is proposed with which both face synthesis and recognition are feasible even when only a single image under unknown lighting is available. The idea of exploiting prior knowledge, also referred to as learning to learn [87] in the machine learning field, has been widely used and studied in the last decade; recent results [112] show interesting cases in handwritten digit recognition where, by exploiting prior knowledge, a recognition system trained on only a single example per class can outperform one using thousands of training examples per character. Thirdly, one of the major contributions of Turk and Pentland [10] is that they brought face recognition into the field of pattern recognition in an efficient and convenient manner, so that a large number of mature pattern recognition methods, including Bayesian methods, PCA, LDA and SVM, could be immediately applied to face recognition; however, if only a single sample per person is presented to the system, most pattern recognition methods are helpless. We therefore think that the final solution of this problem depends heavily on advances in related areas, such as pattern recognition, machine learning and machine vision. Finally, although some results from statistical learning theory do exist [111], more study is needed on how the generalization capacity of an algorithm is influenced by the training set size, especially when that size is extremely small. It would also be meaningful to investigate theoretically how much information is gained when additional training samples can be presented to the system. Nevertheless, these difficulties do not mean that the problem cannot be addressed within the current technical framework; as Table 3 shows, the considerable efforts already made in this field are very encouraging.

Indeed, recent studies [57] have shown that a single training image, especially a frontal view, contains plentiful information that can be used for recognition. Moses et al. [110] also experimentally demonstrated that humans can generalize the recognition of faces to novel images from just a single view.

Acknowledgements

We thank Jianbin Tan from LASER, Dept. of Computer Science, UMass, for reading a draft of this paper and providing many useful suggestions. This work was supported by the National Natural Science Foundation of China under Grant No. 60271017, the National Science Fund for Distinguished Young Scholars of China under Grant No. 60325207, and the Natural Science Foundation of Jiangsu Province under Grant No. BK2005122.

References

[1] Zhao W., Chellappa R., Phillips P.J. and Rosenfeld A., Face recognition: a literature survey, ACM Computing Surveys 35(4) (2003) 399-458.
[2] Chellappa R., Wilson C.L., Sirohey S., Human and machine recognition of faces: a survey, Proceedings of the IEEE 83(5) (1995) 705-740.
[3] Daugman J., Face and gesture recognition: overview, IEEE Trans. Pattern Analysis and Machine Intelligence 19(7) (1997) 675-676.
[4] Samal A. and Iyengar P.A., Automatic recognition and analysis of human faces and facial expressions: a survey, Pattern Recognition 25(1) (1992) 65-77.
[5] Valentin D., Abdi H., et al., Connectionist models of face processing: a survey, Pattern Recognition 27(9) (1994) 1209-1230.
[6] Yang M.H., Kriegman D., and Ahuja N., Detecting faces in images: a survey, IEEE Trans. Pattern Analysis and Machine Intelligence 24(1) (2002) 34-58.
[7] Pantic M. and Rothkrantz L.J.M., Automatic analysis of facial expressions: the state of the art, IEEE Trans. Pattern Analysis and Machine Intelligence 22(12) (2000) 1424-1445.
[8] Brunelli R. and Poggio T., Face recognition: features versus templates, IEEE Trans. Pattern Analysis and Machine Intelligence 15(10) (1993) 1042-1062.
[9] Sirovich L. and Kirby M., Low-dimensional procedure for the characterization of human faces, Journal of the Optical Society of America A 4(3) (1987) 519-524.
[10] Turk M., Pentland A., Eigenfaces for recognition, Journal of Cognitive Neuroscience 3(1) (1991) 71-86.
[11] Jain A.K. and Chandrasekaran B., Dimensionality and sample size considerations in pattern recognition practice, in: Handbook of Statistics, Vol. 2, P.R. Krishnaiah and L.N. Kanal (Eds.), (1982) 835-855.
[12] Belhumeur P., Hespanha J. and Kriegman D., Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Analysis and Machine Intelligence 19(7) (1997) 711-720.
[13] Swets D.L., Weng J., Using discriminant eigenfeatures for image retrieval, IEEE Trans. Pattern Analysis and Machine Intelligence 18(8) (1996) 831-836.
[14] Lu J., Plataniotis K.N. and Venetsanopoulos A.N., Face recognition using kernel direct discriminant analysis algorithms, IEEE Trans. on Neural Networks 14(1) (2003) 117-126.
[15] Zhao W., Chellappa R., Phillips P.J., Subspace linear discriminant analysis for face recognition, Technical Report CAR-TR-914, Center for Automation Research, University of Maryland, 1999.
[16] Wang X., Tang X., Unified subspace analysis for face recognition, Proc. 9th IEEE Internat. Conf. on Computer Vision, (2003) 679-686.
[17] Samaria F., Harter A., Parameterisation of a stochastic model for human face identification, Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, FL, (December 1994) 138-142.
[18] Moghaddam B., Pentland A., Probabilistic visual learning for object representation, IEEE Trans. Pattern Analysis and Machine Intelligence 19(7) (1997) 696-710.
[19] Phillips P.J., Support vector machines applied to face recognition, Advances in Neural Information Processing Systems 11 (1998) 803-809.
[20] Li S.Z. and Lu J., Face recognition using the nearest feature line method, IEEE Trans. Neural Networks 10(2) (1999) 439-443.
[21] Liu C., Wechsler H., Evolutionary pursuit and its application to face recognition, IEEE Trans. Pattern Analysis and Machine Intelligence 22(6) (2000) 570-582.
[22] He X., Yan S., Hu Y., et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Analysis and Machine Intelligence 27(3) (2005) 328-340.
[23] Martinez A. and Kak A.C., PCA versus LDA, IEEE Trans. Pattern Analysis and Machine Intelligence 23(2) (2001) 228-233.
[24] Wu J. and Zhou Z.-H., Face recognition with one training image per person, Pattern Recognition Letters 23(14) (2002) 1711-1719.
[25] Chen S.C., Zhang D.Q., and Zhou Z.-H., Enhanced (PC)2A for face recognition with one training image per person, Pattern Recognition Letters 25(10) (2004) 1173-1181.
[26] Zhang D.Q., Chen S.C., and Zhou Z.-H., A new face recognition method based on SVD perturbation for single example image per person, Applied Mathematics and Computation 163(2) (2005) 895-907.
[27] Yang J., Zhang D., Frangi A.F. and Yang J., Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Analysis and Machine Intelligence 26(1) (2004) 131-137.
[28] Jung H.C., Hwang B.W., and Lee S.W., Authenticating corrupted face image based on noise model, Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, (2004) 272.
[29] Wang J., Plataniotis K.N. and Venetsanopoulos A.N., Selecting discriminant eigenfaces for face recognition, Pattern Recognition Letters 26(10) (2005) 1470-1482.
[30] Martinez A.M., Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class, IEEE Trans. Pattern Analysis and Machine Intelligence 24(6) (2002) 748-763.
[31] Martinez A.M., Recognizing expression variant faces from a single sample image per class, Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR), (2003) 353-358.
[32] Martinez A.M., Matching expression variant faces, Vision Research 43(9) (2003) 1047-1060.
[33] Beymer D. and Poggio T., Face recognition from one example view, Science 272(5250) (1996).
[34] Vetter T., Synthesis of novel views from a single face image, International Journal of Computer Vision 28(2) (1998) 102-116.
[35] Niyogi P., Girosi F., Poggio T., Incorporating prior information in machine learning by creating virtual examples, Proceedings of the IEEE 86(11) (1998) 2196-2209.
[36] Frade F. De la Torre, Gross R., Baker S., and Kumar V., Representational oriented component analysis (ROCA) for face recognition with one sample image per training class, Proceedings, IEEE Conference on Computer Vision and Pattern Recognition 2 (June 2005) 266-273.
[37] Gao Y. and Qi Y., Robust visual similarity retrieval in single model face databases, Pattern Recognition 38(7) (2005) 1009-1020.
[38] Manjunath B.S., Chellappa R., and von der Malsburg C., A feature based approach to face recognition, Proceedings, IEEE Conference on Computer Vision and Pattern Recognition 1 (1992) 373-378.
[39] Lades M., Vorbruggen J., Buhmann J., Lange J., von der Malsburg C., and Wurtz R., Distortion invariant object recognition in the dynamic link architecture, IEEE Trans. Computers 42(3) (1993) 300-311.
[40] Wiskott L., Fellous J.-M., Kruger N. and von der Malsburg C., Face recognition by elastic bunch graph matching, IEEE Trans. on Pattern Analysis and Machine Intelligence 19(7) (July 1997) 775-779.
[41] Kepenekci B., Tek F.B., Bozdagi Akar G., Occluded face recognition based on Gabor wavelets, ICIP 2002, Sept 2002, Rochester, NY, MP-P3.10.
[42] Lawrence S., Giles C.L., Tsoi A. and Back A., Face recognition: a convolutional neural-network approach, IEEE Trans. on Neural Networks 8(1) (1997) 98-113.
[43] Tan X., Chen S.C., Zhou Z.-H., and Zhang F., Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft kNN ensemble, IEEE Transactions on Neural Networks 16(4) (2005) 875-886.
[44] Le H.-S., Li H., Recognizing frontal face images using Hidden Markov models with one training image per person, Proceedings of the 17th International Conference on Pattern Recognition (ICPR04) 1 (2004) 318-321.
[45] Ahonen T., Hadid A. and Pietikäinen M., Face recognition with local binary patterns, Computer Vision, ECCV 2004 Proceedings, Lecture Notes in Computer Science 3021, Springer, (2004) 469-481.
[46] Komleh H.E., Chandran V. and Sridharan S., Robustness to expression variations in fractal-based face recognition, Proc. of ISSPA-01, Kuala Lumpur, Malaysia, 13-16 August, 1 (2001) 359-362.
[47] Chen S.C., Liu J., and Zhou Z.-H., Making FLDA applicable to face recognition with one sample per person, Pattern Recognition 37(7) (2004) 1553-1555.
[48] Huang J., Yuen P.C., Chen W.S., Lai J.H., Component-based LDA method for face recognition with one training sample, AMFG (2003) 120-126.
[49] Lam K.-M., Yan H., An analytic-to-holistic approach for face recognition based on a single frontal view, IEEE Trans. Pattern Analysis and Machine Intelligence 20(7) (1998) 673-686.
[50] O'Toole A.J., Abdi H., Low-dimensional representation of faces in higher dimensions of the face space, Journal of the Optical Society of America A 10(3) (1993) 405-411.
[51] Kotropoulos C. and Pitas I., Rule-based face detection in frontal views, Proc. Int'l Conf. Acoustics, Speech and Signal Processing 4 (1997) 2537-2540.
[52] Phillips P.J., Moon H., Rizvi S., and Rauss P., The FERET evaluation methodology for face recognition algorithms, IEEE Trans. Pattern Analysis and Machine Intelligence 22(10) (2000) 1090-1103.
[53] Phillips P.J. and Newton E.M., Meta-analysis of face recognition algorithms, Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Washington, D.C., (May 2002) 235-241.
[54] Golub G.H., Van Loan C.F., Matrix Computations, Johns Hopkins University Press, Baltimore, Maryland, (1983).
[55] Yang Q. and Ding X., Symmetrical PCA in face recognition, ICIP 2002, 2 (2002) 97-100.
[56] Gutta S., Wechsler H., Face recognition using asymmetric faces, in: Zhang D., Jain A.K. (Eds.), ICBA 2004, LNCS 3072 (2004) 162-168.
[57] Gross R., Shi J., and Cohn J., Quo vadis face recognition? Third Workshop on Empirical Evaluation Methods in Computer Vision, (December 2001) 119-132.
[58] Poggio T. and Vetter T., Recognition and structure from one 2D model view: observations on prototypes, object classes and symmetries, Artificial Intelligence Laboratory, MIT, Cambridge, MA, A.I. Memo No. 1347 (1992).
[59] Beymer D. and Poggio T., Image representations for visual learning, Science 272(5270) (1996) 1905-1909.
[60] Hwang B.W. and Lee S.W., Reconstruction of partially damaged face images based on a morphable face model, IEEE Trans. Pattern Analysis and Machine Intelligence 25(3) (2003) 365-372.
[61] Shan S.G., Gao W., Zhao D., Face identification based on face-specific subspace, International Journal of Imaging Systems and Technology, special issue on face processing, analysis and synthesis, 13(1) (2003) 23-32.
[62] Wen G., Shan S., et al., Virtual face image generation for illumination and pose insensitive face recognition, Proc. of ICASSP2003, Hong Kong, IV (2003) 776-779.
[63] Cootes T.F., Edwards G.J., and Taylor C.J., Active appearance models, IEEE Trans. Pattern Analysis and Machine Intelligence 23(6) (2001) 681-685.
[64] Georghiades A.S., Belhumeur P.N., and Kriegman D.J., Illumination-based image synthesis: creating novel images of human faces under differing pose and lighting, Proceedings, Workshop on Multi-View Modeling and Analysis of Visual Scenes, (1999) 47-54.
[65] Georghiades A.S., Belhumeur P.N., and Kriegman D.J., From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Analysis and Machine Intelligence 23 (2001) 643-660.
[66] Georghiades A.S., Kriegman D.J., and Belhumeur P.N., Illumination cones for recognition under variable lighting: faces, Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, (1998) 52-58.
[67] Donahue M., Geiger D., Hummel R., and Liu T., Sparse representations for image decompositions with occlusions, IEEE Conf. on Computer Vision and Pattern Recognition 1 (1996) 7-12.
[68] Jacobs D.W., Weinshall D., and Gdalyahu Y., Classification with non-metric distances: image retrieval and class representation, IEEE Trans. Pattern Analysis and Machine Intelligence 22(6) (2000) 583-600.
[69] Hong Z., Algebraic feature extraction of image for recognition, Pattern Recognition 24 (1991) 211-219.
[70] Kittler J., Hatef M., Duin R.P.W. and Matas J., On combining classifiers, IEEE Trans. Pattern Analysis and Machine Intelligence 20(3) (1998) 226-239.
[71] Kelly M.D., Visual identification of people by computer, Tech. Rep. AI-130, Stanford AI Project, Stanford, CA, (1970).
[72] Rowley H.A., Baluja S., and Kanade T., Neural network-based face detection, IEEE Trans. Pattern Analysis and Machine Intelligence 20(1) (1998) 23-38.
[73] Wu C., Liu C., et al., Automatic eyeglasses removal from face image, IEEE Trans. Pattern Analysis and Machine Intelligence 26(3) (2004) 322-336.
[74] Lanitis A., Taylor C.J., and Cootes T.F., Automatic face identification system using flexible appearance models, Image and Vision Computing 13(5) (1995) 393-401.
[75] Jain A.K., Zhong Y., and Lakshmanan S., Object matching using deformable templates, IEEE Trans. Pattern Analysis and Machine Intelligence 18(3) (1996) 267-278.
[76] Kuncheva L.I., Whitaker C.J., Feature subsets for classifier combination: an enumerative experiment, in: Kittler J., Roli F. (Eds.), Lecture Notes in Computer Science, Vol. 2096, Springer, Berlin, (2001) 228-237.
[77] Villela P.R., Sossa Azuela J.H., Improving pattern recognition using several feature vectors, Lecture Notes in Computer Science, Springer-Verlag, Heidelberg, 2002.
[78] Lee T.S., Image representation using 2-D Gabor wavelets, IEEE Trans. on Pattern Analysis and Machine Intelligence 18(10) (October 1996) 959-971.
[79] Zhang J., Yan Y., Lades M., Face recognition: eigenface, elastic matching and neural nets, Proc. IEEE 85(9) (September 1997) 1423-1435.
[80] Kanade T., Picture processing by computer complex and recognition of human faces, Technical Report, Kyoto University, Department of Information Science, 1973.
[81] Goldstein A.J., Harmon L.D., Lesk A.B., Identification of human faces, Proc. IEEE 59(5) (1971) 748-760.
[82] Kaya Y., Kobayashi K., A basic study on human face recognition, in: Frontiers of Pattern Recognition, S. Watanabe (Ed.), (1972) 265-289.
[83] Samaria F., Face segmentation for identification using Hidden Markov Models, British Machine Vision Conference, BMVA Press, (1993) 399-408.
[84] Kölsch T., Keysers D., Ney H., and Paredes R., Enhancements for local feature based image classification, International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK, I (August 2004) 248-251.
[85] Kim Y.K. and Ra J.B., Adaptive learning method in self-organizing map for edge preserving vector quantization, IEEE Transactions on Neural Networks 6(1) (1995) 278-280.
[86] Kohonen T., Self-Organizing Maps, 2nd edition, Berlin: Springer-Verlag, 1997.
[87] Thrun S. and Pratt L., Learning to learn: introduction and overview, in: Thrun S. and Pratt L. (Eds.), Learning to Learn, Kluwer Academic Publishers, 1998.
[88] Heisele B., Serre T., Pontil M., and Poggio T., Component-based face detection, Proceedings, IEEE Conference on Computer Vision and Pattern Recognition 1 (2001) 657-662.
[89] Huang J., Heisele B., and Blanz V., Component-based face recognition with 3D morphable models, Proceedings, International Conference on Audio- and Video-Based Person Authentication.
[90] Mallat S.G., A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Analysis and Machine Intelligence 11(7) (1989) 674-693.
[91] Penev P. and Atick J., Local feature analysis: a general statistical theory for object representation, Network: Computation in Neural Systems 7 (1996) 477-500.
[92] Shepherd J.W., Davies G.M., and Ellis H.D., Studies of cue saliency, in: Perceiving and Remembering Faces, G.M. Davies, H.D. Ellis, and J.W. Shepherd (Eds.), Academic Press, London, U.K., (1981).
[93] Pentland A., Moghaddam B., Starner T., View-based and modular eigenspaces for face recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, 1 (1994) 84-91.
[94] Geng X. and Zhou Z.-H., Image region selection and ensemble for face recognition, Journal of Computer Science & Technology 21 (2006) 116-125.
[95] Wettschereck D., Aha D.W., and Mohri T., A review and comparative evaluation of feature weighting methods for lazy learning algorithms, Technical Report AIC-95-012, Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence, Washington, D.C., (1995).
[96] Tan X., Chen S.C., Zhou Z.-H., and Zhang F., Feature selection for high dimensional face image using self-organizing maps, Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'05), Hanoi, Vietnam, LNAI 3518 (2005) 500-504.
[97] Ojala T., Pietikäinen M., Mäenpää T., Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 971-987.
[98] Hallinan P.W., et al., Two- and Three-Dimensional Patterns of the Face, Natick, MA: A K Peters, Ltd., 1999.
[99] Costen N.P., Cootes T.F., Taylor C.J., Compensating for ensemble-specific effects when building facial models, Image and Vision Computing 20 (2002) 673-682.
[100] Zhou Z.-H., Wu J., and Tang W., Ensembling neural networks: many could be better than all, Artificial Intelligence 137(1-2) (2002) 239-263.
[101] Kanade T. and Yamada A., Multi-subregion based probabilistic approach toward pose-invariant face recognition, Proc. of IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), July 16-20, Kobe, Japan, (2003) 954-959.
[102] Sim T., Baker S. and Bsat M., The CMU pose, illumination, and expression (PIE) database, Proc. of the 5th International Conference on Automatic Face and Gesture Recognition, (2002).
[103] Craw I., Costen N., et al., How should we represent faces for automatic recognition? IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8) (1999) 725-736.
[104] Lanitis A., Taylor C.J., and Cootes T.F., Automatic face identification system using flexible appearance models, Image and Vision Computing 13 (1995) 393-401.
[105] Martínez A.M. and Benavente R., The AR face database, CVC Technical Report No. 24, June 1998.
[106] Choudhury T., Clarkson B., Jebara T. and Pentland A., Multimodal person recognition using unconstrained audio and video, International Conference on Audio- and Video-based Biometric Authentication, Washington, D.C., (1999) 176-181.
[107] Jain A.K. and Chandrasekaran B., Dimensionality and sample size considerations in pattern recognition practice, in: Handbook of Statistics, Vol. 2, P.R. Krishnaiah and L.N. Kanal (Eds.), North-Holland, Amsterdam, (1987) 835-855.
[108] Raudys S.J. and Jain A.K., Small sample size effects in statistical pattern recognition: recommendations for practitioners, IEEE Transactions on Pattern Analysis and Machine Intelligence 13(3) (1991) 252-264.
[109] Duin R.P.W., Small sample size generalization, in: G. Borgefors (Ed.), SCIA'95, Proc. 9th Scandinavian Conf. on Image Analysis, Uppsala, Sweden, June 6-9, 2 (1995) 957-964.
[110] Moses Y., Ullman S., and Edelman S., Generalization to novel images in upright and inverted faces, Perception 25 (1996) 443-462.
[111] Vapnik V., Statistical Learning Theory, John Wiley & Sons, (1998).
[112] Miller E., Matsakis N. and Viola P., Learning from one example through shared densities on transforms, Proceedings, IEEE Conference on Computer Vision and Pattern Recognition 1 (2000) 464-471.
[113] Zhang L., Samaras D., Face recognition from a single training image under arbitrary unknown lighting using spherical harmonics, IEEE Transactions on Pattern Analysis and Machine Intelligence 28(3) (March 2006) 351-363.
[114] Phillips P.J., Flynn P.J., Scruggs T., Bowyer K.W., Overview of the face recognition grand challenge, IEEE Conference on Computer Vision and Pattern Recognition, San Diego, I (June 2005) 947-954.
[115] Ting Shan, Automatic multiple-view person identification system under unconstrained environment using one sample per person, Ph.D. confirmation seminar, School of Information Technology & Electrical Engineering (ITEE), University of Queensland, Aug. 2005.

