Machine Tool Condition Monitoring Based on an Adaptive Gaussian Mixture Model
School of Mechatronic Engineering and Automation, Shanghai University, #149 Yan Chang Road, Shanghai 200072, People’s Republic of China
e-mail: firstname.lastname@example.org
Indirect, online tool wear monitoring is one of the most difficult tasks in industrial machining operations. The challenge is to construct an effective model that consistently represents the degradation propagation of tool performance (i.e., tool wear) from continuously acquired signals of multiple sensors. This paper proposes an adaptive Gaussian mixture model (AGMM) that provides a comprehensible and robust indication (i.e., Kullback–Leibler (KL) divergence) for quantifying tool performance degradation. Based on a dynamic learning rate, parameter updating, and the merging and splitting of Gaussian components, the AGMM learns the dynamic changes of tool performance online and adaptively over the full tool life. Furthermore, the performance changes of tools are quantified by measuring the distance between the two density distributions approximated by the AGMM and by the baseline GMM trained on normal data, respectively. Experimental results from a machine tool test demonstrate the effectiveness of the AGMM-based KL-divergence indication for assessing tool performance degradation. [DOI: 10.1115/1.4006093]
Keywords: tool condition monitoring, adaptive learning, Gaussian mixture model, similarity measure
Contributed by the Manufacturing Engineering Division of ASME for publication in the JOURNAL OF MANUFACTURING SCIENCE AND ENGINEERING. Manuscript received March 18, 2011; final manuscript received January 30, 2012; published online April 24, 2012. Assoc. Editor: Robert Gao.

1 Introduction

In a machining center, thermal fracturing, abrasion, attrition, diffusion, chemical wear, and grain pullout cause the cutting tool to wear gradually, lose its sharpness, and become blunt. If cutting continues with a worn tool, the tool will break, causing considerable damage to the workpiece and even to the machine tool setup. Therefore, tool condition monitoring (TCM) is crucial to the efficient operation of any machining process in which the tool is subject to continuous wear. The main aim of TCM is to use appropriate sensor signals and tool performance prediction techniques to identify and predict the cutting tool state, so as to reduce the loss caused by tool wear or failure. Basically, the tool wear process consists of stages with different slopes: an initial wear stage, a progressive wear stage, and a rapid wear stage. In the rapid wear stage, the presence of large tool wear drastically increases the tool temperature, causing rapid deterioration of the cutting tool. It is of prime importance to detect and accurately assess the presence and propagation of tool wear, especially at its early stage, to prevent subsequent damage and reduce costly downtime. Therefore, automatic monitoring systems that operate reliably across a variety of conditions are highly desired and have been the focus of a significant amount of research.

Generally, tool wear monitoring methods can be classified into two categories: direct and indirect methods. In recent years, a few researchers have worked on laser-based and video-based online vision systems for direct TCM. However, high cost and inconsistency due to variation in illumination have prevented these methods from being implemented in real-world applications. Indirect approaches, which rely on changes in the signals of sensors associated with tool wear, are more commonly used for TCM. These measurement signals often come from cutting force, machine vibration, acoustic emission (AE), or various combinations of these signals. Based on these signals, various approaches such as artificial neural networks, support vector machines, model-based techniques, and hidden Markov models (HMMs) [1,11] have been developed for TCM. However, these approaches generally apply models with a fixed structure to tool wear monitoring and prediction. Recent attempts at developing TCM for milling [12,13] lack sensor fusion strategies in the true sense proposed in some innovative works, which is a limitation because a single sensor signal may not work well for TCM [14,15]. Multiple sensor fusion [16,17] of several measurements at various levels is known to give more robust and accurate wear estimates than single-sensor measurements [18,19]. Fusion of force and AE signals has been attempted, as has fusion of force and vibration signals. Force, vibration, and AE signals have also been fused together to obtain better performance. There have been several review papers on sensor fusion over the last decade [14,15,21,22]. Given this situation, researchers recommend that future efforts in TCM development center on extracting the most valuable information from the monitoring signals.

A big challenge for TCM is how to evaluate the tool performance state (i.e., tool wear) online and effectively based on multiple sensor fusion, so that failure can be predicted and prevented. Li et al. used wavelet transforms and fuzzy techniques to monitor tool breakage and wear conditions in real time based on the spindle and feed motor current signals. Wang et al. proposed a modeling framework for tool wear monitoring in machining processes using an HMM and a codebook-based feature extraction method.
Camci and Chinnam developed an HMM-based prognostics model for tracking and forecasting the evolution of tool performance states and impending failures. However, these methods generally do not consider online adaptation of the TCM models to improve their effectiveness and applicability. Adaptive learning of TCM models for the current state of tool wear can reduce the interventions of machine operators and thus improve applicability in real-world settings. Therefore, an online adaptive strategy that learns the tool performance degradation in TCM models is needed.

Driven by the desire for improved machine uptime, high-quality products, and reduced operator intervention, this study develops an adaptive GMM (AGMM)-based monitoring model to implement online assessment of machine tool performance degradation. A GMM uses unsupervised learning to organize itself according to the nature of input data with a complicated distribution (e.g., a multimodal or nonlinear distribution). The use of a GMM with multisensor signal fusion, which monitors tool performance degradation without prior knowledge of abnormal patterns, is appealing in real-world applications. Based on the baseline GMM constructed for the normal state of tools, an adaptive GMM with a dynamic learning rate, adaptive updating of model parameters, and merging and splitting of Gaussian components is developed to learn the dynamic changes of tool performance adaptively. A quantification index, i.e., the Kullback–Leibler (KL) divergence, based on the similarity measurement between two probability density distributions of Gaussian components, is further developed to assess the tool performance degradation online. The experimental results illustrate that the AGMM-based quantification index is robust and effective for tool performance degradation assessment.

The rest of the paper is organized as follows. Section 2 presents a system for tool performance degradation assessment. Feature generation from multisensor signals and feature extraction by principal component analysis (PCA) are presented in Sec. 3. Section 4 proposes an AGMM-based quantification index for tool performance degradation assessment. In Sec. 5, a series of tool data are used to evaluate the effectiveness of the proposed approach. Finally, concluding remarks are given in Sec. 6.

Journal of Manufacturing Science and Engineering, JUNE 2012, Vol. 134 / 031004-1. Copyright © 2012 by ASME
2 Tool Performance Degradation Assessment System

In this section, we propose an AGMM-based model for quantifying the degradation propagation of tool performance. The system framework for the methodology is shown in Fig. 1 and includes two key parts: offline modeling and online tool performance assessment. Offline modeling uses historical data from healthy tools to extract principal components (PCs) from the original features by PCA and then to construct a baseline GMM, which describes the data distribution space of the healthy tools. In the offline modeling procedure, several key eigenvectors extracted by PCA are kept for PC-based feature extraction in the online phase, and the baseline GMM is later compared with the online adapted GMMs to calculate the KL-divergence. Online performance assessment uses the AGMM, updated from the baseline GMM, to learn the new performance states of tools online and adaptively; the quantification index, i.e., the KL-divergence between the current AGMM and the baseline GMM constructed offline, is then calculated to implement online performance assessment. Detailed information about the proposed system is presented in the following sections (i.e., Secs. 3 and 4).

Fig. 1 AGMM-based prognostics system for tool performance degradation assessment

3 Original Feature Generation and Feature Extraction

This section presents a brief discussion of original feature generation from the time domain and the wavelet domain, as these features are used throughout the paper. The sensor signals are typically nonstationary because of the intermittent cutting process of milling. Time-frequency domain analysis is used to show how the frequency content changes with time. The wavelet transform is a very powerful tool for time-frequency domain analysis. The wavelet energy can represent the characteristics of signals and is thus used as an input feature of the proposed model. It is calculated as follows

$$WE_j = \frac{1}{N_j}\sum_{k=1}^{N_j}\left[d_{j,k}^{n}(t)\right]^2 \tag{1}$$

where $d_{j,k}^{n}(t)$ is the wavelet coefficient localized at $2^j k$ in scale $2^j$, $n$ is the oscillation parameter with $n = 1, 2, \ldots$, and $N_j$ is the number of coefficients at each scale. We used the wavelet function “db5” from the Daubechies family of wavelets in five levels and then generated the wavelet energy of the first two levels as two original features. In general, selecting wavelet functions for different real-world applications is difficult. This issue has been addressed well in recent years [26,27]. We use the Daubechies system of wavelets because it has the advantages of orthogonality, compact support in the time domain, and computational simplicity. Daubechies wavelets are often used and have shown their effectiveness in tool wear monitoring and diagnosis [28–30]. Prior experimental results show that the wavelet energy features of the first two levels can represent all the generated wavelet energy features in the time-frequency domain, and they are thus used as two original features in this study.

Time-domain methods usually involve statistical features that can be sensitive to tool wear, such as root mean square (RMS), kurtosis, skewness, crest factor, peak-to-peak, impulse factor, and margin factor. Thus, this study takes a multidomain approach, using both time-domain and time-frequency-domain features as inputs to the proposed model. The proposed method excludes other potentially useful features, such as the peak magnitude in the frequency domain and wavelet energy based on other wavelet functions (e.g., Morlet, Haar). The technique presented in this paper, while based on the nine initial features (see Table 1), is not limited to them.

Table 1 Original features for tool performance assessment ($x$ is the signal and $N$ is the number of samples)

| Domain | Feature | Formula |
|---|---|---|
| Time domain | RMS ($F_{rms}$) | $\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=0}^{N-1} x_i^2}$ |
| Time domain | Kurtosis ($F_k$) | $\mathrm{Kurtosis} = \frac{1}{N}\sum_{i=0}^{N-1} (x_i - \bar{x})^4 / \sigma^4$ |
| Time domain | Skewness ($F_s$) | $\mathrm{Skewness} = \frac{1}{N}\sum_{i=0}^{N-1} (x_i - \bar{x})^3 / \sigma^3$ |
| Time domain | Crest factor ($F_{cf}$) | $\text{Crest factor} = \max\lvert x_i\rvert / \mathrm{RMS}$ |
| Time domain | Peak-to-peak ($F_{pp}$) | $\text{Peak-to-peak} = x_{\max} - x_{\min}$ |
| Time domain | Impulse factor ($F_{if}$) | $\text{Impulse factor} = \max\lvert x_i\rvert \big/ \frac{1}{N}\sum_{i=0}^{N-1}\lvert x_i\rvert$ |
| Time domain | Margin factor ($F_{mf}$) | $\text{Margin factor} = \max\lvert x_i\rvert \big/ \left(\frac{1}{N}\sum_{i=0}^{N-1}\lvert x_i\rvert^{1/2}\right)^2$ |
| Time-frequency domain | Wavelet energy (WE1, WE2) | $WE_j = \frac{1}{N_j}\sum_{k=1}^{N_j}\left[d_{j,k}^{n}(t)\right]^2$ |

This original feature set still contains noise, and its dimension is still too large to be used as the input of a GMM. Because of its dimension reduction and global information preservation capability, PCA is applied to the original features to extract new effective features (i.e., PCs) as the inputs of the GMM. The high-dimensional data space is reduced to a low-dimensional one while retaining most of the global variation information in the projected data set. In general, it is difficult for a GMM to describe the probability density distribution of high-dimensional, sparse data. Reducing the data dimension by PCA alleviates this difficulty because fewer parameters need to be determined in the GMM.
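The original features of Table 1 can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions of the seven time-domain statistics (population standard deviation, square root in the RMS, squared denominator in the margin factor) and takes the wavelet coefficients of one level as an already computed list; the function names `time_domain_features` and `wavelet_energy` are hypothetical and not part of the paper.

```python
import math


def time_domain_features(x):
    """Return the seven time-domain features of Table 1 for one signal window.

    Assumes a window that is neither constant (sigma > 0) nor all zeros.
    """
    n = len(x)
    mean = sum(x) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population std
    abs_x = [abs(v) for v in x]
    peak = max(abs_x)
    rms = math.sqrt(sum(v * v for v in x) / n)
    return {
        "rms": rms,
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * sigma ** 4),
        "skewness": sum((v - mean) ** 3 for v in x) / (n * sigma ** 3),
        "crest_factor": peak / rms,
        "peak_to_peak": max(x) - min(x),
        "impulse_factor": peak / (sum(abs_x) / n),
        "margin_factor": peak / (sum(math.sqrt(a) for a in abs_x) / n) ** 2,
    }


def wavelet_energy(coeffs):
    """Mean squared wavelet coefficient of one decomposition level."""
    return sum(c * c for c in coeffs) / len(coeffs)


# Toy window, for illustration only (not data from the experiment).
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

In practice the level-wise coefficients would come from a wavelet library (the paper uses db5 at five levels), and the resulting nine features per sensor would then be passed to PCA.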
4 Adaptive GMM-Based Tool Performance Degradation Assessment

Since no prior information is available regarding the wear severity at the various stages of tool wear progression, an adaptive GMM is developed to learn those performance changes online under the assumption that only healthy data are available. Different from the adaptive GMM algorithms proposed in Refs. [31,32], several new adaptive learning schemes, i.e., a dynamic learning rate, the splitting and merging of Gaussian components, and an acceptance probability for an adapted GMM, are proposed to improve the adaptive learning performance of the AGMM. Detailed information about the proposed model is presented in the following subsections (i.e., Secs. 4.1–4.4).

4.1 Gaussian Mixture Model. Let $X = (X_1, \ldots, X_d)$ be a $d$-dimensional random variable, with $x = (x_1, \ldots, x_d)$ representing one particular outcome of $X$. $X$ is said to follow a finite mixture distribution when its probability density function $p(x)$ can be written as a finite weighted sum of known densities. In cases where each component is a Gaussian density, $X$ follows a Gaussian mixture $\phi$. A GMM $p(x|\phi)$ is the weighted sum of $M > 1$ components $p(x|\theta_m)$

$$p(x|\phi) = \sum_{m=1}^{M} \pi_m\, p(x|\theta_m) \tag{2}$$

where $\pi_m \in [0, 1]$ $(\forall m = 1, 2, \ldots, M)$ are the mixing proportions subject to $\sum_{m=1}^{M} \pi_m = 1$. For Gaussian mixtures, each component density $p(x|\theta_m)$ is a normal probability distribution, where each component is denoted by the parameters $\theta_m = (\mu_m, S_m)$, i.e., the mean vector $\mu_m$ and the covariance matrix $S_m$. We encapsulate these parameters into a parameter vector to get $\phi = (\pi_1, \ldots, \pi_M, \theta_1, \ldots, \theta_M)$. For the estimation problem, we assume a training set $X = \{x^{(1)}, \ldots, x^{(n)}\}$ with $n$ independent and identically distributed samples of the random variable $x$. Learning aims at finding the number of components $M$ and the optimum vector $\phi^* = (\pi_1^*, \ldots, \pi_M^*, \theta_1^*, \ldots, \theta_M^*)$ that maximizes the likelihood function (i.e., the log likelihood probability (LLP))

$$\log(p(X|\phi)) = \log \prod_{i=1}^{n} p(x^{(i)}|\phi) \tag{3}$$

The usual choice for obtaining the optimum vector $\phi^*$ of the mixture parameters is the EM algorithm. The usual EM consists of an E-step and an M-step. EM is a powerful statistical tool for finding maximum likelihood solutions to problems involving observed and hidden variables.

4.2 Adaptive Gaussian Mixture Model Algorithm. The basic concept behind different adaptive Gaussian mixture learning approaches is best understood in terms of the recursive formulation with a learning rate schedule

$$\theta(t) = (1 - \beta(t)) \cdot \theta(t-1) + \beta(t) \cdot r(x(t); \theta(t-1)) \tag{4}$$

where the model at time $t$, $\theta(t)$, is updated by a local estimate $r(x(t); \theta(t-1))$ at a rate controlled by $\beta(t)$. For Gaussian components without updating, split and merge operations are implemented. This ensures that effective learning is applied to each component throughout all stages of the system learning. The details of the proposed AGMM algorithm are presented in the subsections that follow.

4.2.1 The Basic Recursive Filter Algorithm. When observations change either gradually or abruptly, the GMM cannot remain fixed and needs to be updated. When the observation changes rapidly, the updating rate should be large, whereas when the change is slow, and thus the essential observation information is valid for a long period, the updating rate of the model should be small. Such an updating scheme for the learning factor $\beta(t)$ significantly improves the convergence speed and model accuracy with almost no adverse effects. The selection criterion that determines which Gaussian components need to be updated online in the AGMM is as follows

$$\forall_{m=1,\ldots,M}\quad P_m = \begin{cases} \pi_m^{(t)}\, p(X^{(t)}|\theta_m^{(t)}) & \text{and the Gaussian component } m \text{ is updated, if } \dfrac{|X^{(t)} - \mu_m^{(t)}|}{|S_m^{(t)}|} < \dfrac{1}{M}\sum_{m=1}^{M} \dfrac{|X^{(t)} - \mu_m^{(t)}|}{|S_m^{(t)}|} \\[2ex] 0 & \text{and the Gaussian component } m \text{ is not updated, otherwise} \end{cases} \tag{5}$$
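As a rough illustration of the mixture density of Eq. (2), the log likelihood of Eq. (3), and the selection criterion of Eq. (5), the following Python sketch uses scalar (one-dimensional) Gaussian components so that it needs only the standard library. This is a simplifying assumption: the paper works with multivariate principal components, where $|S_m|$ is a determinant rather than a variance. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a scalar Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, mus, variances):
    """Weighted sum of component densities, as in Eq. (2)."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, mus, variances))


def log_likelihood(samples, weights, mus, variances):
    """Log likelihood of i.i.d. samples under the mixture, as in Eq. (3)."""
    return sum(math.log(mixture_pdf(x, weights, mus, variances))
               for x in samples)


def update_scores(x, weights, mus, variances):
    """Scalar reading of Eq. (5).

    A component keeps the score pi_m * p(x | theta_m) only if its scaled
    distance |x - mu_m| / var_m is below the average over all components;
    otherwise its score is 0 and the component is not updated.
    """
    ratios = [abs(x - m) / v for m, v in zip(mus, variances)]
    threshold = sum(ratios) / len(ratios)
    return [w * gaussian_pdf(x, m, v) if r < threshold else 0.0
            for w, m, v, r in zip(weights, mus, variances, ratios)]


# Two-component toy mixture: only the component near x = 0.5 is selected.
scores = update_scores(0.5, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
```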
To improve the updating efficiency, a winner-take-all approach is used, in which the input $X^{(t)}$ is matched to a single component $i$

$$i = \arg\max_j \{p_j\}$$
The parameters of the $i$th component are updated as follows

$$\begin{aligned} \pi_i^{(t)} &= (1 - \beta_i^{(t)})\,\pi_i^{(t-1)} \\ \mu_i^{(t)} &= (1 - \rho)\,\mu_i^{(t-1)} + \rho X^{(t)} \\ S_i^{(t)} &= (1 - \rho)\,S_i^{(t-1)} + \rho\,(X^{(t)} - \mu_i^{(t)})(X^{(t)} - \mu_i^{(t)})^T \end{aligned} \tag{6}$$

where

$$\beta_i^{(t)} = \beta_{\max} - (\beta_{\max} - \beta_{\min}) \exp\left(-\frac{\|X^{(t)} - \phi_{BMC}(t-1)\|}{\frac{1}{n}\sum_{j=1}^{n} \|X_{train}^{(j)} - \phi_{BMC}^{(j)}\|}\right)$$

$$\rho = \beta_i^{(t)}\, p(X^{(t)}|\theta_m^{(t)})$$

$\beta_i^{(t)}$ is the learning factor of Gaussian component $i$ at time $t$, $\beta_{\max}$ and $\beta_{\min}$ are the predetermined maximum and minimum learning factors, respectively, $\phi_{BMC}$ is the best match component (BMC) of $X^{(t)}$ in the AGMM, $X_{train}^{(j)}$ is the $j$th sample in the training data set, $\|\cdot\|$ is the Mahalanobis distance, and $\rho$ is the learning factor determined by $p$ and $\beta_i^{(t)}$.

4.2.2 Split and Merge of Gaussian Components. When the components in a GMM do not match new observations, they usually overpopulate old regions shaped by old observations and underpopulate new regions shaped by new observations. The difficulty of passing through low-likelihood regions prevents them from reaching the expected new regions. To overcome this problem, the splitting and merging of components are implemented when some components cannot match the new observations. This enables the AGMM to adapt to new observations by modifying the mixture model structure rather than by moving components through the old regions, which is very difficult and less effective. In this study, we choose the component with the least likelihood value to be split to adapt to new observations

$$J_{split}(k) = \min L(\theta_k)$$

The merge criterion is used by the AGMM as follows; it defines that the merge operation happens if the posterior probabilities of two components are similar

$$J_{merge}(i, j) = \max P_i(X)\, P_j(X)^T$$

4.2.2.1 Split operation: $M \to M + 1$. Suppose that the $i$th component is chosen to split. The AGMM generates two new components $j'$ and $k'$ from the current observations in the $i$th component. The parameter values of the components $j'$ and $k'$ at time $t$ are set as

$$\begin{aligned} \pi_{j'}^{(t)} &= \beta_i^{(t)} \cdot \pi_i^{(t-1)}, & \pi_{k'}^{(t)} &= (1 - \beta_i^{(t)}) \cdot \pi_i^{(t-1)} \\ \mu_{j'}^{(t)} &= \mathrm{mean}(X^{(t)}), & \mu_{k'}^{(t)} &= \mu_i^{(t-1)} \\ S_{j'}^{(t)} &= \ldots, & S_{k'}^{(t)} &= \ldots \end{aligned} \tag{10}$$

where $\varepsilon$, $\varepsilon'$, and $\varepsilon''$ are small random perturbation vectors or matrices. Because the covariance matrices must be positive definite and Eq. (10) cannot ensure this, they are defined as follows

$$S_{j'}^{(t)} = S_{k'}^{(t)} = \det(S_i^{(t)})^{1/n} I_n$$

4.2.2.2 Merge operation: $M \to M - 1$. Suppose that the $l$th and $m$th components are selected to merge. The parameters of the merged component $m'$ are calculated directly from the original $l$th and $m$th components. In the AGMM, the parameters of the component $m'$ are computed as follows

$$\begin{aligned} \pi_{m'}^{(t)} &= \pi_l^{(t-1)} + \pi_m^{(t-1)} \\ \mu_{m'}^{(t)} &= \left(\pi_l^{(t-1)} \mu_l^{(t-1)} + \pi_m^{(t-1)} \mu_m^{(t-1)}\right) / \pi_{m'}^{(t)} \\ S_{m'}^{(t)} &= \left\{ \pi_l^{(t-1)} \left[ S_l^{(t-1)} + (\mu_l^{(t-1)} - \mu_{m'}^{(t)})(\mu_l^{(t-1)} - \mu_{m'}^{(t)})^T \right] + \pi_m^{(t-1)} \left[ S_m^{(t-1)} + (\mu_m^{(t-1)} - \mu_{m'}^{(t)})(\mu_m^{(t-1)} - \mu_{m'}^{(t)})^T \right] \right\} / \pi_{m'}^{(t)} \end{aligned} \tag{12}$$

4.2.3 Acceptance Probability. After the parameter updating and the split and merge operations, the new GMM $\phi^{(t)}$ is generated. The acceptance probability $P_a$ for $\phi^{(t)}$ is proposed to prevent poor adaptation operations

$$P_a = 1 / \exp\left(L(\phi^{(t-1)}; X) - L(\phi^{(t)}; X)\right) \tag{13}$$

In this way, if the operation increases the value of the model evaluation $L$ (i.e., the LLP), it is definitely accepted. Otherwise, it is not rejected directly but accepted with a certain probability (if $\mathrm{rand}(t) < P_a$, $\mathrm{rand}(t) \sim U(0, 1)$). This mechanism gives the model a certain jumping capability while preventing it from diverging to a much worse state.

4.3 The Procedure of AGMM Algorithm. In this section, we summarize the AGMM algorithm and illustrate its flow in Fig. 2.

Step (1): Initialization: Set the minimum $M_{\min}$ and maximum $M_{\max}$ number of components allowed in the AGMM, set the minimum $\beta_{\min}$ and maximum $\beta_{\max}$ limits of $\beta(t)$, and set the other parameters by random initialization. Set $t = 0$ ($t$ is a counter).

Step (2): New observation input $X(t)$: The new observation moves into the moving window $t$ and the old observation moves out of the window, generating a new window vector $X(t)$.

Step (3): Parameter updating of Gaussian components: Based on the new vector $X(t)$, calculate the updating criterion using Eq. (5). The parameters of the Gaussian components that match the updating criterion are updated using Eq. (6). To improve updating efficiency, the winner-take-all option, in which only the single best matching component is selected for parameter updating, is typically used. Denote the new adapted model as $\phi^{(t)\prime}$.

Step (4): Split operation: If the number of components in the AGMM is less than $M_{\max}$ and the random probability $\mathrm{rand}(t) < 0.5$, $\mathrm{rand}(t) \sim U(0, 1)$, generate two components from the selected component using Eq. (10). Denote the new parameters as $\phi^{(t)\prime\prime}$ and go to step (5).

Step (5): Merge operation: If the number of components in the AGMM is larger than $M_{\min}$ and $\mathrm{rand}(t) < 0.5$, $\mathrm{rand}(t) \sim U(0, 1)$, merge the two selected candidate components into one new component using Eq. (12). Denote the new adapted component as $\phi^{(t)\prime\prime}$ and go to step (6).

Step (6): Acceptance sampling: Calculate the acceptance probability $P_a$ using Eq. (13). If $\mathrm{rand}(t) < P_a$, accept $\phi^{(t)\prime} = \phi^{(t)\prime\prime}$ and go to step (7); otherwise, discard $\phi^{(t)\prime\prime}$ and go to step (7).

Step (7): Input the new observation $X(t + 1)$ to the AGMM and go to step (3) to update the AGMM again.

Fig. 2 Flow chart of AGMM algorithm
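The building blocks used in steps (3)–(6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: components are scalar Gaussians (mean and variance instead of mean vector and covariance matrix), the Mahalanobis distances are passed in as precomputed numbers, the weight update follows Eq. (6) as printed without renormalization, and all function names are hypothetical. The default learning-factor limits 0.01 and 0.10 are the values reported in Sec. 5.1.

```python
import math
import random


def gaussian_pdf(x, mu, var):
    """Density of a scalar Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def learning_rate(dist_now, mean_train_dist, beta_min=0.01, beta_max=0.10):
    """Dynamic learning factor: near beta_min when the new observation is
    close to its best match component, approaching beta_max when far."""
    return beta_max - (beta_max - beta_min) * math.exp(-dist_now / mean_train_dist)


def update_component(x, pi, mu, var, beta):
    """Recursive update of one scalar component, following Eq. (6)."""
    rho = beta * gaussian_pdf(x, mu, var)
    pi_new = (1.0 - beta) * pi
    mu_new = (1.0 - rho) * mu + rho * x
    var_new = (1.0 - rho) * var + rho * (x - mu_new) ** 2
    return pi_new, mu_new, var_new


def merge_components(pi_l, mu_l, var_l, pi_m, mu_m, var_m):
    """Merge two scalar components into one, following Eq. (12)."""
    pi = pi_l + pi_m
    mu = (pi_l * mu_l + pi_m * mu_m) / pi
    var = (pi_l * (var_l + (mu_l - mu) ** 2)
           + pi_m * (var_m + (mu_m - mu) ** 2)) / pi
    return pi, mu, var


def accept(loglik_old, loglik_new, rng=random):
    """Acceptance test of Eq. (13): P_a = 1 / exp(L_old - L_new)."""
    if loglik_new >= loglik_old:
        return True  # P_a >= 1, so the adapted model is always accepted
    return rng.random() < math.exp(loglik_new - loglik_old)


merged = merge_components(0.5, 0.0, 1.0, 0.5, 2.0, 1.0)
```

The early return in `accept` also avoids a floating-point overflow when the new log likelihood is much larger than the old one.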
4.4 Tool Performance Assessment Index. Once the baseline GMM (GMM_bs) and the current adapted GMM (GMM_adp) are created, which describe the normal state and the current state of the tool performance, respectively, a quantification index is used to evaluate whether the tool is currently in a degraded state. The index is estimated by measuring the similarity (i.e., a distance measure) between the two probability density functions (PDFs) described by GMM_bs and GMM_adp, respectively. The smaller the distance, the more similar are the PDFs of the healthy state space and the current state space. In this study, the KL divergence, an information-theoretically motivated measure between two PDFs, is used as the degradation index. The KL-divergence between two Gaussian component distributions ($G_A$ and $G_B$) with means $\mu_A$ and $\mu_B$ and covariances $\Sigma_A$ and $\Sigma_B$ is

$$KL(p_A(x)\,\|\,p_B(x)) = \frac{1}{2}\left[\log\frac{|\Sigma_B|}{|\Sigma_A|} + \mathrm{Tr}\left(\Sigma_B^{-1}\Sigma_A\right) + (\mu_A - \mu_B)^T \Sigma_B^{-1} (\mu_A - \mu_B) - N\right]$$

where $p_A(x)$ and $p_B(x)$ are the distributions of $G_A$ and $G_B$, respectively, $\mathrm{Tr}(\Sigma_B^{-1}\Sigma_A)$ is the trace of the matrix $\Sigma_B^{-1}\Sigma_A$, and $N$ is the dimensionality of the input feature vector $x$. The calculation principle of the proposed quantification indication is illustrated in Fig. 3. If the two distributions overlap extensively, the KL-divergence is near 0, which means that the tool performance is normal. Otherwise, if the two distributions rarely overlap, the KL-divergence becomes significantly large, which means that an abnormal situation has occurred. To improve the sensitivity and reliability of the KL-divergence to slight degradation of tool performance, an exponentially weighted moving average statistic with a weight constant of 0.1 is applied to the obtained KL-divergences and used as the quantification indication of tool performance degradation in this study.

Fig. 3 Distance components

5 Experiment and Result Analysis

Table 2 Experimental conditions

| Run case | Depth of cut (mm) | Feed (mm/rev) | Material |
|---|---|---|---|
| 1 | 1.5 | 0.5 | 1-cast iron |
| 2 | 0.75 | 0.5 | 1-cast iron |
| 3 | 0.75 | 0.25 | 1-cast iron |
| 4 | 1.5 | 0.25 | 1-cast iron |
| 5 | 1.5 | 0.5 | 2-steel |
| 6 | 1.5 | 0.25 | 2-steel |
| 7 | 0.75 | 0.25 | 2-steel |
| 8 | 0.75 | 0.5 | 2-steel |
| 9 | 1.5 | 0.5 | 1-cast iron |
| 10 | 1.5 | 0.25 | 1-cast iron |
| 11 | 0.75 | 0.25 | 1-cast iron |
| 12 | 0.75 | 0.5 | 1-cast iron |
| 13 | 0.75 | 0.25 | 2-steel |
| 14 | 0.75 | 0.5 | 2-steel |
| 15 | 1.5 | 0.25 | 2-steel |
| 16 | 1.5 | 0.5 | 2-steel |
To investigate the effectiveness of the proposed system for machine tool performance assessment, a data set from a milling machine under various operating conditions was used in this experiment [37,38]. The machine used in the experiments is a Matsuura machining center MC-510V for face milling. Tool wear was investigated in a regular cut as well as in entry and exit cuts. Six sensors, namely, two AE sensors (one on the table and the other on the spindle), two vibration sensors (one on the table and the other on the spindle), and two motor current sensors (ac spindle motor current and dc spindle motor current), were set up on the machine to determine the state of tool performance. The data sampling rate is 250 Hz for all sensors, and the data length is 9000 points for each sampling (i.e., each insert run). The sensor data were collected for each run. A high-speed acquisition board (MIO-16 from National Instruments) with a maximum sampling rate of 100 kHz was used for all sensor signals. A 70 mm face mill with six inserts was used, with two types of inserts (Kennametal K420 and KC710) for roughing operations. KC710 is coated with multiple layers of titanium carbide, titanium carbonitride, and titanium nitride (TiC/TiC-N/TiN) in sequence. For tool wear measurement, the flank wear VB (i.e., the distance from the cutting edge to the end of the abrasive wear on the flank face of the tool) was chosen. The flank wear was observed during the experiment by taking the insert out of the tool roughly every 3 min and physically measuring the wear with a microscope.

Fig. 4 Signals of one run sample (entry-milling-exit cutting procedure) from two sensors: (a) AE sensor on table and (b) vibration sensor on table

The machining conditions are listed in Table 2. There are 16 cases (i.e., tools) with a varying number of runs. The number of runs depends on the degree of flank wear, which was measured between runs at irregular intervals up to a wear limit (and sometimes beyond). A total of 167 experimental runs from the 16 cases were conducted with a cutting speed of 200 m/min, two depths of cut (1.5 mm and 0.75 mm), two feeds (0.5 mm/rev and 0.25 mm/rev), two materials (cast iron and stainless steel J45), and inserts of type KC710. All experiments were done a second time with the same parameters and a second set of inserts. The size of the workpieces was 483 mm × 178 mm × 51 mm.

In this experiment, the main variables of the operating conditions are the cutting parameters, i.e., the depth of cut and the feed rate. The tests with the same material are selected as an experimental group, while the depth of cut and feed rate are varied. As a result, run cases 1–4 and 9–12 form the first group (named G1), and run cases 5–8 and 13–16 form the second group (G2). The two groups are separated by material (i.e., cast iron and steel). For G1, cases 1–4 and cases 9–12 further form two subgroups (named G11 and G12), respectively. For G2, cases 5–8 and cases 13–16 further form two subgroups (named G21 and G22), respectively. We considered the varying operating conditions when constructing the proposed model for machine tool performance assessment.
Thus, the developed model is less sensitive to varying operating conditions, which is very important for the utility of TCM in real-world applications. We considered two types of sensor signals, i.e., the AE sensor signal and the vibration sensor signal on the table (i.e., AE-table and vib-table), to implement multiple sensor fusion. Approximately 9000 data points were collected for each milling run with an entry-milling-exit cutting procedure. Figures 4(a) and 4(b) show the AE and vibration signals under the no-wear situation, respectively. In this study, we only use data from the steady stage, i.e., the milling stage. In general, the control system of the machine can provide the starting time point of milling, from which the steady stage can be recognized. With this time point information, the data during the steady stage can be extracted more accurately from each sampling run of the tool. In this study, however, we only extracted 1024 × 6 sample points from the 9000 data points of each milling run, taken from the approximately steady stage of each run (see Fig. 4). Thus, six observation vectors with a data length of 1024 points are generated for each milling run. It should be pointed out that only the normal data of each run case (i.e., the first 15 observation vectors, 15 × 1024, from each tool) in subgroups G11, G12, G21, and G22 are used as the baseline data sets. Thus, four normal data sets Data_bsg11, Data_bsg12, Data_bsg21, and Data_bsg22 with a total of 60, 60, 45, and 60 samples are generated from the four subgroups G11, G12, G21, and G22, respectively (for Data_bsg21, run case 6 is not used because it contains only one run). The four normal data sets are used as the baseline data to construct four baseline GMMs, respectively.
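The segmentation of one milling run into six observation vectors of 1024 points can be sketched as below. The paper does not state where the approximately steady stage begins, so the `start` offset is a hypothetical parameter that would have to be chosen from the signal or from the machine control system; the function name `split_run` is likewise an assumption for illustration.

```python
def split_run(signal, start, n_windows=6, window_len=1024):
    """Cut n_windows consecutive, non-overlapping windows of window_len
    samples out of one run, beginning at index start."""
    end = start + n_windows * window_len
    if start < 0 or end > len(signal):
        raise ValueError("requested windows do not fit inside the run")
    return [signal[start + k * window_len:start + (k + 1) * window_len]
            for k in range(n_windows)]


# Stand-in for one 9000-point sensor record; the offset 1500 is arbitrary.
run = list(range(9000))
windows = split_run(run, start=1500)
```

Each window would then be turned into one feature vector, so a run contributes six observations to the baseline or test data set.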
The whole-life data of each tool from a given subgroup (G11, G12, G21, or G22) are input into the corresponding AGMM to test its monitoring performance (some obvious outliers in each data set are replaced with neighboring samples).
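The index tracked in these tests, i.e., the KL-divergence of Sec. 4.4 smoothed by an exponentially weighted moving average with weight constant 0.1, can be sketched as follows. The sketch assumes two-dimensional Gaussians (two principal components) so that the matrix inverse and determinant can be written out with the standard library only, uses the standard closed form of the KL-divergence between two Gaussians, and initializes the moving average with the first value; that initialization and the function names are assumptions, not details given in the paper.

```python
import math


def kl_gaussians_2d(mu_a, cov_a, mu_b, cov_b):
    """KL(N_A || N_B) for two 2-D Gaussians (closed form, N = 2)."""
    (a11, a12), (a21, a22) = cov_a
    (b11, b12), (b21, b22) = cov_b
    det_a = a11 * a22 - a12 * a21
    det_b = b11 * b22 - b12 * b21
    # Inverse of cov_b via the 2x2 adjugate formula.
    i11, i12 = b22 / det_b, -b12 / det_b
    i21, i22 = -b21 / det_b, b11 / det_b
    # Trace of inv(cov_b) @ cov_a.
    trace = i11 * a11 + i12 * a21 + i21 * a12 + i22 * a22
    d0, d1 = mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]
    # Quadratic form (mu_a - mu_b)^T inv(cov_b) (mu_a - mu_b).
    quad = d0 * (i11 * d0 + i12 * d1) + d1 * (i21 * d0 + i22 * d1)
    return 0.5 * (math.log(det_b / det_a) + trace + quad - 2.0)


def ewma(values, weight=0.1):
    """Exponentially weighted moving average, started at the first value."""
    smoothed, z = [], None
    for v in values:
        z = v if z is None else weight * v + (1.0 - weight) * z
        smoothed.append(z)
    return smoothed


identity = [[1.0, 0.0], [0.0, 1.0]]
kl_same = kl_gaussians_2d([0.0, 0.0], identity, [0.0, 0.0], identity)
kl_shifted = kl_gaussians_2d([1.0, 0.0], identity, [0.0, 0.0], identity)
```

For identical components the divergence is zero, and it grows as the adapted component drifts away from the baseline component, which is the behavior the monitoring chart relies on.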
Fig. 5 Inconsistent degradation patterns of the features of the AE and vibration sensors on the table for run cases 1–4 from subgroup G11: (a) RMS and (b) WE1
5.1 Tool Performance Degradation Assessment. In this section, the performance of AGMM-based KL-divergence is evaluated for degradation assessment of tool performance, and the Journal of Manufacturing Science and Engineering
comparison with regular parameters (i.e., RMS, wavelet energy, etc.) and LLP indication is also implemented. The typical features RMS and WE1 of the four tested tools of subgroups G11 on their whole life are ?rst presented in Fig. 5. The general trend of the RMS of vib-table sensor shows a decrease over time, while the general trend of the RMS of AE-table sensor shows an increase over time. However, it is very hard to observe the trend of RMS. For WE1, these are an unclear upward trend of AE-table and a clear downward trend of vibtable. Another important information revealed from Fig. 5 is the inconsistent degradation pattern of RMS and WE1. It should be pointed out that these typical features from other subgroups (i.e., G12, G21, and G22) show the similar features with those shown in Fig. 5. Although all test tools are of the same type and are tested under the same working condition, their RMS and WE1 trends still exhibit strong inconsistence. It would be very dif?cult to establish one feature-based deterministic model to accurately assess the tool performance wear states. Therefore, a practical approach to describe the tool performance degradation should JUNE 2012, Vol. 134 / 031004-7
Fig. 6 Scatter plot of training data with two principal components along with data distribution estimation of GMM for four baseline data sets: (a) Databsg11 , (b) Databsg12 , (c) Databsg21 , and (d) Databsg22
base upon the trend analysis and robust performance assessment that derive from the understanding of the historical behavior and in-process condition symptoms through effective multiple sensor fusion. Four GMM models are ?rst trained using the baseline data sets Databsg11 , Databsg12 , Databsg21 , and Databsg22 from the four subgroups G11, G12, G21, and G22, respectively. K-means was used to initialize the parameters of GMM, and the number of iterations for EM is 1000. Figure 6 presents the data distribution of the baseline data set and probability density approximation (i.e., the contour) of those Gaussian components in GMM. It can be easily seen from Fig. 6 that the baseline GMM characterizes the distributions of the healthy data space. From Fig. 6, several normal working modals exist according to the distribution of the healthy data. In real-world applications, the normal working conditions of tools could change, which results in multimodal distributions. After the construction of the baseline GMM using the healthy data set, it will continually adapt online when new samples from the tool are input to it. KL-divergence will be obtained through calculating the gap between the BMC of the current adapted GMM and that of the baseline GMM. Such testing is practical in real-world applications, where the normal data from the healthy tool are easy to collect, and then AGMM will adaptively learn the 031004-8 / Vol. 134, JUNE 2012
changes of tool performance in its residual life. Therefore, the modeling and testing scheme of the AGMM-based KL-divergence is close to the real working scenario of tools. Before AGMM is run, its key parameters are set as follows: $M_{\min} = 2$, $M_{\max} = 8$, $\beta_{\min} = 0.01$, and $\beta_{\max} = 0.10$. The adaptation calculation of AGMM for new inputs is implemented, and the degradation propagation of each tool in the four subgroups G11, G12, G21, and G22 is presented in Figs. 7(a)–7(d), respectively. It can be observed that the distribution of the testing feature space deviates gradually from that of the normal state, and AGMM approximates the degradation propagation of the tool performance well. As shown in Fig. 7, the distribution of the early testing space and that of the baseline feature space overlap extensively. The BMC of the AGMM and the BMC of the baseline GMM also overlap. Then, the BMC of the AGMM deviates gradually from the BMC of the baseline GMM over time. This approximation result of the AGMM demonstrates its effective adaptive performance for the dynamic propagation of tool performance states.
Fig. 7 Scatter plot of sample data with two principal components along with data distribution estimation of AGMM in the full life of tools for four subgroups: (a) G11, (b) G12, (c) G21, and (d) G22
To assess the degradation propagation of the tested tools, KL-divergences are calculated for each sample time point in the full life of each tool, and a KL-divergence monitoring chart is then formed over time. Figures 8(a)–8(d) present the four KL-divergence charts for the full life of the used tools, respectively. It should be noted that each point in Fig. 8 is the mean value of six input vectors for each run case (run case 6 is removed in Fig. 8(c), because it has only one run). From Fig. 8, it can be clearly observed that: (1) The tool performance degradation process is clearly revealed, from health through slight degradation to severe degradation (or failure). KL-divergence increases consistently as tool performance deteriorates continuously, and clearly presents the degradation propagation. (2) The period from the occurrence of incipient wear (slight degradation) to severe degradation is short. This means that timely maintenance measures should be taken prior to catastrophic failure once slight degradation begins. (3) KL-divergence is capable of detecting the slight degradation of tools as early as possible. A significant change occurs in the KL-divergence charts when slight degradation starts in the tools. This characteristic is very important because it lets operators know clearly that a slight performance degradation is happening, so that an early alarm for slight tool degradation can leave enough buffer time for maintenance and logistical scheduling. (4) In addition, although the life and failure modes differ from test to test and from tool to tool, KL-divergence still consistently depicts the tool degradation behavior in the
whole run-to-failure test, i.e., for the healthy performance states of all the tools, the KL-divergences are close to 0; for the slight degradation states, the KL-divergences are more than about 5. This is very important in real-world applications because it provides a consistent degradation alarm and assessment scheme for different tools working under different environments. These robust features of the AGMM-based KL-divergence indication facilitate a reliable tool performance degradation assessment. From the viewpoint of anomaly detection, the output (i.e., LLP) of the baseline GMM actually indicates how far the current sample is from the healthy tool state. An extremely low LLP value means that the current sample belongs to an unhealthy class. Thus, LLP can be used to quantify the degree of deviation of the current machine performance state. However, the baseline GMM is fixed once it is constructed from the training data, and the GMM-based LLP has no adaptive learning capacity. Thus, we further compare the adaptive GMM-based KL-divergence indication with the fixed GMM-based LLP indication to illustrate the effectiveness of the proposed adaptive monitoring model. For the purpose of comparison, the outputs (i.e., LLPs) of the baseline GMM for the full life of each tool are presented in Fig. 9. In comparison with KL-divergence, the LLP-based assessment for the healthy period of a tool is not robust because the variation of LLPs is large. Meanwhile, the LLP indications on the four charts are inconsistent across the tested tools from the four subgroups, which could decrease the effectiveness of alarm triggering. The introduction of adaptive learning and KL-divergence in the proposed approach significantly improves the effectiveness of tool performance assessment.
Fig. 8 KL-divergence monitoring charts for full life of tools from four subgroups: (a) G11, (b) G12, (c) G21, and (d) G22
5.2 An Extensive Study for AGMM-Based Adaptive Monitoring. In this extended experiment, we do not extract the normal data for offline modeling of the baseline GMM. Once the proposed model is used to monitor the performance states at any life phase of the running tool, the sensor data from one or several sample runnings of the current running tool are used directly to construct the first GMM (how many sample runnings are used depends on the parameter setup), and then the constructed GMM adaptively learns the new states of the tool performance in its residual life. The current adapted GMM can be compared with all the historic adapted GMMs that are saved in a database. In this way, we can observe the performance degradation changes from the current time point to any historic time point, which improves the engineering applicability of the AGMM-based KL-divergence indication in the real world because
the modeling and monitoring procedures are almost automatic, without much intervention from the machine operators. Without loss of generality, the data of the first sample running from the running tool are used to model the first GMM, and then AGMM adaptively learns the new performance states of the tools. The newest adapted GMM is compared with all the historic adapted GMMs by calculating KL-divergence. Figures 10 and 11 present two KL-divergence charts when the ninth and the last running samples are finished on run cases 1 and 3, respectively. The KL-divergence values in Figs. 10 and 11 are obtained by comparing the AGMM of the latest sample running with the AGMM of each historic sample running. It can be observed from Figs. 10 and 11 that KL-divergence decreases consistently as tool performance deteriorates continuously and presents the degradation propagation. Furthermore, the performance changes of the running tool from any historic time point to the current time point can be obtained effectively, which is important for operators to know the dynamic performance changes of tools and then to take effective maintenance measures. Meanwhile, such a modeling and monitoring scheme is closer to the requirements of real-world applications because it needs less intervention from the machine operators than the scheme based on the comparison between the adapted GMMs and the baseline GMM. In real-world applications, either of the two proposed modeling and monitoring schemes can be chosen according to the actual requirements.
Fig. 9 LLP monitoring charts for full life of tools from four subgroups: (a) G11, (b) G12, (c) G21, and (d) G22
Fig. 10 KL-divergence charts for the run case 1 at different running sample time points: (a) the ninth running sample is finished, and (b) the last running sample is finished
Fig. 11 KL-divergence charts for the run case 3 at different running sample time points: (a) the ninth running sample is finished, and (b) the last running sample is finished
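The second scheme of Sec. 5.2, in which the newest adapted model is compared with every stored historic model, can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: a single one-dimensional Gaussian stands in for the AGMM (no merge or split of components and no BMC selection), the update rule is a plain exponential-forgetting rule with a fixed learning rate `beta` chosen inside the paper's range of 0.01 to 0.10, and the direction KL(current || historic) is assumed. All function names and the synthetic drifting data are hypothetical.

```python
import math
import random

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def adapt(mu, var, x, beta):
    """Blend one new sample into the model with learning rate beta."""
    mu = (1.0 - beta) * mu + beta * x
    var = (1.0 - beta) * var + beta * (x - mu) ** 2  # uses the updated mean
    return mu, max(var, 1e-6)  # floor keeps the variance positive

def snapshots_per_running(runnings, beta=0.05):
    """Fit the first model on the first sample running, then adapt online.

    Returns one (mean, variance) snapshot per sample running.
    """
    first = runnings[0]
    mu = sum(first) / len(first)
    var = max(sum((x - mu) ** 2 for x in first) / len(first), 1e-6)
    history = [(mu, var)]
    for samples in runnings[1:]:
        for x in samples:
            mu, var = adapt(mu, var, x, beta)
        history.append((mu, var))  # the "database" of historic models
    return history

def kl_chart(history):
    """KL-divergence of the newest model against each historic snapshot."""
    mu_c, var_c = history[-1]
    return [kl_gauss(mu_c, var_c, mu_h, var_h) for mu_h, var_h in history]

# Synthetic feature whose mean drifts upward from running to running.
rng = random.Random(0)
runnings = [[rng.gauss(0.5 * k, 1.0) for _ in range(200)] for k in range(10)]
chart = kl_chart(snapshots_per_running(runnings))
```

Read from the oldest snapshot to the newest, the values in `chart` fall toward zero, which matches the decreasing charts described for Figs. 10 and 11: the further back the historic model, the larger its divergence from the current one.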
This paper has addressed tool performance assessment with the aim of avoiding unexpected failures that result in lost production time and maintenance cost. A novel system is proposed for tool performance degradation assessment. With multiple sensor signal fusion, an adaptive GMM is developed to learn online the dynamic changes of tool performance degradation. A similarity measurement-based quantification index (i.e., KL-divergence) is developed to effectively quantify the degradation states of tool performance. The experimental results illustrate that the AGMM-based degradation assessment model is capable of describing the whole degradation propagation in the full life of tools. In comparison with regular feature parameters (e.g., RMS and wavelet energy) and LLP, KL-divergence shows better assessment results in terms of comprehensibility and effectiveness. Moreover, the effective adaptive learning of AGMM, which requires no prior knowledge about the various failures, makes the proposed system more useful in real-world applications. The experimental results illustrate the effectiveness of AGMM-based KL-divergence for machine tool performance assessment. As a further extension and improvement of the proposed approach, on the one hand, other feature extraction algorithms such as kernel PCA, dynamic PCA, and recursive PCA can be used to further improve the monitoring performance. On the other hand, the approach should be applicable to the performance assessment of other key machine components, such as bearings, gears, and spindles, because of their similarity to the issues associated with tool performance assessment.
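As a side note on the KL-divergence index summarized above: the divergence between two Gaussian mixtures has no closed form, so some approximation is always involved. The sketch below uses a plain Monte Carlo estimate, which is a swapped-in technique chosen for brevity and is not the BMC-based approximation used in this paper. The one-dimensional mixtures, their parameters, and all function names are hypothetical.

```python
import math
import random

def gmm_pdf(x, gmm):
    """Density of a 1-D mixture given as [(weight, mean, variance), ...]."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in gmm
    )

def gmm_sample(gmm, rng):
    """Draw one sample: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in gmm]
    _, m, v = rng.choices(gmm, weights=weights, k=1)[0]
    return rng.gauss(m, math.sqrt(v))

def kl_monte_carlo(p, q, n=5000, seed=0):
    """Estimate KL(p || q) as the mean of log p(x) - log q(x) for x ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = gmm_sample(p, rng)
        # Floor the densities to avoid log(0) far out in the tails.
        log_p = math.log(max(gmm_pdf(x, p), 1e-300))
        log_q = math.log(max(gmm_pdf(x, q), 1e-300))
        total += log_p - log_q
    return total / n

baseline = [(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)]  # hypothetical healthy model
adapted = [(0.5, 1.5, 1.0), (0.5, 5.0, 1.5)]   # hypothetical drifted model

kl_same = kl_monte_carlo(baseline, baseline)   # identical models
kl_drift = kl_monte_carlo(adapted, baseline)   # drifted vs. healthy
```

With identical mixtures the estimate is exactly zero, and it becomes positive once the adapted mixture moves away from the baseline, which is the behavior a degradation index needs. A sampling-based estimate is noisier and slower than a component-matching approximation, so it is better suited to offline checks than to online monitoring.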
This work was supported by the National Science Foundation of China (Grant No. 71001060) and the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20103108120010). The author is grateful to the BEST lab at UC Berkeley, USA, for providing the experimental data.