
Fuzzy automaton induction using neural network


International Journal of Approximate Reasoning 27 (2001) 1–26

www.elsevier.com/locate/ijar

A. Blanco, M. Delgado, M.C. Pegalajar *

Departamento de Ciencias de la Computación e Inteligencia Artificial, ETSI Informática, Universidad de Granada, Avenida de Andalucía, 38, 18071 Granada, Spain

Received 1 September 2000; accepted 1 January 2001

Abstract

It has been shown that neural networks are able to infer regular crisp grammars from positive and negative examples. The fuzzy grammatical inference (FGI) problem, however, has received considerably less attention. In this paper we show that a suitable two-layer neural network model is able to infer fuzzy regular grammars from a set of fuzzy examples belonging to a fuzzy language. Once the network has been trained, we develop methods to extract a deterministic representation of the fuzzy automaton encoded in the network that recognizes the training set. © 2001 Elsevier Science Inc. All rights reserved.

Keywords: Recurrent neural network; Fuzzy recurrent neural network; Fuzzy grammatical inference; Fuzzy automaton

1. Introduction

There is currently increasing interest in research fields such as speech recognition, object recognition in images [10], protein structure prediction [20], gene structure prediction, etc. A suitable technique for solving these problems is grammatical inference, which may be expressed as extracting the grammatical rules, or the productions associated with a grammar, from an example set.

* Corresponding author. Tel.: +58-242-838; fax: +58-243-317. E-mail address: mcarmen@decsai.ugr.es (M.C. Pegalajar).



Note that the above-mentioned problems can be reformulated in terms of constructing a suitably tailored grammar from which the objects appearing in each case can be generated. Several techniques have been developed to carry out grammatical inference, including artificial neural networks. The neural network approach focuses on obtaining the automaton that recognizes the example set, since the problems of inferring a grammar and inferring the associated automaton are equivalent [16]. The first models developed for crisp grammatical inference used feedforward topologies, which are limited to working with fixed-length examples. Nowadays, recurrent neural networks are widely used because they can handle sequences of examples of arbitrary length [2,6,7,11,12,21,27,29,30].

In general, grammatical inference using neural networks is carried out in two steps:
1. A neural network is trained by successive adaptation of the weights of its connections, using the information provided by the examples.
2. Once the neural network is trained, the automaton that is supposed to be encoded in its weights is extracted.

Deterministic finite automata (DFAs) have been shown to be suitable for modeling dynamical systems; however, they are not appropriate for many real applications that deal with vague or uncertain information. To model such systems it is necessary to use vague state transitions, allowing the system to be in several states at the same time with different degrees of certainty. These systems can be modeled with fuzzy automata.

Therefore, our problem consists of dealing with those cases where the examples available for grammatical inference may carry uncertainty or vagueness. In such cases fuzzy grammars [8] are able to express the uncertainty. Since methods to infer these grammars are also needed, we propose neural networks for this task. Some neural networks have been trained to compute the fuzzy membership degree [3-5,24]. In [24], instead of augmenting a second-order recurrent neural network with a layer of nonlinear output neurons for computing the fuzzy membership function, a single linear output neuron is used. That approach is less accurate and less robust in the computation of the fuzzy membership degree; a comparison of both can be found in [24]. On a related subject, the problem of encoding a fuzzy automaton into a recurrent neural network so that the latter simulates its behavior has also been studied [13,22,23].

In recent years, fuzzy grammars have been used to solve problems of great interest such as syntactic recognition of skeletal maturity from X-rays [25], detecting and quantifying fuzzy artery lesions from arteriograms [18], intelligent interface design [15], clinical monitoring [9], lexical analysis [19], etc. Our neural method therefore constitutes a possible tool for solving problems in the aforementioned areas or other related ones.


The outline of this paper is as follows: in Section 2, definitions and theorems are provided. In Section 3, the neural model used for fuzzy grammatical inference is presented. The methods that we propose to extract the automaton that the trained network has encoded in its weights are presented in Section 4. Finally, some conclusions are provided.

2. Fuzzy regular grammars and fuzzy grammatical inference

This section contains a summary of some basic definitions and theorems used in this paper. A more detailed presentation can be found in [8,26]. Similarly to regular grammars, fuzzy regular grammars are defined as follows:

Definition 1. A regular fuzzy grammar (RFG), G, is a four-tuple G = (N, T, S, P), where N is a finite set of non-terminals, T is a finite set of terminals, N ∩ T = ∅, S ∈ N is the starting symbol, and P is a finite set of productions of the form A →^θ a or A →^θ aB, with A, B ∈ N, a ∈ T and θ ∈ [0, 1], where θ is the membership degree associated with the production (written over the arrow).

Example 1. Let G = (N, T, S, P), where
· N = {S, A, B}
· T = {a, b}
· S is the starting symbol
· P is the set of productions
  P = { S →^0.3 aS,  S →^0.5 aA,  S →^0.7 aB,
        S →^0.3 bS,  S →^0.2 bA,
        A →^0.5 b,   B →^0.4 b }.

In the case of crisp regular grammars a string either belongs or does not belong to a regular language; in the fuzzy case each string has a membership degree to a fuzzy language.

Definition 2. The membership degree μ(x) of a string x of T* in the regular fuzzy language L(G) is the maximum value over all derivations of x, where the value of a specific derivation of x is equal to the minimum weight of the productions chained to derive x:

  μ_G(x) = μ_G(S ⇒* x) = max_{S ⇒* x} min( μ_G(S → α_1), μ_G(α_1 → α_2), ..., μ_G(α_m → x) ).
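The max-min rule of Definition 2 can be made concrete with a short program. The following Python sketch is ours, not part of the paper; it hard-codes the productions of Example 1 in a small dictionary (the names P and membership are hypothetical) and enumerates the derivations recursively.

# A minimal sketch (not from the paper) of Definition 2 for the grammar of
# Example 1: the membership degree of a string is the max over complete
# derivations of the min production degree used along the derivation.

# Productions: nonterminal -> list of (terminal, next nonterminal or None, degree)
P = {
    "S": [("a", "S", 0.3), ("a", "A", 0.5), ("a", "B", 0.7),
          ("b", "S", 0.3), ("b", "A", 0.2)],
    "A": [("b", None, 0.5)],
    "B": [("b", None, 0.4)],
}

def membership(x, nonterminal="S", degree=1.0):
    """Max-min membership of the string x in L(G), starting from `nonterminal`."""
    if not x:                       # empty remainder with a nonterminal left: incomplete derivation
        return 0.0
    best = 0.0
    for terminal, nxt, theta in P.get(nonterminal, []):
        if terminal != x[0]:
            continue
        d = min(degree, theta)      # min over the chained productions
        if nxt is None:             # terminating production: must consume the last symbol
            if len(x) == 1:
                best = max(best, d)
        else:                       # continue deriving the rest of the string
            best = max(best, membership(x[1:], nxt, d))
    return best

print(membership("ab"))   # 0.5, as worked out by hand in Example 2 below
print(membership("bb"))   # 0.2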

Example 2. Let us consider the grammar G in Example 1, for which the membership degree of the string ab is


  μ_G(ab) = μ_G(S ⇒* ab)
          = max{ min(S →^0.5 aA, A →^0.5 b), min(S →^0.7 aB, B →^0.4 b) }
          = max(0.5, 0.4) = 0.5.

The fuzzy automata which we are interested in learning are defined as follows.

Definition 3. A finite fuzzy automaton (FFA) is a six-tuple M = (Σ, Z, Q, δ, ω, q_S), where Σ is a finite input alphabet, Z is a finite output alphabet, Q is a finite set of states, δ : Q × Σ × [0, 1] → Q is the fuzzy state transition map, ω : Q → Z is the output map, and q_S ∈ Q is the starting state.

It should be noted that a regular fuzzy grammar, as well as a finite fuzzy automaton, reduces to a conventional (crisp) one when all the production and transition degrees are equal to 1.

Definition 4. The set of all finite strings formed by symbols in T, plus the empty string of length 0, is denoted by T*.

Definition 5. A fuzzy language, L(G), is a fuzzy subset of T* with associated membership function μ : T* → [0, 1].

As in the crisp case, there is also an equivalence between finite fuzzy automata and fuzzy regular grammars [8].

Theorem 1. For a given fuzzy grammar G, there exists a fuzzy automaton M such that L(G) = L(M), and vice versa.

Note that the correspondence between non-terminal and terminal symbols and non-accepting and accepting FFA states, respectively, is obvious. The transitions between states are weighted with the corresponding production degrees θ.

Example 3. Consider the grammar G in Example 1. A fuzzy automaton M = (Σ, Z, Q, δ, ω, q_0) (see Fig. 1) such that L(G) = L(M) is given by:
· Σ = {a, b}
· Z = {Final, Non-Final}
· Q = {q_0, q_1, q_2, q_3}
· The fuzzy state transition map:
  δ(q_0, a, 0.3) = q_0,  δ(q_0, a, 0.5) = q_1,  δ(q_0, a, 0.7) = q_2,
  δ(q_0, b, 0.3) = q_0,  δ(q_0, b, 0.2) = q_1,
  δ(q_1, b, 0.5) = q_3,  δ(q_2, b, 0.4) = q_3.


Fig. 1. FFA M.

· The output map: ω(q_0) = ω(q_1) = ω(q_2) = "Non-Final", ω(q_3) = "Final". The final states are represented by circles with a thicker edge (Fig. 1).
· The starting state: q_0.

Theorem 2. Given a finite fuzzy automaton M, there exists a deterministic finite-state automaton A with output alphabet Z' ⊆ {θ : θ is a membership degree} ∪ {0} that computes the membership function μ : T* → [0, 1] of the language L(M) in the output of its states.

Example 4. Consider the fuzzy automaton M in Example 3. The equivalent DFA A = (Σ, Z', Q', δ', ω', d_0) is (Fig. 2):
· Σ = {a, b}
· Z' = {0.0, 0.2, 0.3, 0.5}
· Q' = {d_0, d_1, d_2, d_3, d_4, d_5, d_6}
· The state transition map:
  δ'(d_0, a) = d_1,  δ'(d_0, b) = d_5,
  δ'(d_1, a) = d_3,  δ'(d_1, b) = d_2,
  δ'(d_2, a) = d_3,  δ'(d_2, b) = d_6,
  δ'(d_3, a) = d_3,  δ'(d_3, b) = d_4,
  δ'(d_4, a) = d_3,  δ'(d_4, b) = d_3,
  δ'(d_5, a) = d_3,  δ'(d_5, b) = d_6,
  δ'(d_6, a) = d_3,  δ'(d_6, b) = d_6.
· The output map (membership function):
  ω'(d_0) = 0.0,  ω'(d_1) = 0.0,  ω'(d_2) = 0.5,  ω'(d_3) = 0.0,
  ω'(d_4) = 0.3,  ω'(d_5) = 0.0,  ω'(d_6) = 0.2.


Fig. 2. DFA A.
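To see how this DFA computes the fuzzy membership degrees in the output of its states, the following sketch (ours, not from the paper) encodes the transition and output tables of Example 4 and evaluates a few strings; the results agree with the max-min semantics of the grammar, e.g. μ(ab) = 0.5 as in Example 2.

# Sketch (ours): the DFA A of Example 4 as explicit tables.  The membership
# degree of a string is simply the output attached to the state reached.

delta = {                                   # state transition map delta'
    ("d0", "a"): "d1", ("d0", "b"): "d5",
    ("d1", "a"): "d3", ("d1", "b"): "d2",
    ("d2", "a"): "d3", ("d2", "b"): "d6",
    ("d3", "a"): "d3", ("d3", "b"): "d4",
    ("d4", "a"): "d3", ("d4", "b"): "d3",
    ("d5", "a"): "d3", ("d5", "b"): "d6",
    ("d6", "a"): "d3", ("d6", "b"): "d6",
}
omega = {"d0": 0.0, "d1": 0.0, "d2": 0.5, "d3": 0.0,
         "d4": 0.3, "d5": 0.0, "d6": 0.2}   # output map omega'

def dfa_membership(x, start="d0"):
    state = start
    for symbol in x:
        state = delta[(state, symbol)]
    return omega[state]

for s in ["ab", "bb", "aab", "abb"]:
    print(s, dfa_membership(s))             # 0.5, 0.2, 0.3, 0.2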

The FFA-to-DFA transformation algorithm can be found in [26]. The following corollary is a consequence of this theorem.

Corollary 1. Given a regular fuzzy grammar G, there exists an equivalent grammar G' in which the productions have the form A →^θ a or A →^1.0 aB.

Definition 6. Fuzzy grammatical inference (FGI) is defined as the problem of inferring the fuzzy productions that characterize a grammar or, equivalently, the states and transitions between states associated with a fuzzy automaton, from a training example set. The examples are pairs (P_i, μ_i), i = 1, ..., n, n being the number of examples, where P_i is a symbol sequence and μ_i is the membership degree of P_i to the fuzzy language to which it belongs.

Based on Theorem 2, the problem of obtaining the fuzzy automaton M is changed into that of extracting a deterministic finite automaton, A, that calculates in the output of its states the membership function to the fuzzy language. Fig. 3 shows a flowchart of the fuzzy grammatical inference problem using neural networks. As can be observed, we start from a set of fuzzy examples that we use to train the neural network. Once the neural network has been trained, the task is to extract the automaton A' that the neural network has inferred, by a suitable extraction method. This automaton will be equivalent to the original A that recognizes the language we are attempting to identify. To obtain the original automaton A we apply a minimization method to the extracted automaton. Finally, the associated fuzzy grammar can be constructed.


Fig. 3. Fuzzy grammatical inference.

3. Topology of the neural network

The topology of the neural network model proposed to carry out the first step in fuzzy grammatical inference is shown in Fig. 4. It can easily be seen that the device is composed of two parts: a second-order recurrent neural network and an output layer.

The recurrent neural network is formed of N recurrent hidden neurons, labeled S_j^t, j = 0, ..., N-1, and L input neurons, labeled I_k^t, k = 0, ..., L-1, with N^2 · L weights w_ijk, i, j = 0, ..., N-1, k = 0, ..., L-1, associated with the links between these neurons. At the output layer, we have M output neurons, OUT_p ∈ {0, 1}, p = 0, ..., M-1, attached to the hidden-layer neurons by N · M weights labeled u_pj, p = 0, ..., M-1, j = 0, ..., N-1.

This neural network accepts an input sequence ordered in time. One symbol of the sequence is seen in each time unit, so the neuron S_j^t processes

Fig. 4. Neural network used for fuzzy grammatical inference.


the symbol in position t of the sequence. Each element in a sequence to be processed is sequentially encoded in the input neurons at each time step t by means of Kronecker's delta. Let us assume that the alphabet is {a_0, ..., a_{L-1}}. If the t-th character in the sequence to be processed is a_i, then it is encoded in the input neurons at time t by: I_i^t = 1, I_j^t = 0 for all j = 0, ..., L-1 with j ≠ i.

Once the recurrent neural network has fully processed the input sequence, the value of the output neurons is obtained. In the output layer the membership degree of each processed sequence is represented as explained below. More or fewer neurons are used depending on the level of accuracy desired in the output. If r is the accuracy required (r = 0.1 if the accuracy is decimal, r = 0.01 if it is centesimal, etc.), we create an output layer with M = 10^(-log r) + 1 (i.e., 1/r + 1) neurons and activate the p-th one if the membership degree associated with the processed sequence is exactly p · r. For example, assume a sequence x belongs to the language L(G) with membership degree 0.7 and the accuracy used is decimal (11 neurons in the output layer). This membership degree is represented in the output neurons by the vector (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0).

The dynamics of the network is summarized in the following steps (a small code sketch of this forward pass is given after the training procedure in Section 3.1):

1. Fix the initial values of the recurrent hidden neurons at S_0^0 = 1 and S_i^0 = 0 for all i ≠ 0.

2. Determine an intermediate (provisional) value, O_p, p = 0, ..., M-1, for the output neurons using the following expressions:

  O_p = g(r_p),                                   (1)

  r_p = Σ_{i=0}^{N-1} u_pi · S_i^m,               (2)

where S_i^m is the final value of the S_i^t neurons once a sequence has been fully processed (m being the length of the input sequence). S_i^m is computed from Expressions (3) and (4):

  S_i^{t+1} = g(h_i^t),                           (3)

  h_i^t = Σ_{j=0}^{N-1} Σ_{k=0}^{L-1} I_k^t · S_j^t · w_ijk,   (4)

where g denotes the sigmoidal function

  g(x) = 1 / (1 + e^{-x}).                        (5)

Eq. (3) is evaluated in each neuron for each character of the input sequence. Once a sequence has been processed, Eq. (1) is computed for each output neuron.


3. Determine the final output value of the neurons OUT_p from the provisional O_p by a "winner takes all" strategy:

  OUT_p = 1 if O_p = max{O_0, ..., O_{M-1}},  and 0 otherwise.

In the case of a tie, we propose the lexicographic criterion where the neuron with the lowest index is the winner.

4. Finally, the membership degree associated with the processed sequence is assigned according to the following procedure. Let i be the winner output neuron determined in the previous step. The membership degree associated with the processed sequence is μ_{L(G)}(x) = i · r, where r is the desired accuracy.

3.1. Training procedure

The training algorithm is an adaptation of real-time recurrent learning (RTRL) [28]. The error function in each output neuron is

  E_i = (1/2) (T_i - O_i)^2,                      (6)

where T_i is the desired value for the output neuron OUT_i and O_i is the provisional value given by Eq. (1). The total error for the network is

  E = Σ_{i=0}^{M-1} E_i.                          (7)

The training algorithm updates the weights at the end of the presentation of each sequence when E > ε, where ε is the error tolerance of the neural network. The modification of the weights is given by

  Δu_ij = -α ∂E_i/∂u_ij,                          (8)

  Δw_lon = -α ∂E/∂w_lon,                          (9)

where α is the learning rate. The partial derivative in Eq. (8) may be directly obtained as

  ∂E_i/∂u_ij = -(T_i - O_i) · O_i · (1 - O_i) · S_j^m,   (10)

where S_j^m is the final value of neuron j once the network has processed the whole sequence. The derivatives associated with w_lon are calculated by

  ∂E/∂w_lon = -Σ_{r=0}^{M-1} Σ_{p=0}^{N-1} (T_r - O_r) · g'(r_r) · u_rp · g'(h_p^{m-1})
              · [ δ_pl · S_o^{m-1} · I_n^{m-1} + Σ_{j=0}^{N-1} Σ_{k=0}^{L-1} w_pjk · I_k^{m-1} · ∂S_j^{m-1}/∂w_lon ],   (11)

where δ_pl denotes Kronecker's delta. Since obtaining ∂E/∂w_lon requires a recursive calculation of ∂S_j^{m-1}/∂w_lon, we fix the initial value

  ∂S_j^0/∂w_lon = 0.                              (12)

We can improve the training by introducing a momentum term η that takes into account the increase immediately prior to the present one. The final value for the update of the weights is

  Δw_ijk = η · Δw_ijk^{old} + (1 - η) · Δw_ijk.   (13)
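To make the dynamics of Section 3 concrete, here is a minimal sketch of the forward pass of Eqs. (1)-(5) together with the winner-takes-all decoding of steps 3 and 4. It is our code, not the authors'; the weights are random, the function names are ours, and decimal accuracy (r = 0.1, M = 11) is assumed.

import numpy as np

# Sketch (ours) of the forward pass of the two-part model: a second-order
# recurrent network (Eqs. (3)-(5)) followed by the output layer (Eqs. (1)-(2))
# and the winner-takes-all decoding of the membership degree (steps 3-4).

def g(x):                                  # sigmoid, Eq. (5)
    return 1.0 / (1.0 + np.exp(-x))

def one_hot(symbol, alphabet):             # Kronecker-delta input encoding
    I = np.zeros(len(alphabet))
    I[alphabet.index(symbol)] = 1.0
    return I

def membership_of_sequence(seq, w, u, alphabet, r=0.1):
    """w: (N, N, L) second-order weights, u: (M, N) output weights."""
    N = w.shape[0]
    S = np.zeros(N)
    S[0] = 1.0                             # initial state: S_0^0 = 1, S_i^0 = 0
    for symbol in seq:
        I = one_hot(symbol, alphabet)
        h = np.einsum("ijk,j,k->i", w, S, I)   # h_i^t, Eq. (4)
        S = g(h)                               # Eq. (3)
    O = g(u @ S)                               # Eqs. (1)-(2), provisional outputs
    winner = int(np.argmax(O))                 # ties resolved by the lowest index
    return winner * r                          # step 4: membership degree = i * r

# Toy usage with random weights (N = 2 hidden neurons, L = 2 symbols, M = 11):
rng = np.random.default_rng(0)
w = rng.normal(size=(2, 2, 2))
u = rng.normal(size=(11, 2))
print(membership_of_sequence("ab", w, u, alphabet=["a", "b"]))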

Once the network has been trained, if we plot the values taken by the recurrent hidden neurons at the end of the processing of each of the sequences, we can see that these values tend to form clusters, such that all the points in a certain cluster have the same membership degree (obtained at the output layer). Next, to illustrate the above developments, we present an experiment showing the behavior of a neural network when it learns a given set of examples.

3.2. Experiment 1

In this experiment our aim is to obtain a neural network that learns a set of examples belonging to a language. The experiment has been designed in the following way:

1. We start out from a fuzzy automaton M, from which we generate a set of examples recognized by M.

2. Construction of the example set. Each example consists of a pair (P_i, μ_i), where P_i is a sequence of symbols and μ_i is its membership degree to the fuzzy language L(M). To build an example we carry out the following steps:
(a) We obtain the length of the sequence by generating a random number in the interval [0, max], where max is the greatest allowed length of a sequence.
(b) We build the sequence of symbols by randomly choosing symbols from the alphabet until the length chosen in (a) is reached.
(c) We assign to the generated sequence its membership degree to the fuzzy language L(M): we process the sequence with the automaton M and assign the membership degree μ_i computed by the automaton to the sequence.


3. Now, "forgetting" the initial automaton M, our objective is to find a neural network that learns the example set.

Next, we develop the above steps.

1. Consider the FFA in Fig. 1, M = (Σ, Z, Q, δ, ω, q_0).
2. From this automaton an example set belonging to its fuzzy language is constructed as follows:
· We build a set of 200 random sequences formed by symbols of the alphabet Σ = {a, b}.
· We process these sequences with the automaton M to associate with them their membership degree to the fuzzy language L(M) (a code sketch of this labeling step is given after the weight listing below).
3. Now ("forgetting" the initial automaton) our objective is to train a network to simulate (to identify) the automaton M from which the set of 200 examples originates. After this, we graphically show that:
· the values of the recurrent hidden neurons after processing each example tend to cluster;
· all the points belonging to a cluster have the same membership degree computed by the output neurons.

The neural network we have trained to learn the example set is composed of 2 recurrent hidden neurons and 11 neurons in the output layer, i.e., decimal accuracy is required. The parameters for the network learning are: learning rate α = 0.1, momentum term η = 0.0 and error tolerance ε = 1.25 × 10^{-7}. After the learning phase, the neural network correctly classified all the examples. The weights of the links are:

  w_000 = 1.8335,   w_100 = 2.4451,
  w_001 = 3.3293,   w_101 = 0.1656,
  w_010 = -4.4134,  w_011 = 3.6870,
  w_110 = 3.6283,   w_111 = 0.6572,

  u_00 = 7.7207,    u_01 = -5.3779,
  u_10 = -2.9861,   u_11 = -3.6717,
  u_20 = -13.1605,  u_21 = 1.7451,
  u_30 = -2.2377,   u_31 = 0.3205,
  u_40 = -2.5178,   u_41 = -3.8479,
  u_50 = -2.6343,   u_51 = -2.6843,
  u_60 = -2.7711,   u_61 = -3.7368,
  u_70 = -3.5444,   u_71 = -3.5633,
  u_80 = -2.5826,   u_81 = -3.8162,
  u_90 = -2.9185,   u_91 = -3.6905,
  u_100 = -2.7506,  u_101 = -3.7466.
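The labeling step used to build the 200 examples can be sketched as follows (our code, not the authors'): the FFA M of Fig. 1 is simulated by keeping a set of active states with their degrees, taking the minimum along each transition and the maximum over the final states at the end. The maximum length and the random seed are assumptions of the sketch.

import random

# Sketch (ours) of the example-generation procedure of Experiment 1: random
# strings over {a, b} are labeled with their membership degree in L(M) by
# simulating the FFA M of Fig. 1 (min along a path, max over final states).

TRANSITIONS = {                    # (state, symbol) -> list of (next state, degree)
    ("q0", "a"): [("q0", 0.3), ("q1", 0.5), ("q2", 0.7)],
    ("q0", "b"): [("q0", 0.3), ("q1", 0.2)],
    ("q1", "b"): [("q3", 0.5)],
    ("q2", "b"): [("q3", 0.4)],
}
FINAL = {"q3"}

def ffa_membership(x, start="q0"):
    active = {start: 1.0}                      # state -> best degree reached so far
    for symbol in x:
        nxt = {}
        for state, d in active.items():
            for state2, theta in TRANSITIONS.get((state, symbol), []):
                nxt[state2] = max(nxt.get(state2, 0.0), min(d, theta))
        active = nxt
    return max((d for s, d in active.items() if s in FINAL), default=0.0)

def build_example_set(n=200, max_len=10, alphabet="ab", seed=0):
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        length = rng.randint(0, max_len)       # step (a): random length in [0, max]
        seq = "".join(rng.choice(alphabet) for _ in range(length))  # step (b)
        examples.append((seq, ffa_membership(seq)))                 # step (c)
    return examples

print(ffa_membership("ab"), ffa_membership("bb"), ffa_membership("aab"))  # 0.5 0.2 0.3
print(build_example_set(n=3))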

Fig. 5 shows the evolution of the error during the training process. The evolution of the percentage of correct answers throughout the training process is plotted in Fig. 6. The neural network provides a good result (95% correct answers) from cycle 5 onwards, and it learned all the examples in training cycle 432. Next, we graphically represent the values taken by the recurrent hidden neurons while processing the example set. As observed in Fig. 7, the values tend to cluster, with all points within a given cluster having the same membership degree.


Fig. 5. Evolution of the error.

Fig. 6. Evolution of the percentage of successes.

Once the network is trained, the next task is to disclose the fuzzy automaton that the network has encoded into its weights. To do so we have developed two different methods that can be used depending on the problem context and the user's requirements.


Fig. 7. Plot of the values taken by the hidden neurons.

4. Extraction algorithms of fuzzy finite automata

Next, we present two methods for extracting the automaton that the neural network has encoded into its weights. The first is a generalization of the crisp method by Giles et al. [12]; in the second method, the extraction of the FFA is done by means of a Kohonen self-organizing map. Both methods attempt to determine the clusters that the hidden-neuron values form, which, as mentioned above, tend to group according to the membership degree (see Experiment 1).

4.1. Method 1: partitioning of the output space

This algorithm constructs a sequence of nested partitions of the aforementioned value space such that each set in the final one represents an automaton state. If the neural network has N hidden neurons, then for any input their state values are represented by a point of [0, 1]^N (remember that the activation function is continuous, specifically sigmoidal). Let us split the N coordinate intervals [0, 1] into q (q ≥ 2) subintervals. Then the state space is divided into exactly q^N hypercubes. The key idea is that for some (optimal) q* the automaton states we are looking for are associated with some of the corresponding (q*)^N hypercubes. Accordingly, the problem of automaton extraction breaks down into two subproblems:
1. Determining the optimal q*.
2. Establishing a test to associate any automaton state with one of the (q*)^N hypercubes.
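As a hedged illustration of the partitioning idea (our sketch, not the authors' code), the mapping from a hidden-state vector to its hypercube can be written as follows; two state vectors belong to the same candidate automaton state exactly when they share the same index tuple.

# Sketch (ours): map a hidden-state vector in [0, 1]^N to the hypercube it
# falls into when each coordinate interval is split into q subintervals.

def hypercube_index(state, q):
    """state: iterable of N values in [0, 1]; returns a tuple identifying the hypercube."""
    return tuple(min(int(s * q), q - 1) for s in state)  # clamp s = 1.0 into the last cell

# Two nearby state vectors land in the same hypercube for q = 7:
print(hypercube_index([0.93, 0.12], q=7))   # (6, 0)
print(hypercube_index([0.96, 0.10], q=7))   # (6, 0)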


Fig. 8. Flowchart of trial-and-error algorithm.

To perform these two tasks we developed a trial-and-error algorithm whose flowchart is shown in Fig. 8. The key task is to construct an automaton associated with a given q (denoted the q-automaton). To that end we have developed a search algorithm based on a search tree explored with a breadth-first strategy. The nodes represent the states of the q-automaton and the links are the transitions between states. The construction of the tree is carried out through the following procedure:

P1. The tree root represents the initial state of the q-automaton. It is labeled by both the hypercube that contains the initial point (S_0^0 = 1, S_i^0 = 0 for all i ≠ 0) and the corresponding membership degree computed at the output layer when it receives these values from the hidden recurrent neurons. Recall that this degree is encoded into an M-dimensional vector.

P2. Starting out from the current node, associated with state B_i of the automaton, select a previously unprocessed symbol, n, from the alphabet {a_0, ..., a_{L-1}} and introduce it into the trained network. After this, the hidden-neuron values define a point belonging to a given hypercube that we shall consider as the q-automaton state B_j. The output of the trained network will be the membership degree of B_j. Within this general framework several particular situations can be considered:
(a) When B_j, together with its membership degree, has already been visited, we only create the transition δ(B_i, n) = B_j between the current state B_i (current node) and the obtained state B_j. Thus, no new state is created in the automaton. Once this situation is reached, the tree is pruned at this node.


(b) When a transition leads us to the same hypercube in which we find ourselves (B_i = B_j), a cycle is created in the search tree, δ(B_i, n) = B_i, and we also prune the tree at this node.
(c) When the hypercube obtained, B_j, together with the membership given by the output neurons, has not been visited before, a new state B_j is effectively created in the q-automaton, as well as the corresponding transition δ(B_i, n) = B_j.

P3. Repeat P2 until all the alphabet symbols are processed.

P4. If new nodes have been created whose hypercubes have not been visited before, then go back to step P2: the creation of new nodes means that new states have been created in the automaton, making it necessary to study their transitions in step P2. Otherwise, the procedure ends; when this point is reached all the possible states and transitions of the automaton have already been extracted.

Once the q-automaton has been extracted, the next task is to check whether or not it recognizes all the training examples. If the answer is affirmative, we have found q* and thus the automaton we are looking for. In the case of a negative answer, the value of q is increased by one unit (q = q + 1) and the above procedure is performed again.

4.2. Method 2: Kohonen's self-organizing feature map

Another alternative to cluster the state values of the hidden recurrent neurons is to use Kohonen's self-organizing feature map (SOM) [17], a well-known device for unsupervised learning. The SOM is a neural network with two layers: an input layer and a competitive layer (in the form of a bidimensional grid). The SOM connections go from the input layer to the competitive layer; each unit of the input layer is linked with all the units of the competitive layer. When an example is introduced into the input layer of the SOM, the units of the competitive grid compete to find a winner unit. The winner unit is the one whose weight vector is nearest (in the sense of minimal distance) to the input vector.

Based on this general idea, we have built an automaton-extraction device by directly linking the output of the recurrent hidden neurons to the units of the SOM input layer. The resulting topology is shown in Fig. 9. More specifically, if the second-order recurrent neural network has N neurons at the hidden layer, the SOM we connect to it consists of:
· An input layer of N units, N being the number of hidden neurons S_i^t of the recurrent neural network that we have previously trained.
· A bidimensional grid of D × D units. The parameter D is obtained by trial and error, starting from D = 2. Once the SOM is trained, a unit will represent a state of the automaton.


Fig. 9. The topology of a recurrent neural network linked to a SOM.

· A weight array of dimension D × D × N. The weights associated with the unit (j, k) of the competitive layer can be expressed by a vector V_(j,k) = (v_(j,k)0, v_(j,k)1, ..., v_(j,k)i, ..., v_(j,k)N-1), where v_(j,k)i represents the connection of input unit i with the competitive unit (j, k).

In its turn, the flowchart of the trial-and-error algorithm proposed to obtain the SOM we are searching for appears in Fig. 10. The procedure to extract the automaton can be summarized in the following three steps:
(a) Training the SOM.
(b) Extracting the states and transitions of the automaton.
(c) Computing the membership function associated with the states of the automaton.
Next, we develop each of these steps.


Fig. 10. SOM obtained that will represent the automaton.

4.3. Training Kohonen's self-organizing map

As in the crisp case [1], the examples used to train the SOM are the vectors of values of the recurrent hidden neurons, (S_0^t, S_1^t, ..., S_i^t, ..., S_{N-1}^t), obtained after processing each symbol of each of the training examples. The algorithm used can be described in the following steps:

1. Randomly initialize the weights between 0 and 1, v_(j,k)i ∈ [0, 1], j, k = 1, ..., D, i = 0, ..., N-1. Initialize the learning rate, α = 0.9, and the neighborhood radius, c = D.

2. Perform the following steps for each symbol p_t belonging to each sequence of the training set, P_s = p_1 p_2 ... p_t ... p_m:

2.1. Process the symbol p_t with the recurrent neural network. Obtain the output vector of the neurons of the hidden layer, (S_0^t, S_1^t, ..., S_i^t, ..., S_{N-1}^t), and use it as input to the SOM.

2.2. Determine the unit (j, k) of the SOM with the shortest distance d_(j,k) between its weight vector and the output of the recurrent neural network:

  d_(l,p) = Σ_{i=0}^{N-1} ( S_i^t - v_(l,p)i )^2,   for all l, p = 1, ..., D.

2.3. After the winning unit is identified, the next step is to identify the neighborhood around it. In this case, the neighborhood consists of the units that lie within a square of side c centered on the winning unit (j, k). The neighborhood is denoted by the set of units N_c (see Fig. 11).


Fig. 11. Three different neighborhoods are shown: c = 1, 2, and 3.

2.4. Update the weights of all units in the neighborhood of the winning unit. The update equation is

  v_(r,s)i^new = v_(r,s)i^old + α · ( S_i^t - v_(r,s)i^old )

for all units (r, s) of the self-organizing map in the neighborhood N_c, where 0 ≤ i ≤ N-1. In particular, the weights of the winning unit are updated as

  v_(j,k)i^new = v_(j,k)i^old + α · ( S_i^t - v_(j,k)i^old ).

This adjustment modifies the weights of the winning unit and its neighbors so that they become more like the input pattern.

3. Repeat Step 2 a pre-determined number of times (usually 20).

4. Set the next value of the learning rate to α = α - 0.1 and calculate the next neighborhood radius of the nodes in the SOM, c = ⌈D · (1 - α)⌉ (⌈x⌉ denoting rounding of x to an integer).

5. If α ≥ 0, go back to Step 2. Otherwise finish.

4.4. Extracting the automaton

After training, the SOM has the states of the automaton we are looking for "encapsulated" in its nodes. The main hypothesis is that each node is associated with a possible state of the corresponding automaton. The effective algorithm to determine this automaton from the trained SOM is again based on a breadth-first search, such that the nodes of the search tree represent the states of the automaton and the links between these nodes correspond to the transitions between states. The construction of the tree is carried out in the following way:

1. The root of the tree, Fig. 12, that we are constructing has the following associated values:


Fig. 12. The root of the tree.

· A vector with the initial values of the recurrent hidden neurons, (S_0^0, S_1^0, ..., S_i^0, ..., S_{N-1}^0).
· The unit (x, y) of the SOM that is activated by the input (S_0^0, S_1^0, ..., S_i^0, ..., S_{N-1}^0). This node of the tree represents the initial state of the automaton, q_(x,y).

2. Starting out from the current node (characterized by the vector associated with the hidden neurons, (S_0^t, S_1^t, ..., S_i^t, ..., S_{N-1}^t), and the unit (j, k)), carry out the following steps:

2.1. Introduce each symbol of the alphabet, a_l, l = 0, ..., L-1, into the RNN. For each symbol a_l, the recurrent neurons produce a vector X_l = (S_0^t, S_1^t, ..., S_i^t, ..., S_{N-1}^t)_l that is taken as input to the SOM. This input, X_l, activates a unit (m, n)_l (Fig. 13). At this point, one of the following three things can occur:
(a) The unit (m, n)_l has previously been activated; therefore no new state is added to the automaton and only the transition δ(q_(j,k), a_l) = q_(m,n)_l is created.
(b) The unit (m, n)_l is activated for the first time. We create a new state in the automaton, q_(m,n)_l, as well as the associated transition δ(q_(j,k), a_l) = q_(m,n)_l.
(c) The current node (j, k) is activated once again, (j, k) = (m, n)_l. A cycle is thus produced and the tree must be pruned at this node. Only the transition δ(q_(j,k), a_l) = q_(j,k) is created.

Fig. 13. Node offspring.


2.2. If there are units that have been activated for the first time, new states have been created in the automaton, and therefore the transitions associated with these states must be created: go back to Step 2. Otherwise, finish, because all the states and transitions of the automaton have been calculated.

The procedure ends when no new units are activated and, therefore, no new states are created. At this moment, all the transitions and states have been determined from the SOM. The final task is to associate with each state of the extracted automaton the membership degree that represents the uncertainty with which the automaton recognizes a sequence as belonging to the fuzzy language.

4.5. Calculation of the membership function

To compute the membership function associated with the states of the automaton, we process the example sequences with the recurrent network, and the outputs produced by the hidden neurons are used as inputs to the SOM. This produces the activation of a unit, and the membership degree of the processed sequence to the fuzzy language is associated with this unit. Once this has been done, the automaton states have their associated membership degrees, owing to the relation of each unit of the SOM with the automaton states.

Next, we apply the two automaton-extraction algorithms to the neural network obtained in Experiment 1.

4.6. Experiment 2

We start from the results of Experiment 1 above, i.e., from the neural network obtained in Experiment 1 that recognizes all the examples used for its training. From this neural network we extract the automaton encoded in its weights. As shown in Section 2, we do not extract the fuzzy automaton M (Fig. 1) but its equivalent DFA A (Fig. 2): in the output of its states, this DFA calculates the membership degree to the fuzzy language to which the example set belongs. Next, we calculate the DFA encoded in the neural network using the two extraction methods. The extracted automata will be equivalent to the original A, which can be verified by applying a fuzzy-state minimization method to obtain the original DFA A. The DFA A' we are searching for is equivalent to the fuzzy automaton M in Fig. 1: applying Theorem 2 to M, we obtain the equivalent DFA A that appears in Fig. 2, and our extracted DFA A' must be equivalent to this.
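Before turning to the results, the breadth-first extraction loop shared by both methods can be sketched as follows (our code, not the authors'). It is parameterized by a clustering function that maps a hidden-state vector to a discrete label: the hypercube index for Method 1, or the winning SOM unit for Method 2. The step, decode and cluster functions are assumptions of the sketch, standing for the trained network of Section 3 and the chosen clustering device.

from collections import deque

# Sketch (ours) of the breadth-first extraction loop common to Methods 1 and 2.
# step(state_vec, symbol) runs one step of the trained RNN and returns the new
# hidden-state vector; decode(state_vec) reads the membership degree off the
# output layer; cluster(state_vec) returns a discrete label (hypercube index
# for Method 1, winning SOM unit for Method 2).

def extract_automaton(initial_state, alphabet, step, decode, cluster):
    def label(vec):                        # an automaton state: (cluster, degree)
        return (cluster(vec), decode(vec))

    start = label(initial_state)
    transitions = {}                       # (state, symbol) -> state
    visited = {start: initial_state}       # state -> first hidden vector seen in it
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            next_vec = step(visited[current], symbol)
            nxt = label(next_vec)
            transitions[(current, symbol)] = nxt
            if nxt not in visited:         # first visit: a new automaton state
                visited[nxt] = next_vec
                queue.append(nxt)
            # revisited state or self-loop: only the transition is recorded
    memberships = {state: state[1] for state in visited}
    return start, transitions, memberships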


4.7. Results obtained by Method 1

We start from a quantization level q = 2, but the extracted automaton does not recognize all of the example set. Therefore, we repeatedly increase the quantization level by one unit to find a level that provides an automaton recognizing all of the examples, which turns out to be q* = 7 (Fig. 14). The quantization level q* = 7 provides us with the automaton A', which recognizes 100% of the training examples. The extracted automaton appears in Fig. 15. In this case, the automaton A' is the same as A, i.e., the original automaton is directly obtained.

Fig. 14. Partitioning of space.

Fig. 15. Automaton A' inferred by the network.


Fig. 16. The search tree.

We have thus managed to infer an automaton A' equivalent to the original automaton M from the set of examples, by training a neural network formed by two recurrent neurons and by subsequently applying the extraction algorithm based on the partition of the space in which the recurrent neurons take their values. The search tree resulting from the algorithm is plotted in Fig. 16.

4.8. Results obtained by Method 2

We start from an SOM formed by 2 × 2 units. The automaton obtained does not recognize all of the example set, so we increase the size of the grid by one unit (3 × 3) and once again apply the algorithm. This automaton does not recognize the examples either. The automaton that correctly classifies the training set was extracted from an SOM consisting of 4 × 4 units. The parameters used in the training of the SOM were a momentum of η = 0.5 and a learning rate of α = 0.5. The weights of the trained SOM are as follows:

  v_(1,1)0 = 0.9890,  v_(2,1)0 = 0.8622,  v_(3,1)0 = 0.0200,
  v_(1,2)0 = 0.9654,  v_(2,2)0 = 0.8622,  v_(3,2)0 = 0.0772,
  v_(1,3)0 = 1.0000,  v_(2,3)0 = 0.8622,  v_(3,3)0 = 0.0135,
  v_(1,4)0 = 0.9877,  v_(2,4)0 = 0.8622,  v_(3,4)0 = 0.0140,


  v_(4,1)0 = 0.2664,  v_(4,2)0 = 0.3500,  v_(4,3)0 = 0.2809,  v_(4,4)0 = 0.2448,
  v_(1,1)1 = 0.6480,  v_(1,2)1 = 0.5413,  v_(1,3)1 = 0.0000,  v_(1,4)1 = 0.4250,
  v_(2,1)1 = 0.9202,  v_(2,2)1 = 0.9202,  v_(2,3)1 = 0.9202,  v_(2,4)1 = 0.9202,
  v_(3,1)1 = 0.9858,  v_(3,2)1 = 0.9957,  v_(3,3)1 = 0.9731,  v_(3,4)1 = 0.9782,
  v_(4,1)1 = 0.9916,  v_(4,2)1 = 0.9869,  v_(4,3)1 = 0.9910,  v_(4,4)1 = 0.9920.
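As a quick check (ours, not part of the paper), the winning unit for a given pair of hidden-neuron values can be computed directly from the weights listed above; for instance, the initial hidden-state vector (S_0^0, S_1^0) = (1, 0) activates unit (1, 3), which under the procedure of Section 4.4 would become the initial state of the extracted automaton.

import numpy as np

# Sketch (ours): winner unit of the trained 4x4 SOM for a given pair of
# hidden-neuron values (N = 2), using the weights listed above.  V0 and V1
# hold component 0 and component 1 of each unit's weight vector, row j = 1..4.
V0 = np.array([[0.9890, 0.9654, 1.0000, 0.9877],
               [0.8622, 0.8622, 0.8622, 0.8622],
               [0.0200, 0.0772, 0.0135, 0.0140],
               [0.2664, 0.3500, 0.2809, 0.2448]])
V1 = np.array([[0.6480, 0.5413, 0.0000, 0.4250],
               [0.9202, 0.9202, 0.9202, 0.9202],
               [0.9858, 0.9957, 0.9731, 0.9782],
               [0.9916, 0.9869, 0.9910, 0.9920]])

def winner(s0, s1):
    d = (s0 - V0) ** 2 + (s1 - V1) ** 2          # squared distance to every unit
    j, k = np.unravel_index(np.argmin(d), d.shape)
    return j + 1, k + 1                          # 1-based unit indices (j, k)

print(winner(1.0, 0.0))   # (1, 3): unit activated by the initial hidden state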

The labels of the nodes are shown in Table 1. Observe that nodes (1, 4), (2, 2), (2, 3), (2, 4), (4, 1) and (4, 3) are not used in the automaton since they are not activated by any of the inputs received. The search tree resulting from the algorithm is plotted in Fig. 17.

Table 1
Nodes activated and their membership degrees

  Node:               (1,1)  (1,2)  (1,3)  (2,1)  (3,1)  (3,2)  (3,3)  (3,4)  (4,2)  (4,4)
  Membership degree:    0      0      0      0     0.2    0.2    0.2    0.2    0.5    0.3

Fig. 17. The search tree resulting from applying the extraction algorithm.


Fig. 18. Automaton extracted by the extraction method.

The automaton A'' obtained by applying the extraction algorithm is shown in Fig. 18. By means of a fuzzy-state minimization algorithm, we obtain the original DFA A from A''; hence, with this method we have also inferred an automaton equivalent to the original fuzzy automaton M.

5. Conclusions

The second-order recurrent neural network (SORNN) is a good technique for performing crisp grammatical inference. However, it is not suitable when working with fuzzy examples. An appropriate topology, made up of an SORNN linked to an output layer, performs well for fuzzy grammatical inference. This topology tolerates fuzzy examples and, furthermore, is able not only to induce the grammatical rules defining such examples but also to recognize their associated fuzzy nature. During training, the neural network encodes the automaton in its weights; its behavior is similar to the crisp case, since it clusters the outputs of the hidden neurons. To extract the induced automaton we have proposed two alternative algorithms that have experimentally shown themselves to be effective.

References
[1] M. Blanco, C. Mantas, M.C. Pegalajar, Extracting rules from recurrent neural network using Kohonen network, in: Proceedings of the Information Processing and Management of Uncertainty in Knowledge-Based Systems Conference (IPMU'96), vol. 3, 1996, pp. 1055-1060.
[2] A. Blanco, M. Delgado, M.C. Pegalajar, A genetic algorithm to obtain the optimal recurrent neural network, International Journal of Approximate Reasoning 23 (2000) 67-83.


[3] A. Blanco, M. Delgado, M.C. Pegalajar, Extracting rules from a (fuzzy/crisp) recurrent neural network using a self-organizing map, International Journal of Intelligent Systems 15 (7) (2000) 595-621.
[4] A. Blanco, M. Delgado, M.C. Pegalajar, Identification of fuzzy dynamic systems using max-min recurrent neural networks, Fuzzy Sets and Systems, 2000, in press.
[5] A. Blanco, M. Delgado, M.C. Pegalajar, A real-coded genetic algorithm for training recurrent neural networks, Neural Networks, 2000, accepted.
[6] M. Casey, The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction, Neural Computation 8 (6) (1996) 1135-1178.
[7] A. Cleeremans, D. Servan-Schreiber, J. McClelland, Finite state automata and simple recurrent networks, Neural Computation 1 (3) (1989) 372-381.
[8] D. Dubois, H. Prade, Fuzzy Sets and Systems: Theory and Applications, Mathematics in Science and Engineering, vol. 144, Academic Press, New York, 1980, pp. 220-226.
[9] S. Friedrich, P.A. Klaus, Clinical monitoring with fuzzy automata, Fuzzy Sets and Systems 61 (1994) 37-42.
[10] K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1982.
[11] C.L. Giles, C.W. Omlin, Inserting rules into recurrent neural networks, in: Proc. IEEE Workshop on Neural Networks for Signal Processing, Copenhagen, Denmark, 1992, pp. 13-22.
[12] C.L. Giles, C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun, Y.C. Lee, Learning and extracting finite state automata with second-order recurrent neural networks, Neural Computation 4 (1992) 393-405.
[13] C.L. Giles, C.W. Omlin, K.K. Thornber, Equivalence in knowledge representation: automata, recurrent neural networks, and dynamical fuzzy systems, Proceedings of the IEEE, 1999, accepted.
[14] S. Hikmet, Fuzzy command grammars for intelligent interface design, IEEE Transactions on Systems, Man, and Cybernetics 22 (5) (1992) 1124-1131.
[15] J.E. Hopcroft, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA, 1979.
[16] T. Kohonen, The self-organizing map, Proceedings of the IEEE 78 (9) (1990) 1464-1480.
[17] A. Lalande, M. Jaulent, A fuzzy automaton to detect and quantify fuzzy artery lesions from arteriograms, in: Proceedings of the Sixth International Conference IPMU'96 (Information Processing and Management of Uncertainty in Knowledge-Based Systems), vol. 3, 1996, pp. 1481-1487.
[18] M. Alexandre, S. Arto, S. Kai, Y. Sheng, Lexical analysis with a simple finite-fuzzy-automaton model, Journal of Universal Computer Science 1 (5) (1995) 292-311.
[19] Q. Ning, J.S. Terrence, Predicting the secondary structure of globular proteins using neural network models, Journal of Molecular Biology 202 (1988) 865-884.
[20] C. Omlin, C. Giles, Extraction of rules from discrete-time recurrent neural networks, Neural Networks 9 (1) (1996) 41-52.
[21] C. Omlin, K.K. Thornber, C. Giles, Fuzzy finite-state automata can be deterministically encoded into recurrent neural networks, IEEE Transactions on Fuzzy Systems 6 (1) (1998) 76-89.
[22] C. Omlin, K.K. Thornber, C. Giles, Representation of fuzzy finite-state automata in continuous recurrent neural networks, in: Proceedings of the IEEE International Conference on Neural Networks (ICNN'96), IEEE Press, Piscataway, NJ, 1996, p. 1023.
[23] C. Omlin, C. Giles, K.K. Thornber, Equivalence in knowledge representation: automata, recurrent neural networks, and dynamical fuzzy systems, in: L.R. Medsker, L.C. Jain (Eds.), Recurrent Neural Networks: Design and Applications, The CRC Press International Series on Computational Intelligence, 2000, pp. 99-131.


[24] P. Amita, K.P. Sankar, Fuzzy grammars in syntactic recognition of skeletal maturity from X-rays, IEEE Transactions on Systems, Man and Cybernetics SMC-16 (5) (1986) 657-667.
[25] M. Thomason, P. Marinos, Deterministic acceptors of regular fuzzy languages, IEEE Transactions on Systems, Man, and Cybernetics (3) (1974) 228-230.
[26] R.L. Watrous, G.M. Kuhn, Induction of finite-state languages using second-order recurrent networks, Neural Computation 4 (1992) 406-414.
[27] R.J. Williams, D. Zipser, A learning algorithm for continually running fully recurrent neural networks, Neural Computation 1 (2) (1989) 270.
[28] Z. Zeng, R. Goodman, P. Smyth, Learning finite state machines with self-clustering recurrent networks, Neural Computation 5 (6) (1993) 976-990.
[29] Z. Zeng, R. Goodman, P. Smyth, Discrete recurrent neural networks for grammatical inference, IEEE Transactions on Neural Networks 5 (2) (1994) 320-330.


