Feature Extraction for Low Bit Rate Image Coding Using a Generalized Wavelet Transform
Roland Wilson, Andrew D Calway, Tao-I Hsu and Peter R Meulemans, Computer Science Department,University of Warwick, Coventry CV4 7AL,ADC,Department of Computing Mathematics, University of Wales, College of Cardi ,PO Box 916,Cardi CF2 4YN. February 8, 1996
This paper describes how a generalisation of the wavelet transform - the Multiresolution Fourier Transform MFT - can be used in model-based coding, in which image features including boundary contours and textures can be extracted directly from the transform coe cients. As such, it has the potential to extend transform coding to very low bit rate, feature-based compression. Results of the work are presented to demonstrate the e ectiveness of the methods.
ABSTRACT
1 INTRODUCTION
There has recently been considerable interest in transform or sub-band coding methods based on wavelet transforms 1 , which are now proving to be at least as e ective as more conventional DCT methods, such as the normal JPEG implementation 2 . There is little evidence, howevere, that signi cantly higher compression can be achieved by any simple change of co-ordinates: there is a limit to what can be done by redistributing image energy. On the other hand, there have been various attempts over the years to try to base compression techniques directly on visually important image features, such as boundary contours or textures 3, 4 , with results of variable quality, in part because of the di culty of reliably extracting such features and in part because of lack of an adequate set of feature models. Nor has it proved easy to extend such methods to image sequences. The work reported here is an attempt to tackle the problem of modelling images in a way which better re ects the visually signi cant features, but which has the robustness and generality to overcome the limitations of earlier methods. It is based on an image representation called the Multiresolution Fourier Transform MFT, an overcomplete image representation which combines the wavelet and windowed Fourier transforms in a uni ed framework. The resulting representation is an abstraction of the multiscale representation apparently used in the mammalian visual cortex 5, 6 , which greatly simpli es the modelling of image features and the estimation of model parameters from 1
global structure
local structure
Figure 1: Multiresolution doubly a ne image model.Global structure is expressed by relations between neighbouring quadtree leaf nodes. Local model represents structure within each region de ned by a node. image data. This paper presents a summary of the basic properties of the MFT and the associated modelling techniques, which will illustrate its e ectiveness in capturing the image information which is important to viewers. Fuller descriptions of the work can be found in the references 7, 6, 10 .
2 IMAGE MODELLING BASED ON THE MFT
In order to model image features such as boundary contours and textures, two aspects of structure have to be captured: local structure relating the luminance at one pixel to that of its neighbours and global structure, which typically re ects the large scale properties of the visible surfaces in the scene. Clearly, any adequate model of images must re ect both aspects and lead to estimation procedures which are robust and computationally e ective. This combination of local and global processing requires an e ective way of dealing with a range of spatial scales, which can best be accomplished with a quadtree spatial tessellation cf. 8 . The global structure is then represented by means of interaction between leaf nodes in the quadtree, while the local model describes the behaviour of the pixels within the region corresponding to each leaf node. The general form of such models is illustrated in Fig. 1, which shows a schematic tessellation of an image region de ned by a quadtree. It also indicates the general form of the mage model, which may be put in a more formal setting in terms of the following three components: 1. The neighbour model is a geometric one relating the image in a given patch centred at ~ points i in the plane ~ ~ ~ ~ xi = wk kx , i 1 ~ ~ where x is the image intensity and wk k is a window function, by the doubly a ne 2
model
~ xi =
X
j 2Ni
ij j ij
~ ~ x T ij + i
2
3 ~ and i is a random process representing the residual error. This model is a ne both in terms of the co-ordinates, through T ij and in terms of the signal space, through ij and ~ ij . In the experiments undertaken so far, the patches correspond to image regions of the same size, ie. at the same level in the quadtree, but this is not a necessary restriction. The key point is that the relation between the two patches is a geometric one. For example, one patch might represent a boundary contour, while two of its 4-neighbours may represent the same contour, but having a di erent orientation locally, so that the matrix Aij would correspond to a rotation, as illustrated in Fig. 2. 2. The innovation model must represent the local structure among pixels and may be of various forms. The simplest general category is the linear models, de ned by 4 i = Bivi where i is the vector representing the residual error in the patch, as in 3 , B i is a matrix de ning the basis vectors for the patch and vi is a vector of white noise samples. The matrix B i will typically be parameterised to such properties as the orientation and location of the features in the patch. 3. The scale selection model is a binary Markov process de ned on the quadtree 7, 8, 9 , which is de ned by the conditional probabilities P i; j; kji=2; j=2; k , 1, which are the probabilities that a given node i; j; k in the tree is a leaf node, given the state of its `father' at scale k , 1, the node covering four siblings at level k. In general, the image structure is de ned by choice of an appropriate scale for each region of the image, such that the image intensity within that region can be adequately described by a combination of `warping' of neighbouring patches and a random component expressing the error between the a ne approximation and the intensities in that region. In e ect, this is a combination of a conventional stochastic model 11 , a `fractal' model 12 and a scale-space model 13 . Work so far has concentrated on two particularly simple examples of this model. The rst is a model of image boundaries based on a `rotation+translation' neighbour model and an innovation model which is a rst order Markov process de ned in the Fourier domain. Maximum Likelihood ML estimation procedures have been found to estimate the three key model parameters: orientation and the spatial location of the edge 7 . This work has been extended to nd an e ective procedure for identifying the scale 14 and for estimating the neighbour model parameters - in this case a rotation angle and translation. The second example is a model of texture based essentially on the neighbour model - in this case a full a ne transform, which allows not only rotation, but also scaling in two directions and shear. Model estimation proceeds by separating the translation ~ij , estimating the linear transform and constructing an approximation based on the co-ordinate transformed patch as in 3. In a homogeneous texture eld, one patch at a given scale can be used to synthesise the entire image in this way.
ij
where Ni is the neighbour set of patch i, transform
is a scaling factor, T ij is an a ne co-ordinate
~ ~ T ij = Aij +
3
rotation + shift of origin
rotation + shift of origin
3 ESTIMATION RESULTS
Figure 2: Edge neighbour model, showing neighbour relations.
The image model described above requires a novel estimation procedure because it contains three coupled components. The general estimation procedure is illustrated in Fig. 3. The rst step is estimation of the parameters of the local model, which describes the signal structure within the window. This is followed by a decision as to the correct scale of representation, based on the t between the model and the windowed data: the largest scale giving a satisfactory t is used. The nal step is estimation of the transformation between neighbouring leaves of the quadtree to obtain the global structure. The residual error is then obtained from 2. The results of Figs. 4-7 show the application of this general paradigm to boundary extraction and texture description. In both cases, the representation used is the MFT, which can be expressed in 2 , D as ~ x; ~ ; = 1=2 d~ x ~ wk vec , k exp ,|~ :~ ^~! ! 5
Z
~ In practice, the signal is sampled, as is the MFT, in space, , frequency, ~ and scale , with a dyadic ! sampling pattern for the spatial sampling as a function of scale which is identical to the dyadic wavelet transform. In e ect, the MFT uses a quadtree sampling structure to de ne spatial patches, but unlike a conventional quadtree, there is a 50 overlap between adjacent patches, with a window which is approximately a cosine Hamming window. This guarantees a stable inverse and, most importantly, ensures that the resampling needed for estimating the coe cients of a `warped' patch can be calculated using simple bilinear interpolation 7, 10 . Fig. 5 shows the local structure estimates derived from the image of Fig. 4, along with the selected scales, illustrating the quadtree tessellation. In this case, the local model is the FourierMarkov boundary model 7 . Fig. 6 shows the boundary contours obtained by neighbour processing. In Fig. 7, the original and syntheses at three scales are shown for a texture from the Brodatz album reptile skin. In this case, only the neighbour model is used, but the same `prototype patch' is used to
4
Identify local model assuming no global structure ( α =0)
ij
Select scale using model fit to windowed image
Determine neighbour model parameters and hence residual
Figure 3: General model identi cation procedure. synthesise the entire image: every block at a given scale is regarded as a neighbour of the proptotype block. Note that the synthesis quality improves as the scale of the synthesis is reduced, but this corresponds to a higher rate, since each patch requires 6 a ne parameters and the magnitude scale factor ij and the smaller the scale, the greater the ratio of parameters to pixels 10 .
4 CONCLUSIONS
The work reported here describes a new class of image models which is able to express both local and global structure, using a combination of stochastic and geometric elements. It has been established that a multiresolution image representation called the MFT is a suitable vehicle for estimating model parameters and synthesis of images from the parametric descriptions. In this respect, it represents a new way of coding images, which has the generality of an advanced transform method, such as wavelet packets 15 , but with the representational power of a model-based method. Work in hand is designed to establish whether this potential can be realised in a practical data compression system.
References
1 M. Antonini, M. Barlaud, P. Mathieu and I. Daubechies, Image coding using wavelet transform",IEEE Trans. Image Process., 1, pp. 205-220, 1992. 2 G. K. Wallace, The JPEG still picture compression standard",Commun. of th e ACM, 34, pp. 30-44, 1991. 3 D.N. Graham, Image Transmission by two-dimensional contour coding",IEEE Proc., 55, pp. 336-346,1967. 5
4 M. Kunt, A. Ikonomopoulos and M. Kocher, Second Generation Image Coding Techniques",IEEE Proc., 73, pp. 549-574 , 1985. 5 D.H. Huble,Eye, Brain and Vision, New York, Freeman, 1988. 6 R. Wilson, A.D. Calway and E.R.S. Pearson, A generalized wavelet transform for Fourier analysis: the multiresolution Fourier transform and its application to image and audio signal analysis",IEEE Trans. Inform. Theory, 38, pp. 674- 690, 1992. 7 A.D. Calway, The Multiresolution Fourier Transform: A General Purpose Tool for Image Analysis, Ph.D. Thesis, Univ. of Warwick, Ph.D. Thesis, 1989. 8 R. Wilson, M.P. Todd and A.D. Calway, Generalised quad-trees : a uni ed approach to image analysis and coding", Proc. 5th SPIE Conf. on Vis. Commun., pp.619-26, Lausanne,1990. 9 M. Basseville, A. Benveniste, K.C. Chou, S.A. Golden, R. Nikoukhah and A.S. Willsky, Modelling and estimation of multiresolution stochastic processes", IEEE Trans. Inform. Theory, 38, pp. 766-785, 1992. 10 T-I Hsu,Texture Synthesis and Analysis Using the Multiresolution Fourier Transform, Ph.D. Thesis, Univ. of Warwick, 1994. 11 A.K. Jain,Fundamentals of Digital Image Processing, Englewood Cli s, Prentice Hall, 1989. 12 M.F. Barnsley,Fractals Everywhere, Boston, Academic Pr., 1988. 13 S.C. Clippingdale,Multiresolution Image Modelling and Estimation, Univ. of Warwick, Ph.D. Thesis, 1988. 14 A.R. Davies,Image Feature Analysis Using the Multiresolution Fourier Transform, Ph.D. Thesis, Univ. of Warwick, 1993. 15 R.R. Coifman and M.V. Wickerhauser, Entropy-based algorithms for best basis selection",IEEE Trans. on Inform. Theory, 38, pp. 713-719, 1992.
6
Figure 4: Original `lena' image.
Figure 5: Block segmentation showing regions corresponding to leaf nodes of quadtree. Gray nodes indicate regions lacking boundary features. 7
Figure 6: Boundary contours found using procedure of Fig. 3 applied to results of local modelling Fig. 5.
8
(a) Original Reptile
(b) Synthesis at level 6
(c) Synthesis at level 5
(d) Synthesis at level 4
Figure 7: Synthesis of reptile skin texture using MFT.
9