lanxicy.com

第一范文网 文档专家

第一范文网 文档专家

6

Frequency Analysis and Fast Fourier Transform

This chapter introduces the properties, applications, and implementations of the discrete Fourier transform (DFT). Because of the dev

elopment of the fast Fourier transform algorithms, the DFT is now widely used for spectral analysis and fast convolution.

6.1 Fourier Series and Transform

In this section, we will introduce the representation of analog periodic signals using Fourier series and the analysis of ?nite-energy signals using Fourier transform.

6.1.1 Fourier Series

A periodic signal can be represented as the sum of an in?nite number of harmonic-related sinusoids and complex exponentials. The representation of periodic signal x(t) with period T0 is the Fourier series de?ned as x(t) =

∞

ck e jk

k=?∞

0t

,

0

(6.1) is the

where ck is the Fourier series coef?cient, 0 = 2π/T0 is the fundamental frequency, and k frequency of the kth harmonic. The kth Fourier coef?cient ck is expressed as ck = 1 T0 x(t)e? jk

T0

0t

dt.

(6.2)

For an odd function x(t), it is easier to calculate the interval from 0 to T0 . For an even function, integration 1 from ?T0 /2 to T0 /2 is commonly used. The term c0 = T0 T0 x(t) dt is called the DC component because it equals the average value of x(t) over one period.

Example 6.1: A rectangular pulse train is a periodic signal with period T0 and can be expressed as

x(t) = A, 0, kT0 ? τ/2 ≤ t ≤ kT0 + τ/2 otherwise ,

S.M. Kuo, B.H. Lee, and W. Tian

(6.3)

Real-Time Digital Signal Processing: Implementations and Applications C 2006 John Wiley & Sons, Ltd

304

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

where k = 0, ±1, ±2, . . . , and τ < T0 is the width of rectangular pulse with amplitude A. Since x(t) is an even function, its Fourier coef?cients can be calculated as 1 ck = T0

T0 2

Ae? jk 0 t

T0 ? 2

A dt = T0

e? jk ? jk

0t

τ 2

0 ?τ 2

=

Aτ sin (k 0 τ /2) . T0 k 0 τ /2

(6.4)

This equation shows that ck has a maximum value of Aτ/T0 at the DC frequency to zero as 0 → ±∞, and equals zero at frequencies that are multiples of π.

0

= 0, decays

The plot of |ck |2 shows that the power of the periodic signal is distributed among the frequency components. Since the power of a periodic signal exists only at discrete frequencies k 0 , the signal has a line spectrum. The spacing between two consecutive spectral lines is equal to the fundamental frequency 0 . For the rectangular pulse train with a ?xed period T0 , the effect of decreasing τ is to spread the signal power over the entire frequency range. On the other hand, when τ is ?xed but the period T0 increases, the spacing between adjacent spectral lines decreases.

Example 6.2: Consider a perfect sinewave expressed as

x(t) = sin(2π f 0 t). Using Euler’s formula (see Appendix A) and Equation (6.1), we obtain sin(2π f 0 t) =

∞ 1 j2π f0 t (e ? e? j2π f0 t ) = ck e jk2π f0 t . 2j k=?∞

Therefore, the Fourier series coef?cients can be calculated as ? k=1 ? 1/2 j, k = ?1 . ck = ?1/2 j, ? 0, otherwise

(6.5)

This equation indicates that the power of a pure sinewave is distributed only at the harmonics k = ±1, a perfect line spectrum.

6.1.2 Fourier Transform

We have shown that a periodic signal has a line spectrum and the space between two adjacent spectral lines is equal to the fundamental frequency 0 = 2π/T0 . As T0 increases, the line space decreases and the number of frequency components increases. If we increase the period without limit (i.e., T0 → ∞), the line spacing tends toward zero with in?nite frequency components. Therefore, the discrete line components converge into a continuum of frequency spectrum. In practice, most real-world signals, such as speech, are not periodic. They can be approximated by periodic signals with in?nite period, i.e., T0 → ∞ (or 0 → 0). Therefore, the number of exponential components in Equation (6.1) tends toward in?nity, and the summation becomes integration over the range (?∞, ∞). Thus, Equation (6.1) becomes 1 x(t) = 2π

∞

X ( )e j

?∞

t

d .

(6.6)

DISCRETE FOURIER TRANSFORM This is the inverse Fourier transform. Similarly, Equation (6.2) becomes

∞

305

X( ) =

?∞

x(t)e? j

t

dt,

(6.7)

or

∞

X( f ) =

?∞

x(t)e? j2π f t dt.

(6.8)

This is the Fourier transform of the analog x(t).

Example 6.3: Calculate the Fourier transform of function x(t) = e?at u(t), where a > 0 and u(t)

is the unit-step function. From Equation (6.7), we have

∞ ∞

X( ) =

?∞

e

?at

u(t)e

?j t

dt =

0

e?(a+ j

)t

dt

=

1 a+ j

.

For a function x(t) de?ned over a ?nite interval T0 , i.e., x(t) = 0 for |t| > T0 /2, the Fourier series coef?cients ck can be expressed in terms of X ( ) using Equations (6.2) and (6.7) as ck = 1 X (k T0

0 ).

(6.9)

Therefore, the Fourier transform X ( ) of a ?nite interval function at a set of equally spaced points on the -axis is speci?ed exactly by the Fourier series coef?cients ck .

6.2 Discrete Fourier Transform

In this section, we introduce the discrete-time Fourier transform and discrete Fourier transform of digital signals.

6.2.1 Discrete-Time Fourier Transform

The discrete-time Fourier transform (DTFT) of a discrete-time signal x(nT ) is de?ned as X (ω) =

∞ n=?∞

x(nT )e? jωnT .

(6.10)

It shows that X (ω) is a periodic function with period 2π . Thus, the frequency range of a discrete-time signal is unique over the range (?π, π) or (0, 2π).

306

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

The DTFT of x(nT ) can also be de?ned in terms of normalized frequency as X (F) =

∞ n=?∞

x(nT )e? j2π Fn .

(6.11)

Comparing this equation with Equation (6.8), the periodic sampling imposes a relationship between the independent variables t and n as t = nT = n/ f s . It can be shown that X (F) = 1 T

∞ k=?∞

X ( f ? k f s ).

(6.12)

This equation states that X (F) is the sum of an in?nite number of X ( f ), scaled by 1/T , and then frequency shifted to k f s . It also states that X (F) is a periodic function with period T = 1/ f s .

Example 6.4: Assume that a continuous-time signal x(t) is bandlimited to f M , i.e., |X ( f )| = 0 for | f | ≥ f M , where f M is the bandwidth of signal x(t). The spectrum is zero for | f | ≥ f M as shown in Figure 6.1(a). As shown in Equation (6.12), sampling extends the spectrum X ( f ) repeatedly on both sides of the f-axis. When the sampling rate f s is greater than 2 f M , i.e., f M ≤ f s /2, the spectrum X ( f ) is preserved in X (F) as shown in Figure 6.1(b). In this case, there is no aliasing because the spectrum

X(f)

? fM

0

fM

f

(a) Spectrum of an analog signal. X( f / fs )

? fs 2 ? fs ? fM

fs 2

0

fM fs

f

(b) Spectrum of discrete-time signal when the sampling theorem is satisfied.

? fs 2

X( f / fs ) fs 2

? fs ? fM

0

fM fs

f

(c) Spectrum of discrete-time signal when the sampling theorem is violated.

Figure 6.1 Spectrum replication caused by sampling: (a) spectrum of analog bandlimited signal x(t); (b) sampling theorem is satis?ed; and (c) overlap of spectral components

DISCRETE FOURIER TRANSFORM

307

of the discrete-time signal is identical (except the scaling factor 1/T ) to the spectrum of the analog signal within the frequency range | f | ≤ f s /2 or |F| ≤ 1. The analog signal x(t) can be recovered from the discrete-time signal x(nT) by passing it through an ideal lowpass ?lter with bandwidth f M and gain T . This veri?es the sampling theorem de?ned in Equation (1.3). However, if the sampling rate f s < 2 f M , the shifted replicas of X ( f ) will overlap as shown in Figure 6.1(c). This phenomenon is called aliasing since the frequency components in the overlapped region are corrupted. The DTFT X (ω) is a continuous function of frequency ω and the computation requires an in?nite-length sequence x(n). We have de?ned DFT in Section 3.2.6 for N samples of x(n) at N discrete frequencies. Therefore, DFT is a numerically computable transform.

6.2.2 Discrete Fourier Transform

The DFT of a ?nite-duration sequence x(n) of length N is de?ned as X (k) =

N ?1 n=0

x(n)e? j(2π/N )kn ,

k = 0, 1, . . . , N ? 1,

(6.13)

where X (k) is the kth DFT coef?cient and the upper and lower indices in the summation re?ect the fact that x(n) = 0 outside the range 0 ≤ n ≤ N ? 1. The DFT is equivalent to taking N samples of DTFT X (ω) over the interval 0 ≤ ω < 2π at N discrete frequencies ωk = 2πk/N , where k = 0, 1, . . . , N ? 1. The spacing between two successive X (k) is 2π/N rad (or f s /N Hz).

Example 6.5: If the signal {x(n)} is real valued and N is an even number, we can show that

X (0) = and X (N /2) =

N ?1 n=0 N ?1

x(n)

n=0

e? jπn x(n) =

N ?1

(?1)n x(n).

n=0

Therefore, the DFT coef?cients X (0) and X (N /2) are real values. The DFT de?ned in Equation (6.13) can also be written as X (k) = where

kn WN = e ? j 2π kn N N ?1 n=0 kn x(n)W N ,

k = 0, 1, . . . , N ? 1,

(6.14)

= cos

2π kn N

? j sin

2πkn N

,

0 ≤ k, n ≤ N ? 1.

(6.15)

308

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

6 W 8 = ?W 8 5 W 8 = ?W 8

1 2

7 3 W 8 = ?W 8

4 0 W 8 = ?W 8 = ?1

0 W8 = 1

3 W8 2 W8

1 W8

Figure 6.2

Twiddle factors for DFT, N = 8

kn N k 0 The parameter W N is called the twiddle factors of the DFT. Because W N = e? j2π = 1 = W N , W N , k = 0, 1, . . . , N ? 1 are the N roots of unity in clockwise direction on the unit circle. It can be shown that N /2 W N = e? jπ = ?1. The twiddle factors have the symmetry property

WN and the periodicity property

k+N /2

k = ?W N ,

0 ≤ k ≤ N /2 ? 1,

(6.16)

k+N k WN = WN .

(6.17)

Figure 6.2 illustrates the cyclic property of the twiddle factors for an 8-point DFT.

Example 6.6: Consider the ?nite-length signal

x(n) = a n , n = 0, 1, . . . , N ? 1,

where 0 < a < 1. The DFT of x(n) is computed as X (k) = =

N ?1 n=0

a n e? j(2π k/N )n =

? j2πk/N N

N ?1 n=0

ae? j2πk/N

n

1 ? ae 1 ? ae? j2πk/N

=

1 ? aN , 1 ? ae? j2π k/N

k = 0, 1, . . . , N ? 1.

The inverse discrete Fourier transform (IDFT) is used to transform the frequency domain X (k) back into the time-domain signal x(n). The IDFT is de?ned as x(n) = 1 N

N ?1 k=0

X (k)e j(2π/N )kn =

1 N

N ?1 k=0

?kn X (k)W N ,

n = 0, 1, . . . , N ? 1.

(6.18)

This is identical to the DFT with the exception of the normalizing factor 1/N and the opposite sign of the exponent of the twiddle factors.

DISCRETE FOURIER TRANSFORM

309

The DFT and IDFT de?ned in Equations (6.14) and (6.18), respectively, can be expressed in matrixvector form as X = Wx and x= 1 ? W X, N (6.20) (6.19)

where x = [x(0) x(1) . . . x(N ? 1)]T is the signal vector, the complex vector X = [X (0)X (1) . . . X (N ? 1)]T contains the DFT coef?cients, and the N xN twiddle-factor matrix (or DFT matrix) W is de?ned by 1 ?1 ? =?. ?. . 1 ? 1 1 WN . . . N ?1 WN ··· ··· .. . ··· 1 . . . ? ? ? ?, ?

2

kn W = WN

N W N ?1

0≤k,n≤N ?1

(6.21)

(N W N ?1)

and W? is the complex conjugate of the matrix W. Since W is a symmetric matrix, the inverse matrix 1 W?1 = N W? was used to derive Equation (6.20).

Example 6.7: Given x(n) = {1, 1, 0, 0}, the DFT of this 4-point sequence can be computed using

the matrix formulation as 1 ?1 X=? ?1 1 ? 1 1 W4 2 W4 3 W4 1 2 W4 4 W4 6 W4 ? ? 1 1 3? ?1 W4 ? x=? 6 ?1 W4 ? 9 1 W4 ?? ? ? 2 1 1 1 1 ? j ?1 j ??1? ?1 ? ?? ? = ? ?1 1 ?1 ? ? 0 ? ? 0 1+ 0 j ?1 ? j ? j? ?, ? j

where we used symmetry and periodicity properties given in Equations (6.16) and (6.17) to obtain 9 0 4 1 2 6 3 W4 = W4 = 1, W4 = W4 = ? j, W4 = W4 = ?1, and W4 = j. The IDFT can be computed as 1 1 ?1 x= ? 4 ?1 1 ? 1 ?1 W4 ?2 W4 ?3 W4 1 ?2 W4 ?4 W4 ?6 W4 ? ? 1 1 1 ?3 1 ?1 W4 ? j ?X = ? ?6 W4 ? 4 ? 1 ?1 ?9 1 ?j W4 ?? 2 1 1 ?1 ? j ? ? 1 ? ?? 1 ?1 ? ? 0 1+ ?1 j ? ? 1 j ? ?1? ? = ? ?. ? ?0? 0 j ?

The DFT coef?cients are equally spaced on the unit circle with frequency intervals of f s /N (or 2π /N ). Therefore, the frequency resolution of the DFT is = f s /N . The frequency sample X (k) represents discrete frequency fk = k fs , N for k = 0, 1, . . . , N ? 1. (6.22)

Since the DFT coef?cient X (k) is a complex variable, it can be expressed in polar form as X (k) = |X (k)|e jφ(k) , (6.23)

310

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

where the DFT magnitude spectrum is de?ned as |X (k)| = and the phase spectrum is de?ned as φ(k) = tan?1 Im[X (k)] . Re[X (k)] (6.25) {Re[X (k)]}2 + {Im[X (k)]}2 (6.24)

Example 6.8: Consider a ?nite-length DC signal x(n) = c, where n = 0, 1, . . . , N ? 1. From

Equation (6.14), we obtain X (k) = c

k Since W NN = e ? j 2π k N N N ?1 n=0 kn WN = c k 1 ? W NN . k 1 ? WN

k = 1 for all k, and W N = 1 for k = i N , we have X (k) = 0 for k = N ?1 n=0 kn W N = N . Therefore, we obtain

1, 2, . . . , N ? 1. For k = 0,

X (k) = cN δ(k),

k = 0, 1, . . . , N ? 1.

6.2.3 Important Properties

This section introduces several important properties of DFT that are useful for analyzing digital signals and systems. Linearity: If {x(n)} and {y(n)} are digital sequences of the same length, DFT[ax(n) + by(n)] = aDFT[x(n)] + bDFT[y(n)] = a X (k) + bY (k), (6.26)

where a and b are arbitrary constants. Linearity allows us to analyze complex signals and systems by evaluating their individual components. The overall response is the combination of individual results evaluated at every frequency component. Complex conjugate: If the sequence {x(n), 0 ≤ n ≤ N ? 1} is real valued, then X (?k) = X ? (k) = X (N ? k), 0 ≤ k ≤ N ? 1, (6.27)

where X ? (k) is the complex conjugate of X (k). Or equivalently, X (M + k) = X ? (M ? k), 0 ≤ k ≤ M, (6.28)

where M = N /2 if N is even, or M = (N ? 1)/2 if N is odd. This property shows that only the ?rst (M + 1) DFT coef?cients from k = 0 to M are independent as illustrated in Figure 6.3. For complex signals, however, all N complex outputs carry useful information. From the symmetry property, we obtain |X (k)| = |X (N ? k)|, k = 1, 2, . . . , M ? 1 (6.29)

DISCRETE FOURIER TRANSFORM

Real X(0) X(1) … Real X(M ? 2) X(M ? 1) X(M ) X(M + 1) X(M + 2) … X(N ? 1)

311

Complex conjugate

Figure 6.3

Complex-conjugate property, N is an even number

and φ(k) = ?φ(N ? k), k = 1, 2, . . . , M ? 1. (6.30)

Circular shifts: Let y(n) be a circular-shifted sequence de?ned as y(n) = x(n ? m) mod N , (6.31)

where m is the number of samples by which x(n) is shifted to the right and the modulo operation 0 ≤ (n ? m) mod N = (n ? m ± i N ) < N . (6.32)

For example, if m = 1, x(0) shifts to x(1), x(1) shifts to x(2), . . . , x(N ? 2) shifts to x(N ? 1), and x(N ? 1) shifts back to x(0). Thus, a circular shift of an N -point sequence is equivalent to a linear shift of its periodic extension. Considering the y(n) de?ned in Equation (6.31), we have

mk Y (k) = e? j(2π k/N )m X (k) = W N X (k).

(6.33)

DFT and z-transform: DFT is equal to the z-transform of a sequence x(n) of length N , evaluated on the unit circle at N equally spaced frequencies ωk = 2πk/N , where k = 0, 1, . . . , N ? 1. That is, X (k) = X (z)|

z=e

j 2π k N

( ),

k = 0, 1, . . . , N ? 1.

(6.34)

Circular convolution: If x(n) and h(n) are real-valued N -periodic sequences, y(n) is the circular convolution of x(n) and h(n) de?ned as y(n) = x(n) ? h(n), n = 0, 1, . . . , N ? 1, (6.35)

where ? denotes circular convolution. The circular convolution in time domain is equivalent to multiplication in the frequency domain expressed as Y (k) = X (k)H (k), k = 0, 1, . . . , N ? 1. (6.36)

Note that the shorter sequence must be padded with zeros in order to have the same length for computing circular convolution.

312

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

x(n) x(n ? N + 1) h(n ? 1) h(n) x(n ? 1) h(n ? N + 1)

h(n ? 2)

x(n ? 2)

Figure 6.4

Circular convolution of two sequences using the concentric circle approach

Figure 6.4 illustrates the cyclic property of circular convolution using two concentric circles. To perform circular convolution, N samples of x(n) are equally spaced around the outer circle in the clockwise direction, and N samples of h(n) are displayed on the inner circle in the counterclockwise direction starting at the same point. Corresponding samples on the two circles are multiplied, and the products are summed to form an output. The successive value of the circular convolution is obtained by rotating the inner circle of one sample in the clockwise direction, and repeating the operation of computing the sum of corresponding products. This process is repeated until the ?rst sample of inner circle lines up with the ?rst sample of the exterior circle again.

Example 6.9: Given two 4-point sequences x(n) = {1, 2, 3, 4} and h(n) = {1, 0, 1, 1}. Using the

circular convolution method illustrated in Figure 6.4, we can obtain n = 0, y(0) = 1 × 1 + 1 × 2 + 1 × 3 + 0 × 4 = 6 n = 1, y(1) = 0 × 1 + 1 × 2 + 1 × 3 + 1 × 4 = 9 n = 2, y(2) = 1 × 1 + 0 × 2 + 1 × 3 + 1 × 4 = 8 n = 3, y(3) = 1 × 1 + 1 × 2 + 0 × 3 + 1 × 4 = 7 Therefore, we obtain y(n) = x(n) ? h(n) = {6, 9, 8, 7}. Note that the linear convolution of sequences x(n) and h(n) results in y(n) = x(n) ? h(n) = {1, 2, 4, 7, 5, 7, 4}, which is also implemented in MATLAB script example6_9.m. To eliminate the circular effect and ensure that the DFT method results in a linear convolution, the signals must be zero-padded. Since the linear convolution of two sequences of lengths L and M will result

FAST FOURIER TRANSFORMS

313

in a sequence of length L + M ? 1, the two sequences must be extended to the length of L + M ? 1 or greater by zero-padding. That is, append the sequence of length L with at least M ? 1 zeros, and pad the sequence of length M with at least L ? 1 zeros.

Example 6.10: Consider the same sequences h(n) and x(n) given in Example 6.9. If those 4-point sequences are zero-padded to 8 points as x(n) = {1, 2, 3, 4, 0, 0, 0, 0} and h(n) = {1, 0, 1, 1, 0, 0, 0, 0}, the resulting circular convolution is

n = 0, y(0) = 1 × 1 + 0 × 2 + 0 × 3 + 0 × 4 + 0 × 0 + 1 × 0 + 1 × 0 + 0 × 0 = 1 n = 1, y(1) = 0 × 1 + 1 × 2 + 0 × 3 + 0 × 4 + 0 × 0 + 0 × 0 + 1 × 0 + 1 × 0 = 2 n = 2, y(2) = 1 × 1 + 0 × 2 + 1 × 3 + 0 × 4 + 0 × 0 + 0 × 0 + 0 × 0 + 1 × 0 = 4 . . . We ?nally have y(n) = x(n) ? h(n) = {1, 2, 4, 7, 5, 7, 4, 0}. This result is identical to the linear convolution of the two sequences as given in Example 6.9. Thus, the linear convolution can be realized by the circular convolution with proper zero-padding. MATLAB script example6_10.m implements the circular convolution of zero-padded sequences using DFT. Zero-padding can be implemented using the MATLAB function zeros. For example, the 4-point sequence x(n) given in Example 6.9 can be zero-padded to 8 points with the following command,

x = [1, 2, 3, 4, zeros(1, 4)];

where the MATLAB function zeros(1, N) generates a row vector of N zeros.

6.3 Fast Fourier Transforms

The drawback of using DFT for practical applications is its intensive computational requirement. To compute each X (k) de?ned in Equation (6.14), we need approximately N complex multiplications and additions. For computing N samples of X (k) for k = 0, 1, . . . , N ? 1, approximately N 2 complex multiplications and (N 2 ? N ) complex additions are required. Since a complex multiplication requires four real multiplications and two real additions, the total number of arithmetic operations required for computing N -point DFT is proportional to 4N 2 , which becomes huge for large N . kn The twiddle factor W N is a periodic function with a limited number of distinct values since

kn WN = WN (kn) mod N

,

for kn > N

(6.37)

N kn and W N = 1. Therefore, different powers of W N have the same value as shown in Equation (6.37). In addition, some twiddle factors have either real or imaginary parts equal to 1 or 0. By reducing these redundancies, a very ef?cient algorithm called the fast Fourier transform (FFT) can be derived, which requires only N log2 N operations instead of N 2 operations. If N = 1024, FFT requires about 104 operations instead of 106 operations for DFT.

314

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

x(0) x(2) x(4) x(6) N/2-point DFT

X1(0) X1(1) X1(2) X1(3)

X(0) X(1) X(2) X(3) W8 W8 W8 W8

0 1 2 3

x(1) x(3) x(5) x(7) N/2-point DFT

X2(0) X2(1) X2(2) X2(3)

?1 ?1 ?1 ?1

X(4) X(5) X(6) X(7)

Figure 6.5

Decomposition of an N -point DFT into two N /2 DFTs, N = 8

The generic term FFT covers many different algorithms with different features, advantages, and disadvantages. Each FFT algorithm has different strengths and makes different trade-offs in terms of code complexity, memory usage, and computation requirements. In this section, we introduce two classes of FFT algorithms: decimation-in-time and decimation-in-frequency.

6.3.1 Decimation-in-Time

For the decimation-in-time algorithms, the sequence {x(n), n = 0, 1, . . . , N ? 1} is ?rst divided into two shorter interwoven sequences: the even numbered sequence x1 (m) = x(2m), and the odd numbered sequence x2 (m) = x(2m + 1), m = 0, 1, . . . , (N /2) ? 1. (6.39) m = 0, 1, . . . , (N /2) ? 1 (6.38)

Apply the DFT de?ned in Equation (6.14) to these two sequences of length N /2, and combine the resulting N /2-point X 1 (k) and X 2 (k) to produce the ?nal N -point DFT. This procedure is illustrated in Figure 6.5 for N = 8. The structure shown on the right side of Figure 6.5 is called the butter?y network because of its crisscross appearance, which can be generalized in Figure 6.6. Each butter?y involves just a single k complex multiplication by a twiddle factor W N , one addition, and one subtraction.

(m ? 1)th stage

k WN

mth stage

?1

Figure 6.6

Flow graph for a butter?y computation

FAST FOURIER TRANSFORMS

x(0) x(4) x(2) x(6) N/4-point DFT N/4-point DFT

0 W8 2 W8

315

X(0) X(1) ?1 ?1

0 W8

X(2) X(3) ?1 ?1 ?1 ?1

x(1) x(5) x(3) x(7) N/4-point DFT N/4-point DFT

0 W8 2 W8

1 W8 2 W8

X(4) X(5) X(6) X(7)

?1 ?1

3 W8

Figure 6.7

Flow graph illustrating second step of N -point DFT, N = 8

Since N is a power of 2, N /2 is an even number. Each of these N /2-point DFTs can be computed by two smaller N /4-point DFTs. This second step process is illustrated in Figure 6.7. By repeating the same process, we will ?nally obtain a set of 2-point DFTs since N is a power of 2. For example, the N /4-point DFT becomes a 2-point DFT in Figure 6.7 for N = 8. Since the ?rst stage 0 uses the twiddle factor W N = 1, the 2-point butter?y network illustrated in Figure 6.8 requires only one addition and one subtraction.

Example 6.11: Consider the 2-point FFT algorithm which has two input samples x(0) and x(1). The DFT output samples X (0) and X (1) can be computed as

1

X (k) =

n=0

nk x(n)W2 ,

k = 0, 1.

0 1 Since W2 = 1 and W2 = e?π = ?1, we have

X (0) = x(0) + x(1)

and

X (1) = x(0) ? x(1).

The computation is identical to the signal-?ow graph shown in Figure 6.8. As shown in Figure 6.7, the input sequence is arranged as if each index was written in binary form and then the order of binary digits was reversed. This bit-reversal process is illustrated in Table 6.1 for the case of N = 8. The input sample indices in decimal are ?rst converted to their binary representations, the binary bit streams are reversed, and then the reversed binary numbers are converted back to decimal

?1

Figure 6.8 Flow graph of 2-point DFT

316

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

Table 6.1 Example of bit-reversal process, N = 8 (3 bits) Bit-reversed sample index Binary 000 100 010 110 001 101 011 111 Decimal 0 4 2 6 1 5 3 7

Input sample index Decimal 0 1 2 3 4 5 6 7 Binary 000 001 010 011 100 101 110 111

values to give the reordered time indices. Most modern DSP processors (such as the TMS320C55x) provide the bit-reversal addressing mode to ef?ciently support this process. For the FFT algorithm shown in Figure 6.7, the input values are no longer needed after the computation of output values at a particular stage. Thus, the memory locations used for the FFT outputs can be the same locations used for storing the input data. This observation supports the in-place FFT algorithms that use the same memory locations for both the input and output numbers.

6.3.2 Decimation-in-Frequency

The development of the decimation-in-frequency FFT algorithm is similar to the decimation-in-time algorithm presented in the previous section. The ?rst step is to divide the data sequence into two halves, each of N /2 samples. The next step is to separate the frequency terms X (k) into even and odd samples of k. Figure 6.9 illustrates the ?rst decomposition of an N -point DFT into two N /2-point DFTs. Continue the process of decomposition until the last stage consists of 2-point DFTs. The decomposition and symmetry relationships are reversed from the decimation-in-time algorithm. The bit reversal occurs at the output instead of the input and the order of the output samples X (k) will be rearranged as Table 6.1. Figure 6.10 illustrates the butter?y representation for the decimation-in-frequency FFT algorithm.

x(0) x(1) x(2) x(3)

x1(0) x1(1) x1(2) x1(3) N/2-point DFT

X(0) X(2) X(4) X(6)

x(4) x(5) x(6) x(7)

?1 ?1 ?1 ?1

0 x2(0) W 8

X(1) X(3) N/2-point DFT X(5) X(7)

x2(1) x2(2) x2(3)

1 W8 2 W8 3 W8

Figure 6.9

Decomposition of an N -point DFT into two N /2 DFTs

IMPLEMENTATION CONSIDERATIONS

317

(m ? 1)th stage

?1

k WN

mth stage

Figure 6.10

Butter?y network for decimation-in-frequency FFT algorithm

The FFT algorithms introduced in this chapter are based on two-input, two-output butter?y computations, and are classi?ed as radix-2 FFT algorithms. It is possible to use other radix values to develop FFT algorithms. However, these algorithms only work well for some speci?c FFT lengths. In addition, these algorithms are more complicated than the radix-2 FFT algorithms and the programs for real-time implementation are not widely available for DSP processors.

6.3.3 Inverse Fast Fourier Transform

The FFT algorithms introduced in the previous sections can be modi?ed to ef?ciently compute the inverse FFT (IFFT). By complex conjugating both sides of Equation (6.18), we have x ? (n) = 1 N

N ?1 k=0 kn X ? (k)W N ,

n = 0, 1, . . . , N ? 1.

(6.40)

This equation shows we can use an FFT algorithm to compute the IFFT by ?rst conjugating the DFT coef?cients X (k) to obtain X ? (k), computing the DFT of X ? (k) using an FFT algorithm, scaling the results by 1/N to obtain x ? (n), and then complex conjugating x ? (n) to obtain the output sequence x(n). If the signal is real valued, the ?nal conjugation operation is not required.

6.4 Implementation Considerations

Many FFT routines are available in C and assembly programs for some speci?c DSP processors; however, it is important to understand the implementation issues in order to use FFT properly.

6.4.1 Computational Issues

The FFT routines accept complex-valued inputs; therefore, the number of memory locations required is 2N for N -point FFT. To use the available complex FFT program for real-valued signals, we have to set the imaginary parts to zero. The complex multiplication has the form (a + jb)(c + jd) = (ac + bd) + j(bc + ad), which requires four real multiplications and two real additions. The number of multiplication and the storage requirements can be reduced if the signal has special properties. For example, if x(n) is real, only N /2 samples from X (0) to X (N /2) need to be computed as shown by complex-conjugate property. In most FFT programs developed for general-purpose computers, the computation of twiddle factors kn W N de?ned in Equation (6.15) is embedded in the program. However, the twiddle factors only need to be computed once during the program initialization stage. In the implementation of FFT algorithm on DSP processors, it is preferable to tabulate the values of twiddle factors so that they can be looked up during the computation of FFT.

318

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

The complexity of FFT algorithms is usually measured by the required number of arithmetic operations (multiplications and additions). In practical real-time implementations with DSP processors, the architecture, instruction set, data structures, and memory organizations of the processors are critical factors. For example, modern DSP processors usually provide bit-reversal addressing and a high degree of instruction parallelism to implement FFT algorithms.

6.4.2 Finite-Precision Effects

From the signal-?ow graph of the FFT algorithm shown in Figure 6.7, X (k) will be computed by a series of butter?y computations with a single complex multiplication per butter?y network. Note that some butter?y networks with coef?cients ±1 (such as 2-point FFT in the ?rst stage) do not require multiplication. Figure 6.7 also shows that the computation of N -point FFT requires M = log2 N stages. There are N /2 butter?ies in the ?rst stage, N /4 in the second stage, and so on. Thus, the total number of butter?ies required is N N + + · · · + 2 + 1 = N ? 1. 2 4 (6.41)

The quantization errors introduced at the mth stage are multiplied by the twiddle factors at each subsequent stage. Since the magnitude of the twiddle factor is always unity, the variances of the quantization errors do not change while propagating to the output. The de?nition of DFT given in Equation (6.14) shows that we can scale the input sequence with the condition |x(n)| < 1 N (6.42)

to prevent the over?ow at the output because |e? j(2π/N )kn | = 1. For example, in a 1024-point FFT, the input data must be shifted right by 10 bits (1024 = 210 ). If the original data is 16 bits, the effective wordlength of the input data is reduced to only 6 bits after scaling. This worst-case scaling substantially reduces the resolution of the FFT results. Instead of scaling the input samples by 1/N at the beginning of the FFT, we can scale the signals at each stage. Figure 6.6 shows that we can avoid over?ow within the FFT by scaling the input at each stage by 0.5 because the outputs of each butter?y involve the addition of two numbers. This scaling process provides a better accuracy than the scaling of input by 1/N . An alternative conditional scaling method examines the results of each FFT stage to determine whether to scale the inputs of that stage. If all of the results in a particular stage have magnitude less than 1, no scaling is necessary at that stage. Otherwise, scale the inputs to that stage by 0.5. This conditional scaling technique achieves much better accuracy, however, at the cost of increasing software complexity.

6.4.3 MATLAB Implementations

As introduced in Section 3.2.6, MATLAB provides the function fft with syntax

y = fft(x);

to compute the DFT of x(n) in the vector x. If the length of x is a power of 2, the fft function employs an ef?cient radix-2 FFT algorithm. Otherwise, it uses a slower mixed-radix FFT algorithm or even a DFT. An alternative way of using fft function is

y = fft(x, N);

IMPLEMENTATION CONSIDERATIONS

Spectrum of 50 Hz sinewave 60

319

50

40 Magnitude

30

20

10

0

10

20

40 30 Frequency index, k

Spectrum of 50 Hz sinewave

50

60

Figure 6.11

to specify N -point FFT. If the length of x is less than N, the vector x is padded with trailing zeros to length N. If the length of x is greater than N, the fft function only performs the FFT of the ?rst N samples. The execution time of the fft function depends on the input data type and the sequence length. If the input data is real valued, it computes a real power-of-2 FFT algorithm that is faster than a complex FFT of the same length. The execution is fastest if the sequence length is exactly a power of 2. For example, if the length of x is 511, the function y = fft(x, 512) will be computed faster than fft(x) which performs 511-point DFT. It is important to note that the vectors in MATLAB are indexed from 1 to N instead of from 0 to N ? 1 as given in the DFT and IDFT de?nitions.

Example 6.12: Consider a sinewave of frequency f = 50 Hz expressed as

x(n) = sin(2π f n/ f s ), n = 0, 1, . . . , 127, where the sampling rate f s = 256 Hz. We analyze this sinewave using a 128-point FFT given in the MATLAB script (example6_12.m), and display the magnitude spectrum in Figure 6.11. It shows the frequency index k = 25 corresponding to the spectrum peak. Substituting the associated parameters into Equation (6.22), we veri?ed that the line spectrum is corresponding to 50 Hz. The MATLAB function ifft implements the IFFT algorithm as

y = ifft(x);

or

y = ifft(x,N);

The characteristics and usage of ifft are the same as those for fft.

320

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

6.4.4 Fixed-Point Implementation Using MATLAB

MATLAB provides a function qfft for quantizing an FFT object to support ?xed-point implementation. For example, the following command,

F = qfft

constructs a quantized FFT object F with default values. We can change the default settings by

F = qfft('Property1',Value1, 'Property2',Value2, ...)

to create a quantized FFT object with speci?c property/value pairs.

Example 6.13: We can change the default 16-point FFT to 128-point FFT using the following

command:

F = qfft('length',128)

We then obtain the following quantized FFT object in the command window:

F = Radix Length CoefficientFormat InputFormat OutputFormat MultiplicandFormat ProductFormat SumFormat NumberOfSections ScaleValues = = = = = = = = = = 2 128 quantizer('fixed', quantizer('fixed', quantizer('fixed', quantizer('fixed', quantizer('fixed', quantizer('fixed', 7 [1]

'round', 'floor', 'floor', 'floor', 'floor', 'floor',

'saturate', 'saturate', 'saturate', 'saturate', 'saturate', 'saturate',

[16 [16 [16 [16 [32 [32

15]) 15]) 15]) 15]) 30]) 30])

This shows that the quantized FFT is a 128-point radix-2 FFT for the ?xed-point data and arithmetic. The coef?cients, input, output, and multiplicands are represented using Q15 format [16 15], while the product and sum use Q30 format [32 30]. There are seven stages for N = 128, and no scaling is applied to the input at each stage by the default setting ScaleValues = [1]. We can set a scaling factor 0.5 at the input of each stage as follows:

F.ScaleValues = [0.5 0.5 0.5 0.5 0.5 0.5 0.5];

Or, set different values at speci?c stages using different scaling factors.

Example 6.14: Similar to Example 6.12, we used a quantized FFT to analyze the spectrum of

sinewave. In example6_14a.m, we ?rst generate the same sinewave as in Example 6.12, then use the following functions to compute the ?xed-point FFT with Q15 format:

FXk = qfft('length',128); qXk = fft(FXk, xn); % Create quantized FFT object % Compute Q15 FFT in xn vector

IMPLEMENTATION CONSIDERATIONS

321

When we run the MATLAB script, we receive the following warning messages reported in MATLAB command window:

Warning: 1135 overflows in quantized fft. Max Min NOverflows Coefficient 1 -1 7 Input 0.9999 -0.9999 0 Output 2 -2 16 Multiplicand 2 -2 1063 Product 1 -1 0 Sum 2.414 -2.414 56

NUnderflows 6 0 32 91 0 0

NOperations 254 128 256 3584 3584 4480

Without proper scaling, the FFT has 1135 over?ows, and thus the FFT results are wrong. We can modify the code by setting the scaling factor 0.5 at each stage as follows (see example6_14b.m):

FXk = qfft('length',128); % Create quantized FFT object FXk.ScaleValues = [0.5 0.5 0.5 0.5 0.5 0.5 0.5]; % Set scaling factors qXk = fft(FXk, xn); % Compute Q15 FFT of xn vector

When we run the modi?ed program (example6_14b.m), there are no warnings or errors. The spectrum plot displayed in Figure 6.12 shows that we can perform FFT properly using 16-bit processors with adequate scaling factor at each stage.

Spectrum stimation using quantized FFT 60

50

40 Magnitude

30

20

10

0

10

20

30

40

50

60

Frequency index, k

Figure 6.12 Sinewave spectrum computed using the quantized 16-bit FFT

322

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

6.5 Practical Applications

In this section, we will introduce two important FFT applications: spectral analysis and fast convolution.

6.5.1 Spectral Analysis

The inherent properties of the DFT directly affect its performance on spectral analysis. The spectrum estimated from a ?nite number of samples is correct only if the signal is periodic and the sample set exactly covers one or multiple period of signal. In practice, we may have to break up a long sequence into smaller segments and analyze each segment individually using the DFT. As discussed in Section 6.2, the frequency resolution of N -point DFT is f s /N . The DFT coef?cients X (k) represent frequency components that are equally spaced at frequencies f k as de?ned in Equation (6.22). One cannot properly represent a signal component that falls between two adjacent samples in the spectrum, because its energy will spread to neighboring bins and distort their spectral amplitude.

Example 6.15: In Example 6.12, the frequency resolution ( f s /N ) is 2 Hz using a 128-point FFT and sampling rate 256 Hz. The line component at 50 Hz can be represented by X (k) at k = 25 as shown in Figure 6.11. Consider the case of adding another sinewave at frequency 61 Hz (see example6_15.m). Figure 6.13 shows both spectral components at 50 and 61 Hz. However, the frequency component at 61 Hz (between k = 30 and k = 31) does not show a line component because its energy spreads into adjacent frequency bins.

Spectra of 50 and 61 Hz sinewaves 60

50

50Hz

40 Magnitude 61Hz 30

20

10

0

10

20

30 40 Frequency index, k

50

60

Figure 6.13

Spectra of sinewaves at 50 and 61 Hz

PRACTICAL APPLICATIONS

323

A solution to this spectral leakage problem is to have a ?ner resolution f s /N by using a larger FFT size N . If the number of data samples is not suf?ciently large, the sequence may be expanded to length N by adding zeros to the tail of true data. This process is simply the interpolation of the spectral curve between adjacent frequency components. Other problems relate to the FFT-based spectral analysis including aliasing, ?nite data length, spectral leakage, and spectral smearing. These issues will be discussed in the following section.

6.5.2 Spectral Leakage and Resolution

The data set that represents the signal of ?nite length N can be obtained by multiplying the signal with a rectangular window expressed as x N (n) = w(n)x(n) = x(n), 0 ≤ n ≤ N ? 1 , 0, otherwise (6.43)

where the rectangular function w(n) is de?ned in Equation (4.33). As the length of the window increases, the windowed signal x N (n) becomes a better approximation of x(n), and thus X (k) becomes a better approximation of the DTFT X (ω). The time-domain multiplication given in Equation (6.43) is equivalent to the convolution in the frequency domain. Thus, the DFT of x N (n) can be expressed as

N

X N (k) = W (k) ? X (k) =

l=?N

W (k ? l)X (k),

(6.44)

where W (k) is the DFT of the window function w(n), and X (k) is the true DFT of the signal x(n). Equation (6.44) shows that the computed spectrum X N (k) consists of the true spectrum X (k) convoluted with the window function’s spectrum W (k). Therefore, the computed spectrum of the ?nite-length signal is corrupted by the rectangular window’s spectrum. As discussed in Section 4.2, the magnitude response of the rectangular window consists of a mainlobe and several smaller sidelobes. The frequency components that lie under the sidelobes represent the sharp transition of w(n) at the endpoints. The sidelobes introduce spurious peaks into the computed spectrum, or to cancel true peaks in the original spectrum. This phenomenon is known as spectral leakage. To avoid spectral leakage, it is necessary to use different windows as introduced in Section 4.2.3 to reduce the sidelobe effects.

Example 6.16: If the signal x(n) consists of a single sinusoid cos(ω0 n), the spectrum of the in?nite-length sampled signal is

X (ω) = 2π δ(ω ± ω0 ), ?π ≤ ω ≤ π, (6.45)

which consists of two line components at frequencies ±ω0 . However, the spectrum of the windowed sinusoid can be obtained as X N (ω) = 1 [W (ω ? ω0 ) + W (ω + ω0 )], 2 (6.46)

where W (ω) is the spectrum of the window function.

324

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

Equation (6.46) shows that the windowing process has the effect of smearing the original sharp spectral line δ(ω ? ω0 ) at frequency ω0 and replacing it with W (ω ? ω0 ). Thus, the power has been spread into the entire frequency range by the windowing operation. This undesired effect is called spectral smearing. Thus, windowing not only distorted the spectrum due to leakage effects, but also reduced spectral resolution.

Example 6.17: Consider a signal consisting of two sinusoidal components expressed as x(n) = cos(ω1 n) + cos(ω2 n). The spectrum of the windowed signal is

X N (ω) = 1 [W (ω ? ω1 ) + W (ω + ω1 ) + W (ω ? ω2 ) + W (ω + ω2 )], 2 (6.47)

which shows that the sharp spectral lines are replaced with their smeared versions. If the frequency separation, ω = |ω1 ? ω2 |, of the two sinusoids is ω≤ or f ≤ fs , N (6.49) 2π N (6.48)

the mainlobe of the two window functions W (ω ? ω1 ) and W (ω ? ω2 ) overlap. Thus, the two spectral lines in X N (ω) are not distinguishable. MATLAB script example6_17.m uses 128-point FFT for signal with sampling rate 256 Hz. This example shows that two sinewaves of frequencies 60 and 61 Hz are mixed. From Equation (6.49), the frequency separation 1 Hz is less than the frequency resolution 2 Hz, thus these two sinewaves are overlapped as shown in Figure 6.14.

Spectra of mixing 60 and 61 Hz sinewaves 70 60 50 Magnitude 40 30 20 10 0

10

20

30 40 Frequency index, k

50

60

Figure 6.14

Spectra of mixing sinewaves at 60 and 61 Hz

PRACTICAL APPLICATIONS

325

To guarantee that two sinusoids appear as two distinct ones, their frequency separation must satisfy the condition ω> 2π N or f > fs . N (6.50)

Thus, the minimum DFT length to achieve a desired frequency resolution is given as N> 2π fs = . f ω (6.51)

In summary, the mainlobe width determines the frequency resolution of the windowed spectrum. The sidelobes determine the amount of undesired frequency leakage. The optimum window used for spectral analysis must have narrow mainlobe and small sidelobes. The amount of leakage can be substantially reduced using nonrectangular window functions introduced in Section 4.2.3 at the cost of decreased spectral resolution. For a given window length N , windows such as rectangular, Hanning, and Hamming have relatively narrow mainlobe compared with Blackman and Kaiser windows. Unfortunately, the ?rst three windows have relatively high sidelobes, thus having more leakage. There is a trade-off between frequency resolution and spectral leakage in choosing windows for a given application.

Example 6.18: Consider the 61 Hz sinewave in Example 6.15. We can apply the Kaiser window with N = 128 and β = 8.96 to the signal using the following commands:

beta = 8.96; wn = (kaiser(N,beta))'; x1n = xn.*wn; % Kaiser window % Generate windowed sinewave

The magnitude spectra of sinewaves with the rectangular and Kaiser windows are shown in Figure 6.15 by the MATLAB script example6_18.m. This shows that the Kaiser window can effectively reduce the spectral leakage. Note that the gain for using Kaiser window has been scaled up by 2.4431 in order to compensate the energy loss compared with using rectangular window. The timeand frequency-domain plots of the Kaiser window with length N = 128 and β = 8.96 are shown in Figure 6.16 using WinTool. For a given window, increasing the length of the window reduces the width of the mainlobe, which leads to better frequency resolution. However, if the signal changes frequency content over time, the window cannot be too long in order to provide a meaningful spectrum.

6.5.3 Power Spectrum Density

Consider a sequence x(n) of length N whose DFT is X (k), the Parseval’s theorem can be expressed as E=

N ?1 n=0

|x(n)|2 =

1 N

N ?1 k=0

|X (k)|2 .

(6.52)

The term |X (k)|2 is called the power spectrum that measures the power of signal at frequency f k . Therefore, squaring the DFT magnitude spectrum |X (k)| produces a power spectrum, which is also called the periodogram. The power spectrum density (PSD) (power density spectrum or simply power spectrum) characterizes stationary random processes. The PSD is very useful in the analysis of random signals since it provides

326

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

Spectrum of 61 Hz sinewave 60

50

Kaiser window

40 Magnitude

30

20

Rectangular window

10

0

10

20

30 40 Frequency index, k

50

60

Figure 6.15

Spectra obtained using rectangular and Kaiser windows

a meaningful measure for the distribution of the average power in such signals. There are different techniques for estimating the PSD. Since the periodogram is not a consistent estimate of the true PSD, the averaging method can reduce statistical variation of the computed spectra. One way of computing the PSD is to decompose x(n) into M segments, xm (n), of N samples each. These signal segments are spaced N /2 samples apart; i.e., there is 50 % overlap between successive segments. In order to reduce spectral leakage, each xm (n) is multiplied by a nonrectangular window

Time domain 40 1 Magnitude (dB) 0.8 Amplitude 0.6 0.4 0.2 0 20 0 ?20 ?40 ?60 ?80 Frequency domain

?100 20 40 80 60 Samples 100 120 ?120 0 0.2 0.4 0.6 0.8 Normalized frequency (× π rad/sample) Mainlobe width (?3dB): 0.025391

Leakage Factor: 0 %

Relative sidelobe attenuation: ? 66 dB

Figure 6.16

Kaiser window of N = 128 and β = 8.96

PRACTICAL APPLICATIONS

327

(such as Hamming) function w(n) of length N . The PSD is a weighted sum of the periodograms of the individual overlapped segment. The MATALAB Signal Processing Toolbox provides the function psd to estimate the PSD of the signal given in the vector x using the following statement:

h = spectrum.periodogram; psd(h,x,'Fs',Fs); % Create a periodogram object % Plots the two-sided PSD by default

where Fs is the sampling frequency.

Example 6.19: Consider the signal x(n) which consists of two sinusoids (140 and 150 Hz) and noise. This noisy signal is generated by example6_19.m adapted from the MATLAB Help menu. The PSD can be computed by creating the following periodogram object:

Hs=spectrum.periodogram;

The psd function can also display the PSD (Figure 6.17) as follows:

psd(Hs,xn,'Fs',fs,'NFFT',1024)

Note that we can specify the window used for computing PSD. For example, we can use Hamming window as follows:

Hs = spectrum.periodogram('Hamming');

Power spectral density estimate via periodogram 0 140Hz ?10 150 Hz ?20 Power/frequency (dB/Hz) ?30 ?40 ?50 ?60 ?70 ?80

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Frequency (kHz)

Figure 6.17 PSD of two sinewaves embedded in noise

328

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

For a time-varying signal, it is more meaningful to compute a local spectrum that measures spectral contents over a short-time interval. We use a sliding window to break up a long sequence into several short blocks xm (n) of N samples, and then perform the FFT to obtain the time-dependent frequency spectrum at each segment m as follows: X m (k) =

N ?1 n=0 kn xm (n)W N ,

k = 0, 1, . . . , N ? 1.

(6.53)

This process is repeated for the next block of N samples. This technique is called the short-term Fourier transform, since X m (k) is just the spectrum of the short segment of xm (n) that lies inside the sliding window w(n). This form of time-dependent Fourier transform has several applications in speech, sonar, and radar signal processing. Equation (6.53) shows that X m (k) is a two-dimensional sequence. The index k represents frequency, and the block index m represents segment (or time). Since the result is a function of both time and frequency, a three-dimensional graphical display is needed. This is done by plotting |X m (k)| using grayscale (or color) images as a function of both k and m. The resulting three-dimensional graphic is called the spectrogram. The spectrogram uses the x-axis to represent time and the y-axis to represent frequency. The gray level (or color) at point (m, k) is proportional to |X m (k)|. The Signal Processing Toolbox provides a function spectrogram to compute spectrogram. This MATLAB function has the form

B = spectrogram(a,window,noverlap,nfft,Fs);

where B is a matrix containing the complex spectrogram values |X m (k)|, and other arguments are de?ned in the function psd. More overlapped samples make the spectrum move smoother from block to block. It is common to pick the overlap to be around 25 %. The spectrogram function with no output arguments displays the scaled logarithm of the spectrogram in the current graphic window.

Example 6.20: The MATLAB program example6_20.m loads the speech ?le timit2.asc, plays it using the function soundsc, and displays the spectrogram as shown in Figure 6.18. The color bar on the right side indicates the signal strength in dB. The color corresponding to the lower power in the ?gure represents the silence and the color corresponding to the higher power represents the speech signals.

6.5.4 Fast Convolution

As discussed in Chapter 4, FIR ?ltering is a linear convolution of ?lter impulse response h(n) with the input sequence x(n). If the FIR ?lter has L coef?cients, we need L real multiplications and L ? 1 real additions to compute each output y(n). To obtain L output samples, the number of operations (multiplication and addition) needed is proportional to L 2 . To take advantage of ef?cient FFT and IFFT algorithms, we can use the fast convolution algorithm illustrated in Figure 6.19 for FIR ?ltering. Fast convolution provides a signi?cant reduction in computational requirements for higher order FIR ?lters, thus it is often used to implement FIR ?ltering in applications having a large number of data samples. It is important to note that the fast convolution shown in Figure 6.19 produces the circular convolution discussed in Section 6.2.3. In order to produce a linear convolution, it is necessary to append zeros to the signals as shown in Example 6.10. If the data sequence x(n) has ?nite duration M, the ?rst step is to pad data sequence and coef?cients with zeros to a length corresponding to an allowable FFT size N

PRACTICAL APPLICATIONS

329

Figure 6.18

Spectrogram of speech signal

(≥ L + M ? 1), where L is the length of h(n). The FFT is computed for both sequences to obtain X (k) and H (k), the corresponding complex products Y (k) = X (k)H (k) are calculated, and the IFFT of Y (k) is used to obtain y(n). The desired linear convolution is contained in the ?rst L + M ? 1 terms of these results. Since the ?lter impulse response h(n) is known as a priori, the FFT of h(n) can be precalculated and stored as ?xed coef?cients. For many applications, the input sequence is very long as compared to the FIR ?lter length L. This is especially true in real-time applications, such as in audio signal processing where the FIR ?lter order is extremely high due to high-sampling rate and input data is very long. In order to use the ef?cient FFT and IFFT algorithms, the input sequence must be partitioned into segments of N (N > L and N is a size supported by the FFT algorithm) samples, process each segment using the FFT, and ?nally assemble the output sequence from the outputs of each segment. This procedure is called the block-processing operation. The cost of using this ef?cient block processing is the buffering delay. More complicated

x(n)

FFT

X(k) Y(k) y(n)

IFFT

h(n)

H(k) FFT

Figure 6.19 Fast convolution algorithm

330

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

m?1 m m+1 m?1 m m+1 discarded ym+1 (n) L ym?1 (n) ym (n) xm+1 (n) 0 xm?1 (n) L N?1 xm (n)

Figure 6.20

Overlap data segments for the overlap-save technique

algorithms have been devised to have both zero latency as direct FIR ?ltering and computational ef?ciency [10]. There are two techniques for the segmentation and recombination of the data: the overlap-save and overlap-add algorithms.

Overlap-Save Technique

The overlap-save process overlaps Linput samples on each segment. The output segments are truncated to be nonoverlapping and then concatenated. The following steps describe the process illustrated in Figure 6.20: 1. Apply N -point FFT to the expanded (zero-padded) impulse response sequence to obtain H (k), where k = 0, 1, . . . , N ? 1. This process can be precalculated off-line and stored in memory. 2. Select N signal samples xm (n) (where m is the segment index) from the input sequence x(n) based on the overlap illustrated in Figure 6.20, and then use N -point FFT to obtain X m (k). 3. Multiply the stored H (k) (obtained in Step 1) by the X m (k) (obtained in Step 2) to get Ym (k) = H (k)X m (k), k = 0, 1, . . . , N ? 1. (6.54)

4. Perform N -point IFFT of Ym (k) to obtain ym (n) for n = 0, 1, . . . , N ? 1. 5. Discard the ?rst L samples from each IFFT output. The resulting segments of (N ? L) samples are concatenated to produce y(n).

Overlap-Add Technique

The overlap-add process divides the input sequence x(n) into nonoverlapping segments of length (N ? L). Each segment is zero-padded to produce xm (n) of length N . Follow the Steps 2–4 of the overlap-save method to obtain N -point segment ym (n). Since the convolution is the linear operation, the output sequence y(n) is the summation of all segments.

OVERLAP-ADD TECHNIQUE MATLAB implements this ef?cient FIR ?ltering using the overlap-add technique as

y = fftfilt(b, x);

331

The fftfilt function ?lters the input signal in the vector x with the FIR ?lter described by the coef?cient vector b. The function chooses an FFT and a data block length that automatically guarantees ef?cient execution time. However, we can specify the FFT length N by using

y = fftfilt(b, x, N)

Example 6.21: The speech data timit2.asc (used in Example 6.20) is corrupted by a tonal noise at frequency 1000 Hz. We design a bandstop FIR ?lter with edge frequencies of 900 and 1100 Hz, and ?lter the noisy speech using the following MATLAB script (example6_21.m):

Wn = [900 1100]/4000; % Edge frequencies b = fir1(128, Wn, 'stop'); % Design bandstop filter yn = fftfilt(b, xn); % FIR filtering using fast convolution soundsc(yn, fs, 16); % Listen to the filtered signal spectrogram(yn,kaiser(256,5),200,256,fs,'yaxis') % Spectrogram

MATLAB program example6_21.m plays the original speech ?rst, plays the noisy speech that is corrupted by the 1 kHz tone, and then shows the spectrogram with the noise component (in red) at 1000 Hz. In order to attenuate that tonal noise, a bandstop FIR ?lter is designed (using the function fir1) to ?lter the noisy speech using the function fftfilt. Finally, the ?lter output is played and its spectrogram is shown in Figure 6.21 with the 1000 Hz tonal noise being attenuated.

Figure 6.21

Spectrogram of bandstop ?lter output

332

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

6.6 Experiments and Program Examples

In the section, we will implement the DFT and FFT algorithms for DSP applications. The computation of the DFT and FFT involves nested loops, complex multiplication, and complex twiddle-factor generation.

6.6.1 Floating-Point C Implementation of DFT

kn For multiplying a complex data sample x(n) = xr (n) + j xi (n) and a complex twiddle factor W N = cos (2π kn/N ) ? j sin (2πkn/N ) = Wr ? j Wi de?ned in Equation (6.15), the product can be expressed as kn x(n)W N = xr (n)Wr + xi (n)Wi + j[xi (n)Wr ? xr (n)Wi ],

(6.55)

where the subscripts r and i denote the real and imaginary parts of complex variable. Equation (6.55) can be rewritten as X (n) = X r (n) + j X i (n), where X r (n) = xr (n)Wr + xi (n)Wi X i (n) = xi (n)Wr ? xr (n)Wi . (6.57a) (6.57b) (6.56)

The C program listed in Table 6.2 uses two arrays, Xin[2*N] and Xout[2*N], to hold the complex input and output samples. The twiddle factors are computed at run time. Since most of real-world applications contain only real data, it is necessary to compose a complex data set from the given real data. The simplest way is to zero out the imaginary part before calling the DFT function. This experiment computes 128-point DFT of signal given in ?le input.dat, and displays the spectrum using the CCS graphics. Table 6.3 lists the ?les used for this experiment. Procedures of the experiment are listed as follows: 1. Open the project ?le, float_dft128.pjt, and rebuild the project. 2. Run the DFT experiment using the input data ?le input.dat. 3. Examine the results saved in the data array spectrum[ ] using CCS graphics as shown in Figure 6.22. The magnitude spectrum shows the normalized frequencies at 0.125, 0.25, and 0.5. 4. Pro?le the code and record the required cycles per data sample for the ?oating-point implementation of DFT.

6.6.2 C55x Assembly Implementation of DFT

We write assembly routines based on the C program listed in Table 6.4 to implement DFT on TMS320C55x. The sine and cosine generators for experiments given in Chapter 3 can be used to generate the twiddle factors. The assembly function sine_cos.asm (see section ‘Practical Applications’ in Chapter 3) is a C-callable function that follows the C55x C-calling convention. This function has two arguments: angle and Wn. The ?rst argument contains the input angle in radians and is passed to the

EXPERIMENTS AND PROGRAM EXAMPLES

Table 6.2 List of ?oating-point C function for DFT

333

#include <math.h> #define PI 3.1415926536 void floating_point_dft(float Xin[], float Xout[]) { short i,n,k,j; float angle; float Xr[N],Xi[N]; float W[2]; for (i=0,k=0;k<N;k++) { Xr[k]=0; Xi[k]=0; for(j=0,n=0;n<N;n++) { angle =(2.0*PI*k*n)/N; W[0]=cos(angle); W[1]=sin(angle); Xr[k]=Xr[k]+Xin[j]*W[0]+Xin[j+1]*W[1]; Xi[k]=Xi[k]+Xin[j+1]*W[0]-Xin[j]*W[1]; j+=2; } Xout[i++] = Xr[k]; Xout[i++] = Xi[k]; } }

C55x assembly routine via the temporary register T0. The second argument is the pointer to Wn passed by the auxiliary register AR0, for which the computed results will be stored upon return. The calculation of the angle depends on two variables, k and n, as follows: Angle = (2π/N )kn. (6.58)

As shown in Figure 3.30, the ?xed-point representation of value π for sine–cosine generator is 0x7FFF. Thus, the angle used to generate the twiddle factors can be expressed as Angle = (2 · 0x7FFF/N )kn (6.59)

Table 6.3 Files

File listing for experiment exp6.6.1_floatingPoint_DFT Description C function for testing ?oating-point DFT experiment C function for 128-point ?oating-point DFT algorithm C function computes magnitude of 128 DFT results C header ?le for DFT experiment DSP project ?le DSP linker command ?le Data ?le

float_dft128Test.c float_dft128.c float_mag128.c float_dft128.h float_dft128.pjt float_dft128.cmd input.dat

334

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

Figure 6.22

Input signal (top) and the magnitude spectrum (bottom)

Table 6.4 ; ; ; ; ; ;

List of C55x assembly implementation of DFT

DFT_128 – 128-point DFT routine Entry T0: AR0: pointer to complex input buffer AR1: pointer to complex output buffer Return: None

.def dft_128 .ref sine_cos N .set 128 TWOPIN .set 0x7fff>>6 .bss Wn,2 .bss angle,1 .text _dft_128 pshboth XAR5 bset SATD mov #N-1,BRC0 mov #N-1,BRC1 mov XAR0,XAR5 mov XAR0,XAR3 mov #0,T2 rptb outer_loop-1 mov XAR3,XAR5 mov #TWOPIN<<#16,AC0 mpy T2,AC0

; 2*PI/N, N=128 ; Wn[0]=Wr, Wn[1]=Wi ; Angle for sine-cosine function

; Save AR5 ; Repeat counter for outer loop ; Repeat counter for inner loop ; AR5 pointer to sample buffer ; ; ; ; k = T2 = 0 for(k=0;k<N;k++) { Reset x[] pointer hi(AC0) = 2*PI/N

EXPERIMENTS AND PROGRAM EXAMPLES

Table 6.4 mov mov mov mov rptb mov mov add mov amov call bset macm40 macm40 masm40 macm40 inner_loop mov mov add outer_loop popboth bclr ret #0,AC2 #0,AC3 #0,*(angle) AC0,T3 inner_loop-1 *(angle),T0 *(angle),AC0 T3,AC0 AC0,*(angle) #Wn,XAR0 _sine_cos SATD *AR5+,*AR0,AC2 *AR5-,*AR0+,AC3 *AR5+,*AR0,AC3 *AR5+,*AR0-,AC2 (continued )

335

; Xr[k] = 0 ; Xi[k] = 0 ; angle=2*PI*k/N ; for(n=0;n<N;n++) { ; T0=2*PI*k*n/N

; ; ; ; ; ; ; ; ; hi(AC2<<#-5),*AR1+ hi(AC3<<#-5),*AR1+ #1,T2 ; XAR5 SATD

Update angle AR0 is the pointer to Wn sine_cos(angle, Wn) sine_cos turn off FRCT & SATD XR[k] + Xin[n]*Wr XI[k] + Xin[n+1]*Wr XI[k] + Xin[n+1]*Wr - Xin[n]*Wi XR[k] + Xin[n]*Wr + Xin[n+1]*Wi End of inner loop

End of outer loop

for n = 0, 1, . . . , N ? 1 and k = 0, 1, . . . , N ?1. The C55x assembly routine listed in Table 6.4 computes 128-point DFT. This assembly program calls the assembly routine sine_cos to compute the twiddle factors. Table 6.5 lists the ?les used for the C55x assembly DFT experiment. This experiment uses C55x assembly program to compute 128-point DFT and magnitude spectrum. The zero-overhead loops allow program to initialize the loop counters BRC0 and BRC1 outside the nested loops. When implementing nested loops, the inner loop uses BRC1 while the outer loop uses BRC0. Procedures of the experiments are listed as follows: 1. Open the project ?le, asm_dft128.pjt, and rebuild the project. 2. Run the DFT project using the input data ?le input.dat.

Table 6.5 Files asm_dft128Test.c dft_128.asm mag_128.asm sine_cos.asm asm_dft128.h asm_dft128.pjt asm_dft128.cmd input.dat File listing for experiment exp6.6.2_asm_DFT Description C function for testing DFT experiment Assembly function for 128-point DFT Assembly function computes magnitude spectrum Assembly function computes twiddle factors C header ?le for DFT experiment DSP project ?le DSP linker command ?le Data ?le

336

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

3. Examine the results stored in spectrum[ ] using CCS graphics. It should have the same peaks in Figure 6.22 with the normalized frequencies at 0.125, 0.25, and 0.5. 4. Pro?le the C55x assembly language implementation of the DFT and compare the required cycles per data sample with the ?oating-point C experiment result obtained in previous experiment.

6.6.3 Floating-Point C Implementation of FFT

This experiment implements the complex, radix-2, decimation-in-time FFT algorithm using ?oatingpoint C. The ?oating-point FFT function is listed in Table 6.6. The ?rst argument is a pointer to the complex data array. The second argument passes the number of exponential values of the radix-2 FFT. The third argument is the pointer to the twiddle factors. The last argument is a ?ag used to determine if a scale is needed during the computation of the FFT algorithm. As explained in Section 6.4.2, the input data is scaled by 0.5 at each FFT stage. The twiddle factors are computed at the end of the function to reduce computational requirement. This FFT routine can be used for inverse FFT with the scale factors set to 1.0.

Table 6.6

List of ?oating-point C FFT function

void fft(complex *X, unsigned short EXP, complex *W, unsigned short SCALE) { complex temp; /* Temporary storage of complex variable */ complex U; /* Twiddle factor W^k */ unsigned short i,j; unsigned short id; /* Index for lower point in butterfly */ unsigned short N=1<<EXP;/* Number of points for FFT */ unsigned short L; /* FFT stage */ unsigned short LE; /* Number of points in sub DFT at stage L and offset to next DFT in stage */ unsigned short LE1; /* Number of butterflies in one DFT at stage L. Also is offset to lower point in butterfly at stage L */ float scale; scale = 0.5; if (SCALE == 0) scale = 1.0; for (L=1; L<=EXP; L++) /* FFT butterfly */ { LE=1<<L; /* LE=2^L=points of sub DFT */ LE1=LE>>1; /* Number of butterflies in sub-DFT */ U.re = 1.0; U.im = 0.; for (j=0; j<LE1;j++) { for(i=j; i<N; i+=LE) /* Do the butterflies */ { id=i+LE1; temp.re = (X[id].re*U.re - X[id].im*U.im)*scale; temp.im = (X[id].im*U.re + X[id].re*U.im)*scale;

EXPERIMENTS AND PROGRAM EXAMPLES

Table 6.6 (continued )

337

X[id].re = X[i].re*scale - temp.re; X[id].im = X[i].im*scale - temp.im; X[i].re = X[i].re*scale + temp.re; X[i].im = X[i].im*scale + temp.im; } /* Recursive compute W^k as U*W^(k-1) */ temp.re = U.re*W[L-1].re - U.im*W[L-1].im; U.im = U.re*W[L-1].im + U.im*W[L-1].re; U.re = temp.re; } } }

This experiment also uses the bit-reversal function listed in Table 6.7. This function rearranges the order of data samples according to the bit-reversal de?nition in Table 6.1 before the data sample is passed to the FFT function. This experiment computes 128-point FFT using ?oating-point C and displays the magnitude spectrum. The ?les used for this experiment are listed in Table 6.8. Procedures of the experiment are listed as follows: 1. Open the project ?le, float_fft.pjt, and rebuild the project. 2. Run the FFT experiment using the input data ?le input_f.dat.

Table 6.7 List of bit-reversal function in C

void bit_rev(complex *X, short EXP) { unsigned short i,j,k; unsigned short N=1<<EXP; /* Number of points for FFT */ unsigned short N2=N>>1; complex temp; /* Temporary storage of the complex variable */ for (j=0,i=1;i<N-1;i++) { k=N2; while(k<=j) { j-=k; k>>=1; } j+=k; if (i<j) { temp = X[j]; X[j] = X[i]; X[i] = temp; } } }

338

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

Table 6.8 Files float_fftTest.c fft_float.c fbit_rev.c float_fft.h fcomplex.h float_fft.pjt float_fft.cmd input_f.dat File listing for experiment exp6.6.3_floatingPoint_FFT Description C function for testing ?oating-point FFT experiment C function for ?oating-point FFT C function performs bit reversal C header ?le for ?oating-point FFT experiment C header ?le de?nes ?oating-point complex data type DSP project ?le DSP linker command ?le Data ?le

3. Examine the results saved in spectrum[ ] using CCS graphics. The spectrum plot shows the normalized line frequency at 0.25. 4. Pro?le the FFT function and record the required cycles per data sample using the ?oating-point implementation.

6.6.4 C55x Intrinsics Implementation of FFT

In this experiment, we modify the ?oating-point C implementation of the FFT with the C55x intrinsics (see Table 5.4). We will replace the arithmetic operations with intrinsics _lsmpy, _smas, _smac, _sadd, and _ssub to implement the ?xed-point FFT. The FFT program that uses intrinsics is listed in Table 6.9. This experiment computes 128-point FFT and displays the magnitude spectrum. The program is a mix of ?xed-point C and intrinsics. Its run-time performance is greatly improved over the ?oating-point C implementation. Table 6.10 lists the ?les used for this experiment.

Table 6.9

Code segment of intrinsics implementation of FFT /* FFT butterfly */ /* LE=2^L=points of sub DFT */ /* Number of butterflies in sub DFT */

for (L=1; L<=EXP; L++) { LE=1<<L; LE1=LE>>1; U.re = 0x7fff; U.im = 0;

for (j=0; j<LE1;j++) { for(i=j; i<N; i+=LE) /* Do the butterflies */ { id=i+LE1; ltemp.re = _lsmpy(X[id].re, U.re); temp.re = (_smas(ltemp.re, X[id].im, U.im)>>SFT16); temp.re = _sadd(temp.re, 1)>>scale; /* Rounding & scale */ ltemp.im = _lsmpy(X[id].im, U.re); temp.im = (_smac(ltemp.im, X[id].re, U.im)>>SFT16);

EXPERIMENTS AND PROGRAM EXAMPLES

Table 6.9 (continued )

339

temp.im = _sadd(temp.im, 1)>>scale; /* Rounding & scale */ X[id].re = _ssub(X[i].re>>scale, temp.re); X[id].im = _ssub(X[i].im>>scale, temp.im); X[i].re = _sadd(X[i].re>>scale, temp.re); X[i].im = _sadd(X[i].im>>scale, temp.im); } /* Recursive compute W^k as W*W^(k-1) */ ltemp.re = _lsmpy(U.re, W[L-1].re); ltemp.re = _smas(ltemp.re, U.im, W[L-1].im); ltemp.im = _lsmpy(U.re, W[L-1].im); ltemp.im = _smac(ltemp.im, U.im, W[L-1].re); U.re = ltemp.re>>SFT16; U.im = ltemp.im>>SFT16; } }

Procedures of the experiment are listed as follows: 1. Open the project ?le, intrinsic_fft.pjt, and rebuild the project. 2. Run the FFT experiment using the data ?le input_i.dat. 3. Examine the results saved in spectrum[ ] using CCS graphics. The spectrum plot shows the normalized line frequency at 0.25. 4. Pro?le the intrinsics implementation of the FFT and compare the required cycles per data sample with the ?oating-point C FFT experiment result obtained in previous section.

6.6.5 Assembly Implementation of FFT and Inverse FFT

In this experiment, we use the C55x assembly routines for computing the same radix-2 FFT algorithm implemented by the ?xed-point C with intrinsics given in the previous experiment. The C55x FFT assembly routine listed in Table 6.11 follows the C55x C-calling convention. For readability, the assembly code mimics the C function closely. It optimizes the memory usage but not the run-time ef?ciency. The execution speed can be further improved by unrolling the loop and taking advantage of the FFT butter?y characteristics, but with the expense of the memory.

Table 6.10 Files intrinsic_fftTest.c intrinsic_fft.c ibit_rev.c intrinsic_fft.h icomplex.h intrinsic_fft.pjt intrinsic_fft.cmd input_i.dat File listing for experiment exp6.6.4_intrinsics_FFT Description C function for testing intrinsics FFT experiment C function for intrinsics FFT C function performs ?xed-point bit reversal C header ?le for ?xed-point FFT experiment C header ?le de?nes ?xed-point complex data type DSP project ?le DSP linker command ?le Data ?le

340

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

The assembly routine de?nes local variables as a structure using the stack-relative addressing mode. The last memory location contains the return address of the caller function. Since the status registers ST1 and ST3 will be modi?ed by the assembly routine, we use two stack locations to store the contents of these registers at entry, and they will be restored upon returning to the caller function. The complex temporary variable is stored in two consecutive memory locations by using a bracket with the numerical number to indicate the number of memory locations for the integer data type.

Table 6.11 .global ARGS .set _fft 0 List of C55x assembly implementation of FFT algorithm ; Number of variables passed via stack ; Define local variable structure ; Temporary variables (Re, Im)

FFT_var .struct d_temp .short (2) d_L .short d_N .short d_T2 .short d_ST1 .short d_ST3 .short d_AR5 .short dummy .short return_addr .short Size .endstruct fft .set 0 fft .tag FFT_var .sect ".text:fft_code" _fft: aadd #(ARGS-Size+1),SP mov mmap(ST1_55),AR2 mov mmap(ST3_55),AR3 mov AR2,fft.d_ST1 mov AR3,fft.d_ST3 mov AR5,(fft.d_AR5) btst @#0,T1,TC1 mov #0x6340,mmap(ST1_55) mov #0x1f22,mmap(ST3_55) xcc do_scale,TC1 mov #0x6300,mmap(ST1_55) do_scale mov T2,fft.d_T2 || mov #1,AC0 mov AC0,fft.d_L || sfts AC0,T0 mov AC0,fft.d_N mov XAR1,XCDP mov XSP,XAR4 add #fft.d_temp,AR4 mov XAR0,XAR1 mov T0,T1 mov XAR0,XAR5 outer_loop mov fft.d_L,T0 || mov #2,AC0

; ; ; ; ; ;

Used to save content of T2 Used to save content of ST1 Used to save content of ST3 Used to save content of ar5 Used to align stack pointer Space for routine return address

; Adjust stack for local variables ; Save ST1,ST3

; ; ; ;

Protect AR5 Check SCALE flag set Set CPL,XF,SATD,SXAM,FRCT (SCALE=1) Set: HINT,SATA,SMUL

; Set CPL,XF,SATD,SXAM (SCALE=2) ; Save T2 ; ; ; ; Initialize L=1 T0=EXP N=1<<EXP CDP = pointer to U[]

; AR4 = pointer to temp ; AR1 points to sample buffer ; Copy extended bits to XAR5 ; for (L=1; L<=EXP; L++) ; Note: Since the buffer is ; arranged in re,im pairs

EXPERIMENTS AND PROGRAM EXAMPLES

Table 6.11 sfts AC0,T0 ; neg T0 ; || mov fft.d_N,AC1 ; sftl AC1,T0 ; mov AC0,T0 ; || sfts AC0,#-1 mov AC0,AR0 ; || sfts AC0,#-1 sub #1,AC0 ; mov mmap(AC0L),BRC0 ; sub #1,AC1 ; mov mmap(AC1L),BRC1 ; add AR1,AR0 mov #0,T2 ; || rptblocal mid_loop-1 ; mov T2,AR5 ; mov T2,AR3 add AR0,AR5 ; add #1,AR5,AR2 ; add AR1,AR3 ; || rptblocal inner_loop-1 ; mpy *AR5+,*CDP+,AC0 ; :: mpy *AR2-,*CDP+,AC1 ; masr *AR5-,*CDP-,AC0 ; :: macr *AR2+,*CDP-,AC1 ; mov pair(hi(AC0)),dbl(*AR4); || mov dbl(*AR3),AC2 xcc scale,TC1 || mov AC2>>#1,dual(*AR3) ; mov dbl(*AR3),AC2 scale add T0,AR2 || sub dual(*AR4),AC2,AC1 ; mov AC1,dbl(*(AR5+T0)) ; || add dual(*AR4),AC2 ; mov AC2,dbl(*(AR3+T0)) ; inner_loop ; amar *CDP+ amar *CDP+ ; || add #2,T2 ; mid_loop ; sub #1,T1 add #1,fft.d_L ; bcc outer_loop,T1>0 ; mov fft.d_ST1,AR2 ; mov fft.d_ST3,AR3 mov AR2,mmap(ST1_55) mov AR3,mmap(ST3_55) mov (fft.d_AR5),AR5 mov fft.d_T2,T2 aadd #(Size-ARGS-1),SP ; ret (continued )

341

the index to the buffer is doubled But the repeat counters are not doubled LE=2<<L LE1=LE>>1 Init mid_loop counter BRC0=LE1-1 Initialize inner loop counter BRC1=(N>>L)-1 j=0 for (j=0; j<LE1;j++) AR5=id=i+LE1 AR5 = pointer to X[id].re AR2 = pointer to X[id].im AR3 = pointer to X[i].re for(i=j; i<N; i+=LE) AC0=(X[id].re*U.re -X[id].im*U.im)/SCALE AC1=(X[id].im*U.re +X[id].re*U.im)/SCALE AC0H=temp.re AC1H=temp.im

Scale X[i] by 1/SCALE

X[id].re=X[i].re/SCALE-temp.re X[id].im=X[i].im/SCALE-temp.im X[i].re=X[i].re/SCALE+temp.re X[i].im=X[i].im/SCALE+temp.im End of inner loop Update k for pointer to U[k] Update j End of mid-loop Update L End of outer loop Restore ST1,ST3,T2

Reset SP

342

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

We also write the bit-reversal function using C55x assembly language for improving run-time ef?ciency. Table 6.12 lists the assembly implementation of bit-reversal function. To reduce the computation of the FFT algorithm, we precalculate the twiddle factors using C function w_table.c during the setup process. In order to use the same FFT routine for the IFFT calculation, two simple changes are made. First, the conjugating twiddle factors imply the sign change of the imaginary portion of the complex samples; that is, X[i].im = -X[i].im. Second, the normalization of 1/N is handled in the FFT routine by setting the scale ?ag to zero.

Table 6.12 List of assembly implementation of bit-reversal function

.global _bit_rev .sect ".text:fft_code" _bit_rev psh mmap(ST2_55) ; bclr ARMS ; mov #1,AC0 sfts AC0,T0 ; mov AC0,T0 ; mov T0,T1 add T0,T1 mov mmap(T1),BK03 ; mov mmap(AR0),BSA01 ; sub #2,AC0 mov mmap(AC0L),BRC0 ; mov #0,AR0 ; mov #0,AR1 ; bset AR0LC ; bset AR1LC ; || rptblocal loop_end-1 ; mov dbl(*AR0),AC0 ; || amov AR1,T1 mov dbl(*AR1),AC1 ; || asub AR0,T1 xccpart swap1,T1>=#0 || mov AC1,dbl(*AR0+) ; swap1 xccpart loop_end,T1>=#0 || mov AC0,dbl(*(AR1+T0B)) loop_end ; pop mmap(ST2_55) ; ret

Save ST2 Reset ARMS bit T0=EXP, AC0=N=2EXP T0=N

Circular buffer size=2N Init circular buffer base Initialize repeat counter to N-1 Set buffer start address as offset = 0 Enable AR0 and AR1 as circular pointers Start bit reversal loop Get a pair of sample Get another pair

Swap samples if j>=i

End bit reversal loop Restore ST2

The experiment computes 128-point FFT, inverse FFT, and the error between the input and the output of inverse FFT. The ?les used for this experiment are listed in Table 6.13. Procedures of the experiment are listed as follows: 1. Open the project ?le, asm_fft.pjt, and rebuild the project. 2. Run the FFT experiment using the input data ?le input.dat. 3. Examine the FFT and IFFT input and output, and check the input and output differences stored in the array error[ ].

EXPERIMENTS AND PROGRAM EXAMPLES

Table 6.13 Files asm_fftTest.c fft.asm bit_rev.asm w_table.c asm_fft.h icomplex.h asm_fft.pjt asm_fft.cmd input.dat File listing for experiment exp6.6.5_asm_FFT Description C function for testing assembly FFT experiment Assembly function for FFT Assembly function performs bit reversal C function generates twiddle factors C header ?le for ?xed-point FFT experiment C header ?le de?nes ?xed-point complex data type DSP project ?le DSP linker command ?le Data ?le

343

6.6.6 Implementation of Fast Convolution

This experiment uses the overlap-add technique with the following steps:

r r r r r r

Pad M (N ? L) zeros to the FIR ?lter impulse response of length L where N > L, and process the sequence using an N -point FFT. Store the results in the complex buffer H[N]. Segment the input sequence of length M with L ? 1 zeros padded at the end. Process each segment of data samples with an N -point FFT to obtain the complex array X[N]. Multiply H and X in frequency domain to obtain Y. Perform N -point IFFT to get the time-domain ?ltered sequence. Add the ?rst L samples overlapped with the previous segment to form the output. Combine all the resulting segments to obtainy(n).

The C program implementation of fast convolution using FFT and IFFT is listed in Table 6.14. The ?les used for this experiment are listed in Table 6.15.

Table 6.14 for (i=0; i<L; i++) { X[i].re = LP_h[i]; X[i].im = 0; } w_table(U,EXP); bit_rev(X,EXP); fft(X,EXP,U,1); C program section for fast convolution /* Copy filter coefficient to work buffer */

/* Create Twiddle lookup table for FFT */ /* Bit reversal arrange the coefficient */ /* FFT to the filter coefficients */

for (i=0; i<N; i++) /* Save frequency domain coefficients */ { H[i].re = X[i].re <<EXP; H[i].im = X[i].im <<EXP; } /* Start FFT Convolution test */

continues overleaf

344

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

Table 6.14 (continued )

j=0; for (;;) { for (i=0; i<M; i++) { X[i].re = input[j++];/* Generate input samples */ X[i].im = 0; re1[i] = X[i].re; /* Display re1[] shows all 3 freq. */ if (j==DATA_LEN) { j=0; } } for (i=i; i<N; i++) /* Fill zeros to data buffer */ { X[i].re = 0; X[i].im = 0; } /* Start FFT convolution*/ bit_rev(X,EXP); /* Arrange sample in bit reversal order */ fft(X,EXP,U,1); /* Perform FFT */ freqflt(X,H,N); /* Perform frequency domain filtering */ bit_rev(X,EXP); /* Arrange sample in bit reversal order */ fft(X,EXP,U,0); /* Perform IFFT */ olap_add(X,OVRLAP,L,M,N);/* Overlap and add algorithm */ }

Procedures of the experiment are listed as follows: 1. Open the project ?le, fast_convolution.pjt, and rebuild the project. 2. Run the fast convolution experiment using the data ?le input.dat. 3. Replace different FIR ?lter coef?cients provided in this project and pro?le the ?lter run time cycles for different ?lter lengths.

Table 6.15 Files

File listing for experiment exp6.6.6_fastconvolution Description C function for testing fast convolution experiment Assembly function for FFT Assembly function performs bit reversal Assembly function performs fast convolution Assembly function performs overlap-add C function generates twiddle factors C header ?le for fast convolution ?lter experiment C header ?le de?nes ?xed-point complex data type DSP project ?le DSP linker command ?le Data ?le Lowpass ?lter coef?cients from 8th to 512th

fast_convolution.c fft.asm bit_rev.asm freqflt.asm olap_add.asm w_table.c fast_convolution.h icomplex.h fast_convolution.pjt fast_convolution.cmd input.dat firlp8.dat - firlp512.dat

EXPERIMENTS AND PROGRAM EXAMPLES

Table 6.16 Files realtime_DSPBIOS.c fft.asm bit_rev.asm w_table.c plio.c realtime_fft.h icomplex.h plio.h lio.h rt_FFT_DSPBIOS.pjt rt_FFT_dspbioscfg.cmd rt_FFT_dspbios.cdb timit2.wav tone.wav unControl.wav File listing for experiment exp6.6.7_realtime_FFT_DSPBIOS Description C function for testing real-time FFT experiment Assembly function for FFT Assembly function performs bit reversal C function generates twiddle factors C function for interface PIP with low-level I/O functions C header ?le for FFT experiment C header ?le de?nes ?xed-point complex data type C header ?le for PIP with low-level I/O functions C header ?le for low-level I/O functions DSP project ?le DSP linker command ?le DSP/BIOS con?guration ?le Speech ?le Tone ?le Data ?le

345

6.6.7 Real-Time FFT Using DSP/BIOS

This experiment integrates the FFT and IFFT using DSP/BIOS for real-time demonstrations. The C5510 DSK takes input signal from an audio source, applies FFT, and uses the IFFT to convert it back to time domain for real-time playback. The input signals used for this experiment include speech, pure tone, and modulated tone. These signals are sampled at 8000 Hz and stored in wave ?le format. The experiment uses 128-point FFT. Table 6.16 lists the ?les used for this experiment. Procedures of the experiment are listed as follows: 1. Open the project ?le, rt_FFT_DSPBIOS.pjt, and rebuild the project. 2. Connect the line-in of the DSK with the source audio player and headphone-out to a headphone or loudspeaker.

Table 6.17 Files rt_fftFilter.c fft.asm bit_rev.asm freqflt.asm olap_add.asm olap_add.c plio.c rt_fftFilter.h icomplex.h plio.h lio.h bandstop_128tap.h rt_fftFilter.pjt rt_fftFiltercfg.cmd rt_fftFilter.cdb timit2.wav timit2_with_tone.wav File listing for experiment exp66.8_realtime_fftFilter Description C function for testing real-time FFT experiment Assembly function for FFT Assembly function performs bit reversal Assembly function performs fast convolution Assembly function performs overlap-add C function controls overlap-add process C function for interface PIP with low-level I/O functions C header ?le for fast convolution experiment C header ?le de?nes ?xed-point complex data type C header ?le for PIP with low-level I/O functions C header ?le for low-level I/O functions FIR ?lter coef?cients DSP project ?le DSP linker command ?le DSP/BIOS con?guration ?le Speech ?le Data ?le of speech + tone

346

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

(a) Speech signal with 1000 Hz tone interference, 1000 Hz tone at ---16.08 dB.

(b) After FIR filtering, the 1000 Hz tone has been reduced to ---72.75 dB.

Figure 6.23 Fast convolution for removing 1000 Hz interference from speech: (a) speech signal with the 1000 Hz tone interference, 1000 Hz tone at ?16.08 dB; (b) after FIR ?ltering, the 1000 Hz tone has been reduced to ?72.75 dB

REFERENCES 3. Run the experiment and listen to the playback of audio samples. 4. Repeat the experiment using different audio data ?les.

347

6.6.8 Real-Time Fast Convolution

This experiment uses the fast convolution with the DSP/BIOS for real-time FIR ?ltering. The ?les used for this experiment are listed in Table 6.17. The input signals used for this experiment include the speech ?les timit2.wav and timit2_with_tone.wav, which adds 1000 Hz tone to timit2.wav. Both wave ?les have sampling rate of 8000 Hz. The experiment uses 512-point FFT. The FIR ?lter was designed by MATLAB in Example 6.21. Procedures of the experiment are listed as follows: 1. Open the project ?le, exp6_rt_fftFilter.pjt. 2. Connect the line-in of the DSK with the audio source and headphone-out to a headphone or loudspeaker. 3. Open the source ?le, rt_fftFilter.c, turn off the conditional compile switch FAST CONVOLUTION, and recompile the project. This will disable the FIR ?ltering using fast convolution. 4. Run the project and listen to the audio playback. The 1000 Hz tone interference can be heard clearly because the ?lter is set in bypass mode. The magnitude spectrum shown in Figure 6.23(a) shows the strong presence of 1000 Hz tone. 5. Open the source ?le rt_fftFilter.c, set the conditional compile switch FAST CONVOLUTION on, and recompile the project. This will perform the fast convolution as FIR ?ltering in the project. 6. Rerun the project and listen to the real-time playback of audio samples. The 1000 Hz tone interference will be reduced as shown in Figure 6.23(b).

References

[1] D. J. DeFatta, J. G. Lucas, and W. S. Hodgkiss, Digital Signal Processing: A System Design Approach, New York: John Wiley & Sons, Inc., 1988. [2] N. Ahmed and T. Natarajan, Discrete-Time Signals and Systems, Englewood Cliffs, NJ: Prentice Hall, 1983. [3] S. M. Kuo and W. S. Gan, Digital Signal Processors, Upper Saddle River, NJ: Prentice Hall, 2005. [4] L. B. Jackson, Digital Filters and Signal Processing, 2nd Ed., Boston, MA: Kluwer Academic, 1989. [5] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Englewood Cliffs, NJ: Prentice Hall, 1989. [6] S. J. Orfanidis, Introduction to Signal Processing, Englewood Cliffs, NJ: Prentice Hall, 1996. [7] J. G. Proakis and D. G. Manolakis, Digital Signal Processing – Principles, Algorithms, and Applications, 3rd Ed., Englewood Cliffs, NJ: Prentice Hall, 1996. [8] A. Bateman and W. Yates, Digital Signal Processing Design, New York: Computer Science Press, 1989. [9] S. D. Stearns and D. R. Hush, Digital Signal Analysis, 2nd Ed., Englewood Cliffs, NJ: Prentice Hall, 1990. [10] W. Tian, Method for Ef?cient and Zero Latency Filtering in a Long-Impulse-Response System, European patent, WO0217486A1, Feb. 2002.

348

[11] [12] [13] [14]

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

Math Works, Inc.,Using MATLAB, Version 6, 2000. Math Works, Inc., Signal Processing Toolbox User’s Guide, Version 6, 2004. Math Works, Inc., Filter Design Toolbox User’s Guide, Version 3, 2004. Math Works, Inc., Fixed-Point Toolbox User’s Guide, Version 1, 2004.

Exercises

1. Compute the Fourier series coef?cients of cosine function x(t) = cos (2π f 0 t). 2. Compute the 4-point DFT of the sequence {1, 1, 1, 1} using the matrix equation given in Equation (6.19). 3. Compute X (0) and X (4) of 8-point DFT of sequence {1, 1, 1, 1, 2, 3, 4, 5}. 4. Prove the symmetry and periodicity properties of the twiddle factors de?ned as (a) W N

k+N /2 k = ?W N ;

k+N k (b) W N = WN .

5. Consider the following two sequences: x1 (n) = 1 and x2 (n) = n, 0 ≤ n ≤ 3. (a) Compute the linear convolution of these two sequences. (b) Compute the circular convolution of these two sequences. (c) Show how to pad zeros for these two sequences such that the circular convolution results are the same as linear convolution in (a). 6. Construct the signal-?ow diagram of FFT for N = 16 using the decimation-in-time method. 7. Construct the signal-?ow diagram of FFT for N = 8 using the decimation-in-frequency method. 8. Similar to Table 6.1, show the bit-reversal process for 16-point FFT. 9. Consider 1 s of digitized signal with sampling rate 20 kHz. It is desired to have the spectrum with a frequency resolution of 100 Hz or less. Is this possible? If impossible, what FFT size N should be used? 10. A 1 kHz sinusoid is sampled at 8 kHz. The 128-point FFT is performed to compute X (k). What is the computational resolution? At what frequency indices k we expect to observe peaks in |X (k)|? Can we observe the line spectrum? 11. A touch-tone phone with a dual-tone multifrequency (DTMF) transmitter encodes each keypress as a sum of two sinusoids, with two frequencies taken from each of the following groups: Vertical group: 697, 770, 852, 941 Hz; Horizontal group: 1209, 1336, 1477, 1633 Hz. What is the smallest DFT size N that we can distinguish these two sinusoids from the computed spectrum? The sampling rate used in telecommunications is 8 kHz. 12. Write a MATLAB script to verify the DFT and IDFT results obtained in Example 6.7.

EXERCISES

349

13. Similar to Example 6.9 and given x(n) = {1, 2, 3, 4, 5, 6, 7, 8} and h(n) = {1, 0, 1, 0, 1, 0, 1, 0}. Write MATLAB scripts that implement the following tasks: (a) linear convolution; (b) circular convolution using FFT and IFFT; and (c) fast convolution with zero padding of two sequences x(n) and h(n). 14. Similar to Example 6.14, compute ?xed-point FFT with Q15 format using the scaling factor 1/128 at stage 1. Does over?ow occur? Compare the results with scaling factor 0.5 at the input of each stage. 15. Similar to Example 6.14 but using a zero-mean, variance = 0.5, white noise instead of sinewave, compute Q15 FFT without scaling. How many over?ows occur in the quantized FFT? Try different scaling vectors, F.ScaleValues, and discuss the difference by using sinewave as input. 16. Similar to Example 6.17, use different techniques to distinguish two sinewaves at 60 and 61 Hz. 17. Write a C or MATLAB program to compute the fast convolution of a long data sequence with a short coef?cient sequence employing the overlap-save method introduced in Section 6.5.4. Compare the results with the MATLAB function fftfilt that uses overlap-add method. 18. The radix-2 FFT code used in the experiments is written in consideration of minimizing the code size. An alternative FFT implementation can be more ef?cient in terms of the execution speed with the expense of using more program memory. For example, the twiddle factor used by the ?rst stage and the ?rst group of other 0 stages is constant W N = 1. Therefore, the multiplication operations in these stages can be simpli?ed. Modify the assembly FFT routine given in Table 6.11 to incorporate this observation. Pro?le the run-time clock cycles and record the memory usage. Compare the results with those obtained by the experiment given in Section 6.6.5. 19. The radix-2 FFT is the most widely used algorithm for FFT computation. When the number of data samples is a power of 2m (i.e., N = 22m = 4m ), we can further improve the run-time ef?ciency by employing the radix-4 FFT algorithm. Modify the assembly FFT routine given in Table 6.11 to the radix-4 FFT algorithm. Pro?le the run-time clock cycles, and record the memory space usage for a 1024-point radix-4 FFT (210 = 45 = 1024). Compare the radix-4 FFT results with the results of 1024-point radix-2 FFT computed by the assembly routine.

0 20. Take advantage of twiddle factor W N = 1 to further improve the radix-4 FFT algorithm run-time ef?ciency. Compare the results of 1024-point FFT implementation using different approaches.

21. Most of the DSP applications use real input samples, our complex FFT implementation zeros out the imaginary components of the complex buffer (see experiment given in Section 6.6.5). This approach is simple and easy, but it is not ef?cient in terms of the execution speed. For real input, we can split the even and odd samples into two sequences, and compute both even and odd sequences in parallel. This approach will reduce the execution time by approximately 50 %. Given a real-value input x(n) of 2N samples, we can de?ne c(n) = a(n) + jb(n), where two inputs a(n) = x(n) and b(n) = x(n + 1) are real sequences. we can represent these sequences as a(n) = [c(n) + c? (n)]/2 and b(n) = ? j[c(n) ? c* (n)]/2, then they can be written in terms of DFTs as Ak (k) = [C(k) + C * (N ? k)]/2 and Bk (k) = ? j[C(k) ? C * (N ? k)]/2. Finally, the real input FFT can be obtained k k by X (k) = Ak (k) + W2N Bk (k) and X (k + N ) = Ak (k) ? W2N Bk (k), where k = 0, 1, . . . , N ? 1. Modify the complex radix-2 FFT assembly routine to ef?ciently compute 2N real input samples. 22. Write a 128-point, decimation-in-frequency FFT function in ?xed-point C or intrinsics and verify it using the experiment given in Section 6.6.4. Then write the FFT function using C55x assembly language and verify it using the experiment given in Section 6.6.5. 23. The TMS320C55x supports bit-reversal addressing mode. Replace the bit-reversal function with the C55x bit-reversal addressing mode for the experiment given in Section 6.6.5.

350

FREQUENCY ANALYSIS AND FAST FOURIER TRANSFORM

24. Develop an experiment to compute the PSD using the C55x simulator or DSK. Add Hamming window to compute PSD. 25. Develop an experiment for fast convolution using the overlap-save technique. Verify the fast convolution result using the data from the experiment given in Section 6.6.6. 26. Develop an experiment and compare the computational load between the experiment given in Section 6.6.6 using fast convolution method and using direct FIR ?ltering when FIR ?lter order is 512.

相关文章:

更多相关标签:

- Real.Time.Digital.Signal.Processing.Implementations.and.Applications.chap2
- Real.Time.Digital.Signal.Processing.Implementations.and.Applications.chap4
- Real.Time.Digital.Signal.Processing.Implementations.and.Applications.chap5
- Real.Time.Digital.Signal.Processing.Implementations.and.Applications.chap3
- Real-time Digital Signal Processing
- Optical signal Real Time processing
- Digital Signal Processing - Princliples, Algorithms, and Applications - Solutions Manual
- Digital signal processing technology and applications in hearing aids
- # Wiley, Digital Signal Processing And Applications With The c6713 And c6416 Dsk (2005)
- A C++ DEVELOPMENT PLATFORM FOR REAL TIME AUDIO PROCESSING AND SYNTHESIS APPLICATIONS
- Real.Time.Digital.Signal.Processing.Implementations.and.Applications.chap4
- Real.Time.Digital.Signal.Processing.Implementations.and.Applications.chap3
- Real.Time.Digital.Signal.Processing.Implementations.and.Applications.chap2
- Real.Time.Digital.Signal.Processing.Implementations.and.Applications.chap1
- Real.Time.Digital.Signal.Processing.Implementations.and.Applications.chap5