BULETINUL INSTITUTULUI POLITEHNIC DIN IAȘI Publicat de Universitatea Tehnică "Gheorghe Asachi" din Iași Tomul LVII (LXI), Fasc. 3, 2011 Secția ELECTROTEHNICĂ. ENERGETICĂ. ELECTRONICĂ

## IMPLEMENTATION OF THE NORMALIZED LEAST MEAN SQUARES ALGORITHM ON THE CYCLONE II FIELD PROGRAMMABLE GATE ARRAY CIRCUIT

BY

# IOAN TUDOSĂ\*, CRISTIAN FOȘALĂU and CRISTIAN ZET

"Gheorghe Asachi" Technical University of Iaşi, Faculty of Electrical Engineering, Energetics and Applied Informatics

Received: June 12, 2011. Accepted for publication: August 16, 2011

**Abstract.** This paper presents the implementation of a normalized least mean squares (NLMS) adaptive filter on a field programmable gate array (FPGA) circuit; the filter can be used for high-performance signal denoising. The filter was designed in the Quartus II integrated development environment (IDE) and tested on the Cyclone II FPGA circuit from Altera, available on the DE2 board. The design was written in the Verilog hardware description language using a modular structure. The filter was tested with signals generated by a data acquisition (DAQ) board from National Instruments, which was also used to acquire the processed signals delivered by the DE2 platform. Filter performance is evaluated by computing the signal-to-noise ratio for different filter lengths.

Key words: adaptive filter; signal processing; Cyclone II FPGA; NLMS algorithm.

### 1. Introduction

The basic functional role of a digital filter is to modify the amplitudes of the harmonic components of a periodic signal, to eliminate noise-like components from its spectrum, or to suppress certain harmonics of the signal. The filtering operation therefore modifies some of the spectral components of the signal. Filtering performance is evaluated by the signal-to-noise ratio (SNR) at the output (Widrow & Stearns, 1985).

<sup>\*</sup> Corresponding author: *e-mail*: <u>itudosa@ee.tuiasi.ro</u>

The least mean squares (LMS) algorithm uses the method of steepest descent to approximate the solution of the Wiener filter (Haykin, 2000). It is one of the most important algorithms based on a stochastic gradient method (Widrow & Stearns, 1985; Diniz, 2008; Haykin, 2000). A very important feature of the algorithm is its simplicity of implementation, because it does not need to compute the correlation of the input signals. NLMS is an algorithm derived from LMS that provides faster convergence with comparable stability.

This paper presents an efficient implementation of the NLMS algorithm using FPGA technology. Previous work on the design and implementation of adaptive filters has been reported by Haykin (2000) and Widrow & Stearns (1985). These works sought to reduce the complexity of the NLMS algorithm, but real-time implementation remains limited by the signal sampling frequency. The major advantage of FPGA technology is that it can work at higher sampling rates than classical digital signal processors (DSPs).

The next sections present the adaptive filter structure used for signal denoising, followed by the background theory of adaptive filtering with the LMS and NLMS algorithms. The FPGA technology used in our approach and the facilities of the DE2 development board are then described. Finally, the hardware implementation of the NLMS filter and the results obtained with it are presented and discussed.

### 2. Adaptive Filter Structure Used for Signal Denoising

To filter a noisy signal with an adaptive filter, two different filter structures may be used. The first structure uses two inputs: one carries the signal to be filtered, and the other carries a reference signal correlated with the unwanted signal superimposed on the signal of interest. This type of filter is used to remove noise from signals that vary in time, such as the signal delivered by an electrocardiograph. The second structure uses a single input, which carries the noisy signal, and is used especially for periodic signals. This structure is depicted in Fig. 1, where the output, y(k), is the filtered signal, the error signal, e(k), is the estimated noise, and d(k) is the noisy signal to be filtered. All input and output signals are sampled at the same sampling frequency, Fe.



### 3. Normalized LMS Algorithm

This algorithm has the same formal structure as the LMS algorithm (Widrow & Stearns, 1985; Diniz, 2008). When the amplitude of the input signal increases, the expected output noise grows because the gradient noise produced by the LMS estimator is amplified (Diniz, 2008). To solve this problem in a convenient way, a so-called *normalization* of the LMS algorithm was introduced: the correction applied to the coefficient vector of the adaptive filter at iteration k+1 is divided by the squared Euclidean norm of the input signal vector at iteration k.

The input signal vector and the filter coefficient vector are defined as follows:

$$u(k) = [u(k) \quad u(k-1) \quad \dots \quad u(k-M+1)]^T, \tag{1}$$

where u(k) is the input signal vector at sampling time k, and

$$\hat{w}(k) = [w_0(k) \quad w_1(k) \quad \dots \quad w_{M-1}(k)]^T, \tag{2}$$

where $\hat{w}(k)$ is the vector of estimated filter coefficients, of length *M*, at time *k*, and $w_i(k)$, $i = 0, \dots, M-1$, are the individual coefficients.

The relation which updates the filter coefficients is

$$\begin{aligned}
\hat{w}(k+1) &= \hat{w}(k) + \frac{\tilde{m}}{\|u(k)\|^2} u(k) e^{*}(k) \\
&= \hat{w}(k) + \frac{\tilde{m}}{\|u(k)\|^2} u(k) \left(d^{*}(k) - u^H(k) \hat{w}(k)\right),
\end{aligned} \tag{3}$$

where $\tilde{m}$ is a positive real scalar, and $\|u(k)\|^2$ is the squared Euclidean norm of the input signal vector, given by

$$\|u(k)\|^{2} = \sum_{i=0}^{M-1} |u(k-i)|^{2}. \tag{4}$$
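Because consecutive input vectors differ only by the newest and the oldest sample, the sum in Eq. (4) does not have to be recomputed from scratch at every sampling instant. The following Python sketch illustrates this for real-valued samples. The class name `SlidingEnergy` and its methods are hypothetical and chosen only for illustration; the paper does not state how the norm is computed on the FPGA.

```python
from collections import deque


class SlidingEnergy:
    """Running squared Euclidean norm of the last `length` real samples.

    Illustrative helper (not from the paper): it keeps the sum of Eq. (4)
    up to date with one addition and one subtraction per new sample.
    """

    def __init__(self, length):
        # The window starts filled with zeros, i.e. u(k) = 0 for k < 0.
        self.window = deque([0.0] * length, maxlen=length)
        self.energy = 0.0

    def push(self, sample):
        oldest = self.window[0]          # sample that leaves the window
        self.window.append(sample)       # maxlen drops the oldest entry
        self.energy += sample * sample - oldest * oldest
        return self.energy


if __name__ == "__main__":
    tracker = SlidingEnergy(length=2)
    for x in (1.0, 2.0, 3.0):
        print(tracker.push(x))
```

With floating-point samples the running sum can accumulate rounding error over long runs, whereas with integer (fixed-point) samples the recursion is exact; this is a general property of the recursion, not a result reported in the paper.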

The $\tilde{m}$ parameter, specific to the NLMS algorithm, divided by the squared Euclidean norm, gives the step size of the NLMS algorithm

$$m(k) = \frac{\tilde{m}}{\|u(k)\|^2}. \tag{5}$$

Because of the normalization, the effective step size of the algorithm changes at every iteration, and a faster convergence rate is obtained than with the LMS algorithm. In practical applications it must be taken into account that, when the input signal vector (whose length equals the length of the transversal filter) is short, numerical problems may arise because $\|u(k)\|^2$ can be zero or very small, which leads to a division by zero or by a very small value. To protect the algorithm against such problems, a small positive constant, $\delta$, must be added to $\|u(k)\|^2$. In this case, the update equation becomes

$$\hat{w}(k+1) = \hat{w}(k) + \frac{\tilde{m}}{\delta + \|u(k)\|^2} u(k) e^{*}(k), \quad \delta > 0. \tag{6}$$
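As an illustration of Eq. (6), the following Python sketch performs one regularized NLMS iteration. It assumes real-valued signals, so the complex conjugate drops out. The function name `nlms_step`, the argument names `step` and `delta` (standing for the step-size parameter and the small positive constant of Eq. (6)), and their default values are hypothetical. This is a software sketch, not the authors' Verilog design.

```python
def nlms_step(w, u, d, step=0.5, delta=1e-6):
    """One regularized NLMS iteration for real-valued signals.

    w     : current coefficients [w_0(k), ..., w_{M-1}(k)]
    u     : input vector [u(k), u(k-1), ..., u(k-M+1)]
    d     : desired sample d(k)
    step  : step-size parameter of Eq. (6) (illustrative default)
    delta : small positive regularization constant (illustrative default)

    Returns (new_w, y, e): updated coefficients, filter output, error.
    """
    y = sum(wi * ui for wi, ui in zip(w, u))      # filter output y(k)
    e = d - y                                     # error e(k)
    norm_sq = sum(ui * ui for ui in u)            # squared norm, Eq. (4)
    gain = step / (delta + norm_sq)               # regularized step size
    new_w = [wi + gain * ui * e for wi, ui in zip(w, u)]
    return new_w, y, e


if __name__ == "__main__":
    # Identify a hypothetical 2-tap system from noise-free data.
    h = [0.5, -0.3]
    w, u = [0.0, 0.0], [0.0, 0.0]
    for k in range(300):
        sample = ((7 * k) % 11 - 5) / 5.0         # deterministic test input
        u = [sample] + u[:-1]                     # shift the delay line
        d = sum(hi * ui for hi, ui in zip(h, u))
        w, y, e = nlms_step(w, u, d)
    print(w)
```

Note that an all-zero input vector leaves the coefficients unchanged instead of raising a division error, which is exactly the situation the regularization constant is meant to handle.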

The coefficient values computed by the algorithm are an estimate that is intended to be as close as possible to the Wiener solution, $w_0$.

### 4. FPGA Development Platform

The proposed algorithm was implemented on a Development and Education board (DE2) produced by Terasic, shown as element 4 in Fig. 2. Its main component is the Cyclone II FPGA circuit (Altera Co.), which contains combinational logic functions and registers. The board has an on-board 50 MHz clock and a WM8731 audio codec with a maximum sampling rate of 96 kHz. In this project the audio codec is configured as the interface between the FPGA circuit and the real signals generated by the NI data acquisition board (NI USB-6251), marked as 1 in Fig. 2. Fig. 2 also shows the personal computer (2), which is needed to develop the algorithm and the entire project with Altera Quartus II version 9.0 and to program the FPGA circuit *via* the JTAG interface available on the DE2. The internal signals of the FPGA circuit were monitored with the *Signal TAP Logic Analyser*, a tool also produced by Altera.

Fig. 2 - Experimental test stand.

The Tektronix TDS 2024B oscilloscope (3) was used to observe the signals delivered by the board along with the signals acquired with the NI USB-6251 card.
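The clock and sampling figures quoted above give a rough idea of the time available per sample. The Python sketch below computes the number of clock cycles between two samples at the codec's maximum rate and, under a strong simplifying assumption, how many filter taps a fully serial design could process in that time. The assumption (one multiplication per clock cycle, two multiplications per tap, the norm computation and the division ignored) is purely illustrative; the paper does not describe its architecture in these terms, and the function names are hypothetical.

```python
CLOCK_HZ = 50_000_000       # DE2 on-board clock, as stated in the text
SAMPLE_RATE_HZ = 96_000     # maximum WM8731 sampling rate, as stated


def cycles_per_sample(clock_hz=CLOCK_HZ, sample_rate_hz=SAMPLE_RATE_HZ):
    """Whole clock cycles available between two consecutive samples."""
    return clock_hz // sample_rate_hz


def max_serial_taps(cycles, mults_per_tap=2):
    """Upper bound on the filter length for a fully serial design.

    Assumption (not from the paper): one multiplication per clock cycle,
    with one multiplication per tap for the output sum and one for the
    coefficient update.
    """
    return cycles // mults_per_tap


if __name__ == "__main__":
    budget = cycles_per_sample()
    print(budget, max_serial_taps(budget))
```

A parallel or pipelined design is not bound by this estimate, which is one reason an FPGA can sustain higher sampling rates than a sequential processor.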

### 5. Practical Algorithm Implementation

The NLMS algorithm was first implemented in MATLAB/Simulink to observe its efficiency and to tune the misadjustment. The Simulink model is depicted in Fig. 3; its main block, called *Normalized LMS*, performs the filtering and the estimation of the coefficients. Fig. 4 shows how the noisy signal is filtered. After 25...30 ms the signal-to-noise ratio is improved, meaning that the error signal, e(k), converges to zero, as shown in Fig. 5.



Fig. 3 - Simulink model of NLMS based FIR filter.

The experiments were performed using a sinusoid of amplitude 1 summed with Gaussian white noise, presented in Fig. 6. After the model was validated, the Verilog HDL code was generated. The source code was checked for compatibility with the Cyclone II before being implemented on it.





For a real implementation on FPGA circuits, one should take into account the inter-relationship between the word-length of the processed data, the pipelining of the algorithm, the algorithmic variance used in different implementations for specific applications, and the filter layout (Fig. 7). All of these can be tuned to achieve better performance of the filter implementation on the FPGA circuit.



Fig. 7 – Inter-relationship between word-length, pipelining and algorithmic variance of filter architecture layout.
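To make the role of the word-length concrete, the following Python sketch rounds a real value to a signed fixed-point number with saturation and shows how the representation error depends on the number of fractional bits. The function name `quantize` and the word lengths used are hypothetical examples; the paper does not state the word-length of its design.

```python
def quantize(x, frac_bits, word_bits=16):
    """Round x to a signed fixed-point value and return it as a float.

    frac_bits : number of fractional bits (resolution is 2**-frac_bits)
    word_bits : total word length, including the sign bit
    Values outside the representable range are saturated.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (word_bits - 1))            # most negative integer code
    highest = (1 << (word_bits - 1)) - 1        # most positive integer code
    code = max(lowest, min(highest, round(x * scale)))
    return code / scale


if __name__ == "__main__":
    coefficient = 0.3
    for frac_bits, word_bits in ((3, 8), (7, 8), (15, 16)):
        q = quantize(coefficient, frac_bits, word_bits)
        print(word_bits, frac_bits, q, abs(q - coefficient))
```

A longer word reduces the quantization error of coefficients and samples but requires wider multipliers and adders, which is the kind of trade-off the paragraph above refers to.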

### 6. Obtained Results

Fig. 8 presents two screenshots taken from the TDS 2024B oscilloscope. In Fig. 8 *a*, channel one shows the clean sinusoidal signal, channel two the noisy sinusoid, channel three the filtered signal obtained with the NLMS algorithm implemented on the Cyclone II, and channel four the estimated noise. Fig. 8 *b* shows a triangular signal processed with the same filter. Different filter lengths were used in order to analyse the behavior of the SNR at the filter output. Fig. 9 compares three SNR curves obtained for three different filter lengths.

Fig. 8 – Experimental data acquired with TDS 2024B oscilloscope (*a*, *b*).

Fig. 9 - Experimental data expressed as output SNR vs. length.
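One common way to express the output SNR in decibels is sketched below in Python. It assumes that a clean reference signal, time-aligned with the filter output, is available, and it treats the difference between the two as residual noise. The function name `snr_db` is hypothetical, and the paper does not state the exact procedure used to obtain the curves in Fig. 9.

```python
import math


def snr_db(clean, processed):
    """SNR in dB of `processed` with respect to the reference `clean`.

    The residual noise is taken as processed - clean, sample by sample;
    both sequences are assumed to be time-aligned and of equal length.
    """
    signal_power = sum(s * s for s in clean)
    noise_power = sum((p - s) ** 2 for s, p in zip(clean, processed))
    if noise_power == 0:
        return math.inf                  # no residual noise at all
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    filtered = [1.1, -0.9, 1.1, -0.9]    # constant residual of 0.1
    print(snr_db(clean, filtered))
```

Evaluating such a function on the filter output for several filter lengths would produce a curve of the kind described for Fig. 9.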

### 7. Conclusions

The paper presents the implementation of an adaptive filter that uses the NLMS algorithm to tune the filter coefficients. The practical implementation was carried out on a Cyclone II FPGA circuit. The results obtained demonstrate the efficiency of the adaptive filtering and the possibilities for tuning the design.

Acknowledgments. This research was carried out with the support of the BRAIN "Doctoral Scholarships as an Investment in Intelligence" project, financed by the European Social Fund and the Romanian Government.

#### REFERENCES

- Diniz P.S.R., Adaptive Filtering Algorithms and Practical Implementation. Sec. Ed., Springer Sci., Rio de Janeiro, 2008.
- Haykin S., Adaptive Filter Theory. Third Ed., Prentice Hall, NY, 2000.
- Kuo S.M. et al., Real Time Digital Signal Processing Implementation and Applications. J. Wiley a. Sons Ltd., NY, 2001.
- Sayed A.H., Fundamentals of Adaptive Filtering. J. Wiley, New Jersey, 2003.
- Widrow B., Stearns S.D., Adaptive Signal Processing. Prentice Hall, NY, 1985.

\* \* http://www.altera.com.

#### IMPLEMENTATION OF THE NORMALIZED LMS ALGORITHM ON THE CYCLONE II FPGA CIRCUIT

#### (Summary)

The paper studies the possibility of implementing an adaptive filter based on the normalized LMS algorithm, which can be used for high-performance signal filtering when implemented on FPGA circuits. The filter was implemented in the Quartus II development environment from Altera and tested on the Cyclone II FPGA circuit available on the DE2 development board. The design was created in Verilog HDL using a modular structure. Testing used simulated signals generated with a data acquisition board produced by National Instruments; the same board was also used to acquire the data processed by the DE2 board. The performance of the implemented filter was evaluated by plotting three signal-to-noise ratio curves for different filter lengths.