FPGA Implementation of Wavelet Filters for DWMT Systems

Discrete Wavelet Multi-Tone (DWMT) systems have attracted attention because they offer higher spectral efficiency and higher data rates than FFT-based multitone transmission systems. The complexity of the overall system is directly related to that of its elemental building block. In the literature, wavelet filters are designed subject to constraints for minimum interference. The structure of a Minimum Interference Wavelet Filter (MIWF) is very simple even for high filter orders. In this paper, DWMT systems using a two-branch wavelet filter bank in the transmitter and its inverse at the receiver are implemented on the Spartan XC3S1200E FPGA. The details of the system implementation are presented for MIWF, Daubechies, and Coiflet wavelet filters. The tests show that, compared with the other tested systems, the MIWF-based system is simpler, faster, and capable of preserving its full-precision BER performance even when the filter coefficient word size is reduced to 5 bits.


INTRODUCTION
The Discrete Wavelet Transform (DWT) has been successfully used in many applications such as digital signal and image processing and digital communications, including Discrete Wavelet Multi-Tone (DWMT) systems [1]-[5]. The implementation of DWT filters on Field Programmable Gate Arrays (FPGAs) has become increasingly common due to advances in technology and decreasing costs [4]. Generally, the computation of the DWT is complex, and it is sometimes difficult to meet the requirements of real-time operation [6]. Many FPGA architectures for the DWT have been proposed in the literature to optimize speed and resource consumption [2], [5]-[8].
Referring to DWMT systems, wavelet filters are designed in [1] to meet the requirement of minimum Inter-Carrier Interference (ICI) and Inter-Symbol Interference (ISI) in these systems.
The most interesting feature of the designed Minimum Interference Wavelet Filters (MIWFs) is their simple structure. That is, an (N-1)th order MIWF consists of only two nonzero components separated by (N-2) zeros. Therefore, the hardware of DWMT systems based on MIWFs will be simpler than that of systems employing other types of wavelet filters.
In this paper, equivalent DWMT systems using the MIWF, Daubechies, and Coiflet filters are implemented on an FPGA. The BER performance of these systems is evaluated under different test conditions. The systems are then compared in terms of hardware complexity and parameters related to real-time applicability to show the superiority of the MIWFs over the other tested wavelet filters. The rest of the paper is organized as follows: A brief overview of DWMT systems is given in Section II. The details of the FPGA implementation of the MIWF, Daubechies, and Coiflet filters are presented in Section III. The BER performance and hardware complexity of the implemented systems are presented in Sections IV and V, respectively. Finally, concluding remarks are given in Section VI.

DWMT SYSTEMS
A DWMT system is the counterpart of the well-known Orthogonal Frequency Division Multiplexing (OFDM), where the DWT is used instead of the DFT. This gives DWMT advantages over OFDM, including better spectral efficiency and higher data rates [3], [9], [10]. The core of a DWMT system is the M-band Inverse DWT (IDWT) and DWT blocks in the modulator and demodulator, respectively. The system performance is directly related to the ability of the wavelet filters used to resist ICI and ISI and to remain orthogonal [1]. Generally, the wavelet modulator is constructed by iterating an elemental two-branch filter bank block (consisting of low-pass and high-pass filters) as a binary tree to divide the spectrum into M orthogonal bands. This is known as Wavelet Packet Modulation (WPM) [1], [5], [9]-[12]. A significant consequence of this structure is that the overall system design reduces to the design of the two-branch filter bank [1], [10].
The implementation of conventional wavelet filters may be subject to the limitations of the available hardware, such as limited data size, processor speed, and storage. These limitations, together with additional parameters related to the architecture of the filters, in turn degrade the performance of the DWMT system. The MIWFs presented in [1], by contrast, are promising alternatives that avoid most of these implementation difficulties.
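The tree construction can be illustrated with a minimal sketch. It assumes a Haar filter pair as a stand-in for the two-branch block (chosen only because it is short; it is not one of the filters implemented in this paper), and the function names are hypothetical. The number of bands M is assumed to be a power of two.

```python
import math

S = 1 / math.sqrt(2)  # Haar scaling, used here only as a stand-in filter pair

def synth2(low, high):
    """Two-branch synthesis block: merge two half-rate streams into one."""
    out = []
    for a, b in zip(low, high):
        out.append(S * (a + b))  # even-indexed output sample
        out.append(S * (a - b))  # odd-indexed output sample
    return out

def analysis2(x):
    """Two-branch analysis block: split one stream into two half-rate streams."""
    low = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def wpm_modulate(bands):
    """Iterate synth2 as a binary tree until a single stream remains."""
    while len(bands) > 1:
        bands = [synth2(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]

def wpm_demodulate(x, m):
    """Iterate analysis2 as a binary tree until m band streams are recovered."""
    bands = [x]
    while len(bands) < m:
        split = []
        for b in bands:
            split.extend(analysis2(b))
        bands = split
    return bands
```

Because every level reuses the same two functions, replacing the filter pair changes only `synth2` and `analysis2`, which mirrors the observation that the system design reduces to the design of the two-branch block.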

FPGA IMPLEMENTATION
The Spartan XC3S1200E platform is used to implement three DWMT systems using the MIWF, Daubechies, and Coiflet wavelet filters. As shown in Fig. (1), the system comprises the transmitter (Tx), the receiver (Rx), the communication channel, and memory blocks. The memory blocks, namely the input ROM, noise ROM, and output RAM, are designed to store the input data, noise, and recovered data, respectively. The transmitter and receiver blocks are the point of comparison, since their implementation depends on the specific wavelet filter employed. To evaluate the performance of the implemented systems, a communication channel with various parameters is used to provide different test conditions for a fair comparison. The system controller controls the blocks of the system and provides synchronization.

A. I/O Memory
The input data used in this work is a BPSK signal; therefore, two bits are sufficient to represent its samples in fixed-point arithmetic. Because of the limited size of the built-in memory of the FPGA, the large amount of data used for simulation (10^6 samples × 2 bits for input data, 10^6 samples × 16 bits for noise, and 10^6 samples × 2 bits for recovered data) is stored in text files on external storage.
The input ROM is designed to operate as a dual-port memory that reads data from the auxiliary storage file. As shown in Fig. 2(a), it outputs two consecutive samples simultaneously through the data ports D_A and D_B in order to feed even- and odd-indexed samples to the transmitter's LPF and HPF, respectively. The Synchronous Set/Reset (SSR) signal is used to implement the cases where the input data is a sequence of zeros, such as the zero insertion required for oversampling or signal padding. This signal causes the ROM's output buffer value (in this case zero) to be transferred instead of the data associated with the current address.
The SSR technique eliminates the need to implement dedicated hardware modules for these operations.
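The read behavior described above can be modeled as follows. This is only a behavioral sketch: the function name is hypothetical, and it assumes that the read address does not advance on cycles where SSR is asserted, which the text does not specify.

```python
def input_rom(samples, ssr_flags):
    """Yield one (D_A, D_B) pair per read cycle.

    samples   : stored input sequence (even index -> D_A, odd index -> D_B)
    ssr_flags : one boolean per read cycle; True forces a zero output pair
    """
    addr = 0  # assumption: the address is held while SSR is asserted
    for ssr in ssr_flags:
        if ssr:
            yield (0, 0)  # output buffer value (zero) replaces the stored data
        else:
            yield (samples[addr], samples[addr + 1])
            addr += 2
```

For example, `list(input_rom([1, -1, -1, 1], [False, True, False]))` inserts one zero pair between the two stored pairs without any separate zero-padding module.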
The output RAM, shown in Fig. 2(b), performs the opposite function of the input ROM. It stores the recovered data in a file to be compared later with the original signal stored in the input ROM.

Fig. (2): Memory circuits for (a) transmitter (b) receiver
Data is written when the Write Enable (WE) signal of the corresponding port is asserted. Downsampling is performed implicitly by the output RAM by driving the WE signal at half the frequency of the system clock.
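The implicit downsampling can be sketched as a write that is enabled on every second clock cycle. The function name and the phase of WE (high on the first cycle) are assumptions made for illustration.

```python
def output_ram(stream):
    """Store only the samples present while WE is high."""
    ram = []
    we = True  # assumed phase: WE is high on the first clock cycle
    for sample in stream:
        if we:
            ram.append(sample)
        we = not we  # WE toggles every system clock, so it runs at half the rate
    return ram
```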

1) MIWF
The sampled impulse responses of the 6-tap MIWF filters are given in [1]; each has two nonzero coefficients separated by four zeros. [...] The detailed structure of the MIWF-based transmitter is shown in Fig. (3).

2) Daubechies and Coiflet Filters
The Daubechies (db3) and Coiflet (coif1) filters are used as the 6-tap wavelet filters. The coefficients of these filters are represented by 9 bits in the (1.8) format to have the same accuracy as in the case of the MIWFs. The transmitter architecture is illustrated in Fig. (4). For the FIR realization, each multiplier can be replaced by a 2-input LUT, as shown in Table (1). The output of both the LPF and HPF is bounded to 12 bits in the form of (4.
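The reason a LUT can stand in for a multiplier is that a 2-bit BPSK input takes only the values 0, +1, and -1, so each product is 0, +c, or -c for a fixed coefficient c. The sketch below illustrates this idea; it is not a reproduction of Table (1). It assumes the (1.8) format means 8 fractional bits with truncation, uses the input codes 01 for +1 and 11 for -1 given for the receiver output, and treats code 00 as zero and code 10 as unused. The function names are hypothetical.

```python
import math

def to_fixed(value, frac_bits=8):
    """Quantize to an integer count of 2**-frac_bits steps (truncation via floor)."""
    return math.floor(value * (1 << frac_bits))

def lut_for_coefficient(coeff_fixed):
    """Map each valid 2-bit input code to the precomputed product coeff * input."""
    return {
        0b00: 0,             # input 0 (zero insertion or padding)
        0b01: coeff_fixed,   # input +1
        0b11: -coeff_fixed,  # input -1
    }

# Example with one db3 coefficient (standard value, about 0.4599).
c = to_fixed(0.4598775021193313)
lut = lut_for_coefficient(c)
```

Each filter tap then needs only a table lookup followed by the adder chain, which is why no hardware multiplier is required in the transmitter.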

1) MIWF
The receiver filters are implemented using the same reduction explained for the transmitter, except that the shared multiplier is placed before the addition performed in the LPF or the subtraction performed in the HPF, as shown in Fig. (6).
Since the values of the noisy data cannot be predicted, a dedicated 18×18 multiplier must be used rather than the LUT used in the transmitter filters. However, a further reduction is gained by implementing a single delay chain shared by the LPF and HPF. The final output of both filters is then thresholded by assigning +1 (binary 01) to positive numbers and -1 (binary 11) to negative numbers.
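The receiver reduction can be sketched as follows: one multiplication per input sample, one shared delay chain, an addition for the LPF, and a subtraction for the HPF. The coefficient magnitude 1/sqrt(2), the tap positions 0 and 5, the treatment of an exact zero as positive, and the function name are assumptions made for illustration; the actual coefficients are those given in [1]. Downsampling is left out, since it is handled by the output RAM.

```python
import math
from collections import deque

A = 1 / math.sqrt(2)  # assumed coefficient magnitude (not taken from the source)
TAPS = 6              # 6-tap filter: nonzero taps at positions 0 and TAPS - 1

def miwf_receive(noisy):
    """Return thresholded (low-pass, high-pass) output codes for each input sample."""
    delay = deque([0.0] * (TAPS - 1), maxlen=TAPS - 1)  # one shared delay chain
    low_bits, high_bits = [], []
    for x in noisy:
        scaled = A * x          # the single shared multiplication
        oldest = delay[0]       # scaled sample from TAPS - 1 cycles ago
        low = scaled + oldest   # LPF: addition
        high = scaled - oldest  # HPF: subtraction
        # Threshold: 01 for +1, 11 for -1 (zero is treated as positive here).
        low_bits.append(0b01 if low >= 0 else 0b11)
        high_bits.append(0b01 if high >= 0 else 0b11)
        delay.append(scaled)    # shift the delay chain by one sample
    return low_bits, high_bits
```

A dense 6-tap filter pair would need up to twelve multiplications per sample in the same position, which is the source of the multiplier savings reported for the MIWF-based receiver.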

2) Daubechies and Coiflet Filters
No reduction is possible for either the db3 or the coif1 receiver filters because the input is unpredictable. The implemented receiver structure for these filters is illustrated in Fig. (7).

Fig. (7): Implemented db3 and coif1 receiver filters

E. System Controller
The system controller is a four-state FSM that controls the simulation of the overall implemented system through the states disable, reset, transmitting/receiving, and complete, using the control signals shown in Fig. (8).
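A behavioral sketch of such a controller is given below. Only the four state names come from the description above; the transition conditions and the signal names (`enable`, `count`, `total`) are hypothetical and do not correspond to the control signals of Fig. (8).

```python
from enum import Enum

class State(Enum):
    DISABLE = 0
    RESET = 1
    TRANSFER = 2  # transmitting/receiving
    COMPLETE = 3

def next_state(state, enable, count, total):
    """Return the next controller state (assumed transition rules).

    enable : hypothetical run signal; deasserting it returns to DISABLE
    count  : hypothetical counter of processed samples
    total  : number of samples to process
    """
    if not enable:
        return State.DISABLE
    if state is State.DISABLE:
        return State.RESET      # clear counters and memories first
    if state is State.RESET:
        return State.TRANSFER   # start streaming samples
    if state is State.TRANSFER and count >= total:
        return State.COMPLETE   # all samples processed
    return state                # otherwise hold the current state
```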

BER PERFORMANCE
The BER performance of the implemented DWMT systems is evaluated by testing them under various filter coefficient word sizes and channel conditions. A BPSK signal consisting of 10^6 statistically independent bits is transmitted, and the BER is computed by comparing the contents of the input ROM and the output RAM.
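The comparison amounts to counting mismatched samples, as in the minimal sketch below. The function name is hypothetical, and any alignment needed to compensate for filter delay is assumed to have been done beforehand.

```python
def bit_error_rate(sent, received):
    """Fraction of positions where the recovered sample differs from the sent one."""
    if len(sent) != len(received):
        raise ValueError("sequences must have the same length")
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)
```

With 10^6 transmitted bits, the smallest nonzero BER that can be measured this way is 10^-6, which matches the lowest BER level reported in the tests.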
As a reference BER performance, the implemented systems are simulated in MATLAB with full precision under the same channel conditions. The simulation results for an AWGN channel are shown in Fig. (9), where the tested systems show equivalent performance. It is interesting to note that the MIWF-based system preserves its performance even in the case of severe coefficient truncation (5 bits), whereas the Coiflet-based system requires a higher SNR to achieve the same BER. With 9-bit and 15-bit filter coefficients, the Daubechies-based system requires about 10 dB more SNR than its full-precision performance to achieve a BER of 10^-6, and it fails to operate properly with 5-bit coefficients. This advantage of the MIWF-based system is due to its simple internal arithmetic compared with the db3- and coif1-based systems. That is, an nth-order MIWF has only two nonzero coefficients, whereas the db3 and coif1 filters have n+1 nonzero coefficients, so truncating their binary representation causes more error accumulation.
[...] which is considered a severe channel condition. The MATLAB simulation results for this channel (with AWGN) are shown in Fig. (11). The three tested systems converge and reach the 10^-6 BER level, but at a higher SNR than in the case of the AWGN channel, with the MIWF-based system being affected by the channel distortion more than the other tested systems. For the same channel, the BER performance of the FPGA-implemented systems is shown in Fig. (12) for 5-bit, 9-bit, and 15-bit filter coefficients and 16-bit noise samples. Clearly, the Daubechies-based system fails to withstand the truncation, loses its orthogonality, and becomes unable to achieve an acceptable BER in any of the tested cases. For the same test cases, by contrast, the MIWF-based system preserves its performance and is not affected by the truncation.
The BER of the Coiflet-based system degrades for 5-bit filter coefficients and improves as the filter coefficient size is increased.
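The loss of orthogonality under truncation can be checked numerically. The sketch below truncates the standard db3 low-pass coefficients and measures how far the result deviates from the orthonormality condition sum_n h[n] h[n+2k] = delta[k]. It assumes a (1.f) format with f = word size - 1 fractional bits for every word size, generalizing the (1.8) format stated for 9 bits, and it is not a model of the FPGA datapath. The function names are hypothetical.

```python
import math

# Standard db3 low-pass filter coefficients (6 taps).
DB3 = [0.035226291882100656, -0.08544127388224149, -0.13501102001039084,
       0.4598775021193313, 0.8068915093133388, 0.3326705529509569]

def truncate(coeffs, word_bits):
    """Truncate each coefficient to word_bits bits, assuming a (1.f) format."""
    frac_bits = word_bits - 1  # assumption: one integer/sign bit, rest fractional
    scale = 1 << frac_bits
    return [math.floor(c * scale) / scale for c in coeffs]

def orthogonality_error(h):
    """Largest deviation from sum_n h[n] * h[n + 2k] = delta[k] over all even shifts."""
    worst = 0.0
    for shift in range(0, len(h), 2):
        corr = sum(h[n] * h[n + shift] for n in range(len(h) - shift))
        target = 1.0 if shift == 0 else 0.0
        worst = max(worst, abs(corr - target))
    return worst

errors = {bits: orthogonality_error(truncate(DB3, bits)) for bits in (5, 9, 15)}
```

The deviation grows as the word size shrinks, because each of the six nonzero coefficients contributes its own truncation error. A filter with only two nonzero coefficients of equal magnitude keeps the zero-lag-free conditions exactly and only rescales its norm under truncation.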

SYSTEM COMPLEXITY
The hardware complexity of the implemented systems is compared in terms of the number of slices, flip-flops, LUTs, and multipliers consumed from the hardware available on the Spartan XC3S1200E chip. Comparisons are presented for the overall system (including the Tx, Rx, communication channel, and system controller), the transmitter alone, and the receiver alone, each implemented with different filter coefficient sizes (5, 9, and 15 bits). Tables (2) through (4) present the available and consumed hardware for the tested systems. Generally, the hardware utilization of the Daubechies- and Coiflet-based systems is almost the same. This is expected, since the transmitter and receiver structures for these filters are nearly identical. The MIWF-based system requires less hardware for the same test conditions. Among the observed parameters, the most important is the number of multipliers, since they are complex in structure and are the most time- and power-consuming elements in the system. The MIWF-based system requires only one multiplier for filter coefficient sizes of 5 and 9 bits, whereas the Daubechies-based system requires 7 and 12 multipliers, respectively.
For 15-bit filter coefficients, the MIWF-based system requires 2 multipliers, while the Daubechies- and Coiflet-based systems require 24 multipliers.
Simpler hardware results in faster operation. Tables (2) through (4) also show the speed of the implemented systems in terms of the operating frequency in MHz. The speeds of the channel and system controller modules are the same for all tested systems and are included in the speed of the whole system given in the tables. These modules may limit the speed of the whole system when they are slower than the transmitter or the receiver. Therefore, the speeds presented for the transmitter and the receiver are more informative. Due to its simplicity, the MIWF-based Tx is faster than the Tx of the other systems in all tested cases. Although it is simpler, the MIWF-based Rx has a speed equivalent to that of the Daubechies- and Coiflet-based Rxs, since their multipliers are implemented to operate in parallel. However, as expected, the dynamic power consumed by the MIWF-based system is always less than that consumed by the other tested systems, as shown in Tables (2) through (4). Overall, the system complexity comparisons show that the MIWF wavelet filters are simpler, faster, consume less power, and are efficient for DWMT systems.

CONCLUSION
In this paper, we present an FPGA implementation of the elemental building block of WPM-structured DWMT systems. Three types of wavelet filters are used, namely, the Daubechies, Coiflet, and MIWF filters. The MIWF has a simple structure compared with other wavelet filters. This simplicity enables significant reductions in the computational operations that need to be implemented.
The BER performance tests show the ability of the MIWF-based system to withstand severe channel and data truncation conditions, whereas the Daubechies- and Coiflet-based systems suffer performance degradation. Despite this relatively superior performance, the MIWF-based system shows performance degradation when tested over severe channel conditions. The hardware complexity of the implemented systems is compared in terms of the number of slices, flip-flops, LUTs, and required multipliers. These comparisons clearly show the simplicity of the MIWF relative to the other tested wavelet filters.
Therefore, it can be concluded that a multi-band DWMT system based on the MIWF will be significantly simpler, faster, and less power-consuming than equivalent systems based on Daubechies and Coiflet filters. Possible future work includes implementing a multi-band DWMT system employing the MIWF and comparing its performance and hardware complexity with DWMT systems employing other types of wavelet filters.