# **TRANSMITTER ARCHITECTURE FOR PULSED OFDM**

Kai-Chuan Chang, Gerald E. Sobelman, Ebrahim Saberinia and Ahmed H. Tewfik

Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA

## ABSTRACT

A multi-band OFDM UWB system is the leading proposal for the new IEEE 802.15.3a standard for wireless high bit-rate personal area networks. Pulsed OFDM is an enhancement to the multi-band OFDM system that offers better performance with lower complexity and a correspondingly reduced power consumption. In this paper, the implementation issues and pipeline structures of a pulsed OFDM transmitter are presented. In particular, architectural features of the system and its components are described and analyzed.

## 1. INTRODUCTION

Orthogonal frequency division multiplexing (OFDM) is an effective multi-carrier modulation technique that has been adopted in many communication systems. It offers many features that make it a desirable modulation scheme for high bit-rate wireless communications [1]. In OFDM, an all-digital base-band transceiver can be implemented by using the inverse Fast Fourier Transform (IFFT) to generate the sub-carriers at the transmitter and the FFT to demodulate them at the receiver. In a frequency-selective channel, coded OFDM is used to combat deep fades within the sub-channels. The multi-band OFDM ultra wideband (UWB) system (MB-OFDM) is a particular example of coded OFDM designed for high bit-rate wireless networks [2]. However, in a severe multipath channel, such a system requires heavy coding and interleaving to obtain sufficient diversity gain. Pulsed OFDM (P-OFDM) was proposed to enhance the performance of OFDM over a multipath fading channel by providing diversity gain with less complexity and power consumption than MB-OFDM systems [3]. In P-OFDM, diversity combining techniques can be used to improve the performance of the system in these difficult channels.

This paper is organized as follows. Sections 2 and 3 briefly describe the MB-OFDM and P-OFDM systems, respectively. Sections 4 and 5 describe the implementation issues and the design of a P-OFDM transmitter. Our conclusions are presented in Section 6.

## 2. MULTI-BAND OFDM

The multi-band OFDM ultra wideband (UWB) system (MB-OFDM) is the leading proposal for IEEE 802.15.3a high bit-rate wireless personal area networks [2]. In this proposal, the available ultra wideband spectrum from 3.1 to 10.6 GHz is divided into fourteen sub-bands. The bandwidth of each sub-band is 528 MHz, in compliance with the FCC rules for UWB transmission. In each sub-band, a 128-tone OFDM signal is used to modulate QPSK data symbols. To combat frequency-selective fading, a convolutional encoder with a rate of one-third and a constraint length of seven is used, and several puncturing options are specified. Several modes of operation are defined, including the mandatory mode one, in which only the lower 3 sub-bands are used. The main difference between MB-OFDM and a traditional OFDM system is that data are not transmitted continuously on all sub-bands. Instead, transmission is time-multiplexed between the different sub-bands.
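The band plan can be made concrete with a few lines of arithmetic. This minimal Python sketch assumes the MB-OFDM center-frequency rule $f_c = 2904 + 528\,n_b$ MHz for sub-band $n_b = 1, \ldots, 14$ (taken from the MB-OFDM proposal, not from this paper), and uses a hypothetical cyclic 1, 2, 3 hopping pattern only to illustrate the time multiplexing of mode one; all function names are illustrative.

```python
# Sketch of the MB-OFDM band plan (assumption: f_c = 2904 + 528 * n_b MHz, n_b = 1..14).
SUBBAND_BW_MHZ = 528
NUM_SUBBANDS = 14


def center_mhz(band: int) -> int:
    """Center frequency of 1-based sub-band `band`, in MHz."""
    if not 1 <= band <= NUM_SUBBANDS:
        raise ValueError("band must be in 1..14")
    return 2904 + SUBBAND_BW_MHZ * band


def band_edges_mhz(band: int) -> tuple[int, int]:
    """Lower and upper edges of a sub-band, in MHz."""
    fc = center_mhz(band)
    half = SUBBAND_BW_MHZ // 2
    return fc - half, fc + half


def mode_one_hops(num_symbols: int, pattern=(1, 2, 3)) -> list[int]:
    """Hypothetical hopping: OFDM symbol i is sent in sub-band pattern[i % len(pattern)]."""
    return [pattern[i % len(pattern)] for i in range(num_symbols)]


print([band_edges_mhz(b) for b in (1, 2, 3)])  # [(3168, 3696), (3696, 4224), (4224, 4752)]
print(mode_one_hops(6))                        # [1, 2, 3, 1, 2, 3]
```

Under this rule the 14 sub-bands tile 3168 to 10560 MHz contiguously, which stays inside the 3.1 to 10.6 GHz UWB allocation.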

## 3. PULSED OFDM

Pulsed OFDM (P-OFDM) is a multi-carrier modulation that uses orthogonal pulsed sinusoidal waves, instead of the continuous sinusoids of MB-OFDM, as parallel sub-carriers to modulate the data. This can be done by up-sampling a normal OFDM signal before the D/A converter [3]. The up-sampling inserts K-1 zeros between samples, so the resulting P-OFDM signal is a pulse train with a duty cycle of 1/K, where K is the processing gain of the P-OFDM system. As shown in [3], P-OFDM performs better than MB-OFDM in multipath environments while having lower complexity and power consumption. To reduce complexity, the proposed system uses only 32 sub-carriers instead of the 128 used in MB-OFDM, and a higher-rate convolutional code is used in the encoder. The band plan of P-OFDM is the same as that of MB-OFDM. For simplicity, only mode one operation, i.e., use of the lower 3 sub-bands, is considered here.
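The zero-insertion step can be sketched in a few lines. This is a minimal illustration, assuming the K-1 zeros are placed after each sample (so the last sample is also followed by zeros) and that the OFDM samples themselves are non-zero; the function names are hypothetical.

```python
def pofdm_upsample(samples, K):
    """Up-sample by the processing gain K: each sample is followed by K-1 zeros."""
    if K < 1:
        raise ValueError("K must be a positive integer")
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (K - 1))
    return out


def duty_cycle(signal):
    """Fraction of non-zero samples (assumes the original OFDM samples are non-zero)."""
    return sum(1 for v in signal if v != 0) / len(signal)


pulses = pofdm_upsample([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j], K=4)
print(len(pulses), duty_cycle(pulses))  # 16 0.25
```

The output is a pulse train whose non-zero fraction is exactly 1/K, matching the duty cycle stated above.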

## 4. SYSTEM COMPONENTS

In this section, the individual components of a P-OFDM transmitter are presented and analyzed in detail.

### 4.1. Encoder

The rate  $\frac{1}{2}$  nonsystematic convolutional encoder has generator polynomials,  $g_0 = 171_8$  and  $g_1 = 133_8$ . It is the best rate  $\frac{1}{2}$  convolutional code with constraint length equal to 7 (same as in the MB-OFDM system), with minimum free distance equal to 10 [8]. Figure 1 shows the encoder design using D-type flip-flops with binary input X and binary outputs YA and YB. Two output flip-flops are used to remove unwanted glitches and to store the output signals. The synchronous RESET input resets the state of the encoder to the all-zero state. The outputs of the encoder can then be passed to the next stage for puncturing and serial to parallel conversion.
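A bit-level software model of the encoder in Figure 1 can serve as a reference when checking the hardware. This is a minimal Python sketch, not the authors' design; it assumes the common convention (used, e.g., in IEEE 802.11a) that the most significant bit of each octal generator taps the current input bit and the least significant bit taps the oldest flip-flop. Starting from `state=0` plays the role of the synchronous RESET.

```python
G0 = 0o171  # generator for output YA (binary 1111001)
G1 = 0o133  # generator for output YB (binary 1011011)


def parity(v: int) -> int:
    """XOR of all bits of v."""
    return bin(v).count("1") & 1


def conv_encode(bits, state=0):
    """Rate-1/2, constraint-length-7 encoder. Returns (ya, yb, final_state).

    The 7-bit register holds the current input in bit 6 and the six previous
    inputs in bits 5..0 (bit 0 is the oldest); `state` is those six previous bits.
    """
    ya, yb = [], []
    for x in bits:
        reg = (x << 6) | state
        ya.append(parity(reg & G0))
        yb.append(parity(reg & G1))
        state = reg >> 1  # shift: the current input becomes the newest stored bit
    return ya, yb, state


# The impulse response reproduces the generator taps, MSB first.
print(conv_encode([1, 0, 0, 0, 0, 0, 0]))
```

Feeding a single 1 followed by six 0s returns the bit patterns of 171 and 133 (octal) on YA and YB, which is a quick sanity check of the tap ordering.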

### 4.2. Interleavers

Bit interleaving is used to provide robustness against burst errors. Two stages of interleaving are performed in series to combat frequency selective fading. The first stage is referred to as inter-interleaving and it permutes bits to exploit frequency diversity across the 3 lower sub-bands (i.e., inter sub-band interleaving). In the P-OFDM system, each OFDM symbol consists of 50 coded bits. The inter-interleaver interleaves 150 coded bits as follows [2]:

$$S(j) = U\left\{\left\lfloor \frac{j}{50} \right\rfloor + 3\,\mathrm{mod}_{50}(j)\right\}, \quad j = 0, 1, \ldots, 149 \tag{1}$$

where U(i) and S(j) are the input and output bits of the inter-interleaver, respectively. The second stage of interleaving, known as inner-interleaving, permutes the bits across the data tones of each OFDM symbol within each sub-band (i.e., inner sub-band interleaving). Each block of 50 input bits is interleaved as follows:

$$T(j) = S\left\{\left\lfloor \frac{j}{5} \right\rfloor + 10\,\mathrm{mod}_{5}(j)\right\}, \quad j = 0, 1, \ldots, 49 \tag{2}$$

where S(i) and T(j) are the input and output bits of the inner-interleaver, respectively. Figure 2 shows the interleaver operation using blocks of RAMs. The interleaving operation can be viewed as filling the RAMs in column-wise order and then outputting the bits in row-wise order.

To ensure continuous data flow, two 3-by-50-bit RAMs are required for the inter-interleaver and two 5-by-10-bit RAMs are required for the inner-interleaver, so that one RAM can be filled while the other is read out. Figure 3 shows the block diagram of the interleavers, where X denotes the coded input bits, Y the interleaved output bits, and INN\_INDICATOR indicates whether the output bits are interleaved. The RESET signal clears all four RAMs and sets the state of the interleaver to all-zero.
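Equations (1) and (2) and the column-write/row-read RAM view can be checked against each other in software. This minimal Python sketch uses bit positions in place of bits so the permutation is visible; in the sketch's own row/column labeling, Eq. (1) is a 3-row by 50-column array and Eq. (2) is a 10-row by 5-column array. The function names are hypothetical.

```python
def inter_interleave(u):
    """Eq. (1): S(j) = U(floor(j/50) + 3*mod_50(j)), j = 0..149."""
    if len(u) != 150:
        raise ValueError("inter-interleaver block is 150 bits")
    return [u[j // 50 + 3 * (j % 50)] for j in range(150)]


def inner_interleave(s):
    """Eq. (2): T(j) = S(floor(j/5) + 10*mod_5(j)), j = 0..49."""
    if len(s) != 50:
        raise ValueError("inner-interleaver block is 50 bits")
    return [s[j // 5 + 10 * (j % 5)] for j in range(50)]


def ram_interleave(bits, rows, cols):
    """RAM view: write bits column by column into a rows x cols array, read row by row."""
    if len(bits) != rows * cols:
        raise ValueError("block size must equal rows * cols")
    ram = [[None] * cols for _ in range(rows)]
    for k, b in enumerate(bits):
        ram[k % rows][k // rows] = b  # column-wise fill
    return [ram[r][c] for r in range(rows) for c in range(cols)]  # row-wise read


u = list(range(150))  # bit positions stand in for bits to expose the permutation
print(inter_interleave(u) == ram_interleave(u, 3, 50))             # True
print(inner_interleave(u[:50]) == ram_interleave(u[:50], 10, 5))  # True
```

Both comparisons hold, which is why the hardware can implement each stage as plain address sequencing over a RAM; with two RAMs per stage, one block is written while the previous one is read.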



Figure 1: Convolutional encoder.







Figure 3: Interleavers.

### 4.3. FFT

To reduce the hardware complexity, a pipeline FFT structure can be used in the P-OFDM system. In a pipeline FFT, each stage has its own processing elements (e.g., butterflies, multipliers and commutators), and each stage starts computing as soon as its data are available. This makes it suitable for real-time applications in which input data arrive in sequential order. Pipeline FFTs are also simple and have high throughput, which makes them attractive for high-rate wireless applications. These architectures can be derived from the radix-2 decimation-in-frequency (DIF) signal flow graph (SFG) of the 32-point FFT with inputs in sequential order, shown in Figure 4a.

The Radix-2 Multi-path Delay Commutator (R2MDC) is the classical approach to the pipeline FFT. This structure has $\log_2 N$ stages, where N is the number of points in the FFT [4]. In the R2MDC, the input sequence is split into two parallel data streams, which then enter the processing elements simultaneously with the correct number of delays (Figure 5a). The FFT is computed by passing values from one stage to the next, and the results are produced in bit-reversed order, as shown in Figure 4a. The Radix-2 Single-path Delay Feedback (R2SDF) pipeline FFT architecture, which minimizes the number of delay elements, can also be obtained from the SFG [5]. In the R2SDF, a modified radix-2 butterfly must be used (Figure 5b). Both architectures have the same number of butterfly units and multipliers, but these units are utilized only 50% of the time.
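To make the bit-reversed output ordering concrete, here is a minimal software model of the radix-2 DIF flow graph. It is a sketch of the arithmetic only, not of the R2MDC or R2SDF timing: each pass of the outer loop corresponds to one of the $\log_2 N$ pipeline stages, and the result is compared against a direct DFT after undoing the bit reversal. The helper names are illustrative.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def fft_dif_radix2(x):
    """Radix-2 DIF FFT on natural-order input.

    Returns (output_in_bit_reversed_order, number_of_stages).
    """
    a = [complex(v) for v in x]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    span, stages = n // 2, 0
    while span >= 1:  # one pass per pipeline stage
        for start in range(0, n, 2 * span):
            for i in range(span):
                w = cmath.exp(-2j * cmath.pi * i / (2 * span))  # twiddle factor
                top, bot = a[start + i], a[start + i + span]
                a[start + i] = top + bot               # butterfly sum
                a[start + i + span] = (top - bot) * w  # butterfly difference, then twiddle
        span //= 2
        stages += 1
    return a, stages


def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of index k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


x = [complex(i % 7, -(i % 3)) for i in range(32)]
y, stages = fft_dif_radix2(x)
X = dft(x)
print(stages)  # 5
print(all(abs(y[bit_reverse(k, 5)] - X[k]) < 1e-9 for k in range(32)))  # True
```

Output X[k] sits at position bit_reverse(k), e.g. X[1] at position 16, which is why a reorder buffer is otherwise needed after the pipeline.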



Figure 4: (a) Radix-2 SFG. (b) Mixed-radix 32-point DIF SFG.

To achieve 100% utilization of the functional units, a modified R2MDC with buffered inputs can be used, as shown in Figure 6. In the buffered R2MDC, input data are stored in a 4-by-16-sample buffer (RAM). Every 16 clock cycles, the first-stage radix-2 butterfly alternates between fetching data from the first two columns and from the last two columns of the input buffer. Because of this parallelism, the internal clock period of the buffered R2MDC is twice the input clock period, which gives the processing elements more time to perform their computations.
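The utilization figures follow from a simple count. As a sketch, let each butterfly unit of radix $r$ consume $r$ samples per internal clock cycle, let the internal clock run at $R/c$ (the factor $c$ is our notation), and consider one frame of $N$ samples, which arrives in $N$ input cycles, i.e., $N/c$ internal cycles. One stage needs $N/r$ butterfly operations per frame, so

$$U = \frac{N/r}{N/c} = \frac{c}{r}, \qquad U_{\text{R2MDC}} = \frac{1}{2}\ \ (c = 1,\ r = 2), \qquad U_{\text{buffered R2MDC}} = \frac{2}{2} = 1\ \ (c = 2,\ r = 2)$$

For $N = 32$, a radix-2 stage performs 16 butterflies per frame, which occupy half of the 32 input-rate cycles but all 16 cycles of the halved internal clock. The same count gives 25% for the radix-4 stages of the unbuffered MRMDC ($c = 1$, $r = 4$) and 100% for the buffered MRMDC ($c = 4$), consistent with Figure 8.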

A buffered R2MDC has 5 stages of pipelined processing for each 32-point FFT computation. Alternative pipeline FFT architectures, such as a radix-4 design, are not suitable for a 32-point FFT because 32 is not a power of 4 [4] [6] [7]. However, the number of stages can be reduced by using a mixed-radix FFT: radix-4 butterflies compute the outputs of the first two stages, and radix-2 butterflies are used in the last stage. Figure 4b shows the SFG for the 32-point mixed-radix FFT with inputs in digit-reversed order. Figure 7a shows the buffered Mixed-Radix MDC (MRMDC) pipelined architecture for the 32-point FFT. The input buffer holds 8-by-8 samples, and the switches are more complex than in the R2MDC case. The buffered MRMDC can achieve 100% utilization; its data flow timing is shown in Figure 7b for 2 frames of input samples (each frame consists of the 32 samples of one 32-point FFT). Because 4 inputs are processed at a time, the internal clock period of this architecture is four times longer than the input sample clock period. Figure 8 summarizes all of the pipeline FFT architectures discussed.
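The digit-reversed ordering can be illustrated in software. The Python sketch below is a recursive mixed-radix decimation-in-frequency FFT with stage radices (4, 4, 2): it takes natural-order input and leaves X[k] at the digit-reversed position of k. A decimation-in-time arrangement of the same flow graph instead applies a digit-reversal permutation to the input, which is the ordering of Figure 4b. The sketch models only the arithmetic, not the buffered MRMDC timing or switches, and the helper names are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def mixed_radix_dif(x, radices):
    """Mixed-radix DIF FFT: natural-order input, digit-reversed output.

    radices[0] is the butterfly radix of the first stage, radices[1] of the next, etc.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if not radices or n % radices[0]:
        raise ValueError("radices do not factor the FFT length")
    r, rest = radices[0], radices[1:]
    m = n // r
    out = []
    for k2 in range(r):
        y = []
        for n1 in range(m):
            # radix-r butterfly over r inputs spaced m apart ...
            acc = sum(x[n1 + m * n2] * cmath.exp(-2j * cmath.pi * n2 * k2 / r)
                      for n2 in range(r))
            # ... followed by the twiddle factor W_N^(n1*k2)
            y.append(acc * cmath.exp(-2j * cmath.pi * n1 * k2 / n))
        out.extend(mixed_radix_dif(y, rest))  # remaining stages on an m-point block
    return out


def digit_reversed_position(k, radices):
    """Position at which output X[k] lands when the stages use `radices` in order."""
    if not radices:
        return 0
    r, rest = radices[0], radices[1:]
    block = 1
    for q in rest:
        block *= q
    return (k % r) * block + digit_reversed_position(k // r, rest)


RADICES = (4, 4, 2)  # two radix-4 stages, then one radix-2 stage
x = [complex(i % 5, i % 3) for i in range(32)]
y, X = mixed_radix_dif(x, RADICES), dft(x)
print(all(abs(y[digit_reversed_position(k, RADICES)] - X[k]) < 1e-9 for k in range(32)))  # True
```

With radices (4, 4, 2), writing k = d0 + 4 d1 + 16 d2 places X[k] at 8 d0 + 2 d1 + d2, i.e. the digits in reverse order; three stages replace the five of the radix-2 pipeline.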



| Architecture (R = input clock rate) | Radix-2 butterflies | Radix-4 butterflies | Switches | Delays | Multipliers | Additional components | Utilization | Clock rate | # clock cycles for 32-point FFT |
|---|---|---|---|---|---|---|---|---|---|
| R2MDC | 5 | 0 | 4 | 46 | 3 | 1 MUX 2:1 | 50% | R | 51 |
| R2SDF | 5 | 0 | 0 | 31 | 3 | 5 MUX 2:1 | 50% | R | 67 |
| MRMDC | 2 | 2 | 2 | 80 | 4 | 1 MUX 4:1 | 25% | R | 42 |
| Buffered R2MDC | 5 | 0 | 4 | 30 | 3 | 4x16 RAM | 100% | R/2 | 35 |
| Buffered MRMDC | 2 | 2 | 2 | 32 | 4 | 8x8 RAM | 100% | R/4 | 18 |

Figure 8: Summary of various pipeline FFT architectures.

## 5. P-OFDM TRANSMITTER ARCHITECTURE

The P-OFDM transmitter can be constructed from the individual components of the previous section. First, binary data enter the convolutional encoder, and the coded bits are permuted by the inter- and inner-interleavers to make the system robust against burst errors. QPSK constellation mapping then maps each pair of interleaved bits to one symbol. The inverse discrete Fourier transform (IDFT) can be computed with a DFT by conjugating the inputs and outputs of the DFT (up to a scale factor of 1/N). Therefore, the IFFT can be implemented by adding two conjugating operations (i.e., two's complementing the imaginary part) at the inputs and outputs of the FFT. QPSK\* constellation mapping can be used instead of QPSK to reduce the number of conjugating operations to only one.

The outputs of the pipelined FFT are produced in bit-reversed or digit-reversed order, as shown in Figure 4a for the 32-point FFT; therefore, additional hardware is required to re-order the outputs into the correct sequence. This additional memory can be eliminated by using a buffered MDC FFT with the input stream in digit-reversed order, as in Figure 4b. The buffered MDC FFT not only has high utilization but also saves hardware; it is therefore more suitable for the P-OFDM system. The outputs of the IFFT are passed to the next stage for cyclic prefix and guard insertion, which eliminate inter-symbol and inter-channel interference, and for parallel-to-serial conversion. Finally, the OFDM symbols are up-sampled and sent to the analog transmitter section, as shown in Figure 9.
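The conjugation identity behind the QPSK\* mapping can be checked numerically: since $\mathrm{IDFT}\{X\} = \frac{1}{N}\,\overline{\mathrm{DFT}\{\overline{X}\}}$, feeding conjugated symbols into the forward transform leaves only the output conjugation. This Python sketch uses a direct DFT in place of the pipelined FFT, and the particular bit-to-symbol mapping is an assumption made for illustration.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) forward DFT standing in for the FFT block."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
            for k in range(n)]


def idft_direct(X):
    """Reference inverse DFT, including the 1/N scale factor."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def qpsk(b0, b1):
    """Assumed Gray mapping: bit 0 -> -1, bit 1 -> +1 on each rail, unit energy."""
    return complex(2 * b0 - 1, 2 * b1 - 1) / math.sqrt(2)


def qpsk_star(b0, b1):
    """QPSK*: the conjugate constellation (only the sign of Q is flipped)."""
    return qpsk(b0, b1).conjugate()


def ifft_with_qpsk_star(bits):
    """IDFT(QPSK symbols) = conj(DFT(QPSK* symbols)) / N: one conjugation, at the output."""
    syms = [qpsk_star(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]
    return [v.conjugate() / len(syms) for v in dft(syms)]


bits = [0, 1, 1, 1, 0, 0, 1, 0] * 8  # 64 bits -> 32 QPSK symbols
ref = idft_direct([qpsk(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)])
print(all(abs(a - b) < 1e-9 for a, b in zip(ifft_with_qpsk_star(bits), ref)))  # True
```

Because the input conjugation is folded into the constellation table, only the output conjugation remains in the datapath.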

The up-sampling operation in a P-OFDM system allows diversity gains to be achieved at the receiver side. The digital equivalent model of the P-OFDM system after constellation mapping is shown in Figure 10, where  $H_d(z)$  is the digital equivalent channel. The digital channel can be represented by using the polyphase decomposition as follows:

$$H_{d}(z) = \sum_{k=0}^{K-1} z^{-k} H_{k}(z^{K}) \tag{3}$$

In terms of the polyphase decomposition, where $H_k(z) = \sum_{m} h_d[mK+k]\, z^{-m}$ is the $k$-th polyphase component of the channel impulse response $h_d[n]$, P-OFDM can be viewed as the parallel transmission of the same normal OFDM signal over $K$ different channels $H_0(z), \ldots, H_{K-1}(z)$, as shown in Figure 11. A diversity combining technique can then be used at the receiver to improve the performance of the overall communication system.
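Eq. (3) and the K-branch view of Figure 11 can be checked on a toy example. This Python sketch assumes an FIR digital channel `h` with at least K taps and uses integer samples so the comparison is exact; it shows that the k-th phase of the channel output, y[mK+k], equals the original (non-up-sampled) signal filtered by the polyphase component h_k[m] = h[mK+k]. All names are illustrative.

```python
def upsample(x, K):
    """P-OFDM up-sampling: each sample is followed by K-1 zeros."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0] * (K - 1))
    return out


def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def polyphase_components(h, K):
    """h_k[m] = h[mK + k]: impulse responses of H_k(z) in Eq. (3)."""
    if len(h) < K:
        raise ValueError("this sketch assumes the channel has at least K taps")
    return [h[k::K] for k in range(K)]


def received_phases(x, h, K):
    """Up-sample x, pass it through h, and split the output into its K phases."""
    y = convolve(upsample(x, K), h)
    return [y[k::K] for k in range(K)]


def diversity_branches(x, h, K):
    """Equivalent model: the same x sent over K parallel channels h_k."""
    return [convolve(x, hk) for hk in polyphase_components(h, K)]


def branches_match(x, h, K):
    """True if both views agree (shorter sequences are zero-padded)."""
    a, b = received_phases(x, h, K), diversity_branches(x, h, K)
    n = max(len(s) for s in a + b)
    return all(list(p) + [0] * (n - len(p)) == list(q) + [0] * (n - len(q))
               for p, q in zip(a, b))


print(branches_match([3, -1, 2, 5, 0, 4, -2, 1], [1, 2, 0, -1, 3], K=3))  # True
```

Each phase of the received signal is thus a separate copy of the same OFDM symbol stream seen through its own sub-channel, which is what a maximal ratio or equal gain combiner would operate on.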



Figure 9: P-OFDM transmitter structure



Figure 10: Digital equivalent system for P-OFDM transmitter and channel.



Figure 11: K diversity branches by P-OFDM system.

## 6. CONCLUSIONS

As an alternative to MB-OFDM, the P-OFDM system not only performs better over a multipath fading channel but also achieves this with less complexity and power consumption. This makes it an attractive candidate for high data rate wireless personal area networks between portable devices. In this paper, the implementation issues and design structure of the P-OFDM transmitter have been presented. Buffered MDC pipelined FFT structures are a good fit for the P-OFDM system architecture and achieve high throughput with reduced hardware. The performance improvement of the P-OFDM system is due to the up-sampling process after the IFFT, which allows K diversity branches to be used in the system. Diversity combining techniques such as maximal ratio combining and equal gain combining can then be applied at the receiver side.

Our future work will address the implementation issues and design structures of the corresponding P-OFDM receiver.

## 7. ACKNOWLEDGMENT

This research was supported by NSF Grant No. CCR-0313224 and by an equipment grant from Intel Corporation.

## 8. REFERENCES

- [1] Richard Van Nee and Ramjee Prasad, *OFDM for Wireless Multimedia Communications*, Artech House, Boston, 2000.
- [2] Anuj Batra et al., "Multi-band OFDM physical layer proposal," merged proposal for IEEE 802.15.3a, http://ieee802.org/15/pub/Download.html, July 2003.
- [3] E. Saberinia and A.H. Tewfik, "Pulsed and non-pulsed OFDM ultra wideband wireless personal area networks," *IEEE UWBST*, 2003.
- [4] L.R. Rabiner and B. Gold, *Theory and Application of Digital Signal Processing*, Prentice-Hall, 1975.
- [5] E.H. Wold and A.M. Despain, "Pipeline and parallel-pipeline FFT processors for VLSI implementation," *IEEE Trans. Comput.*, pp. 414-426, May 1984.
- [6] A.M. Despain, "Fourier transform computer using CORDIC iterations," *IEEE Trans. Comput.*, pp. 993-1001, Oct. 1974.
- [7] G. Bi and E.V. Jones, "A pipelined FFT processor for word-sequential data," *IEEE Trans. Acoust., Speech, Signal Process.*, pp. 1982-1985, Dec. 1989.
- [8] S.B. Wicker, *Error Control Systems for Digital Communication and Storage*, Prentice-Hall, 1995.