# DQPSK MODULATOR AND DEMODULATOR FOR WIRELESS NETWORK-ON-CHIP

By

**Chien-Chuan Hung**

A thesis submitted in partial fulfillment of the requirements for the degree of

## MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

## **WASHINGTON STATE UNIVERSITY**

School of Electrical Engineering and Computer Science

**AUGUST 2011**

To the Faculty of Washington State University:

The members of the committee appointed to examine the thesis of CHIEN-CHUAN HUNG find it satisfactory and recommend that it be accepted.

Partha Pratim Pande, Ph.D., Chair

Benjamin Belzer, Ph.D.

Deuk Hyoun Heo, Ph.D.

### **ACKNOWLEDGEMENT**

I would like to express my appreciation to my advisor, Dr. Partha Pratim Pande, who gave me the opportunity to join his research group and to work on this research. I would also like to thank him for supporting me in the numerous ways that helped me complete this research.

Special thanks to Dr. Benjamin Belzer for his help when I had questions about the research.

I would also like to thank my colleagues Mr. Amlan Ganguly, Mr. Souradip Sarkar, Mr. Sujay Deb, Mr. Turbo Majumder and Mr. Kevin Chang for their help during my research. Discussions with them always helped me learn new material in our research field and understand the fundamental concepts more clearly.

Most importantly, I would like to thank my parents for their unconditional support.

DQPSK MODULATOR AND DEMODULATOR FOR WIRELESS **NETWORK-ON-CHIP**

Abstract

by Chien-Chuan Hung, M.S.
Washington State University

**AUGUST 2011**

Chair: Partha Pratim Pande

The Wireless Network-on-Chip (WiNoC) is an interconnection methodology that supports communication between large numbers of embedded cores on a single die through high-bandwidth, long-range, single-hop wireless links. An efficient modulator and demodulator design is crucial for the success of such a WiNoC design. It must achieve high data throughput within a limited bandwidth in a power-efficient manner. To meet these design requirements, the Differential Quadrature Phase-Shift Keying (DQPSK) modulation scheme is chosen. DQPSK provides the same data throughput with less bandwidth than binary modulation schemes such as BPSK or BFSK. The differential property reduces the complexity of the demodulator circuit by eliminating the carrier recovery loop, and hence saves power.

The complete modeling and simulation of the DQPSK modulator and demodulator is carried out using the MATLAB SIMULINK toolbox. The design is finally implemented using VHDL and synthesized using 65nm CMOS technology libraries.
# **CONTENTS**

- ACKNOWLEDGEMENT
- ABSTRACT
- LIST OF FIGURES
- LIST OF TABLES
- CHAPTER 1 INTRODUCTION AND OVERVIEW
  - 1.1 Introduction
  - 1.2 Overview
- CHAPTER 2 BACKGROUND AND RELATED WORK
  - 2.1 Background
    - 2.1.1 Differential Quadrature Phase-Shift Keying (DQPSK)
    - 2.1.2 Pulse shaping filter
  - 2.2 Related work
- CHAPTER 3 DQPSK MODEM AND RESULTS
  - 3.1 DQPSK Modulator
    - 3.1.1 Phase selector and differential encoded signal $\theta_K$
    - 3.1.2 In-phase and Quadrature-phase (I and Q) signals
    - 3.1.3 Pulse Shaping Filter $g_T(t)$
  - 3.2 DQPSK Demodulator
    - 3.2.1 Pulse Shaping Filter $g_R(t)$
    - 3.2.2 Phase Comparator
    - 3.2.3 Data sampler
  - 3.3 DAC and ADC
  - 3.4 Simulation and synthesized result
- CHAPTER 4 CONCLUSIONS AND FUTURE WORKS
  - 4.1 Conclusions
  - 4.2 Future work
- REFERENCE

# LIST OF FIGURES

- Figure 1.1: DQPSK modulator block diagram
- Figure 1.2: DQPSK demodulator block diagram
- Figure 2.1: QPSK constellation diagram
- Figure 2.2: Zero ISI with utilized of pulse shaping filter
- Figure 2.3: Sinc function in frequency domain
- Figure 2.4: Block diagram of FIR filter
- Figure 2.5: Block diagram of LUT based pulse shaping filter
- Figure 3.1: Block diagram of phase selector and $\theta_K$
- Figure 3.2: Eye diagram of (a) original RRC coefficient (b) 7 bits RRC coefficient (c) 6 bits RRC coefficient (d) 5 bits RRC coefficient
- Figure 3.3: SIMULINK model for LUT base RRC filter
- Figure 3.4: Dual edge triggered flip flop
- Figure 3.5: SIMULINK RRC filter at demodulator
- Figure 3.6: Phase comparator block diagram
- Figure 3.7: Location of tangent from sine and cosine
- Figure 3.8: New phase detector block diagram
- Figure 3.9: Demodulator with data sampler
- Figure 3.10: 2-bits ADC input-output transfer curve
- Figure 3.11: DAC and ADC SIMULINK model
- Figure 3.12: (a) 5 bits ADC (b) 4 bits ADC
- Figure 3.13: Simulation result for DQPSK modulator
- Figure 3.14: Simulation result for DQPSK demodulator

# LIST OF TABLES

- Table 3.1: LUT of phase selector and $\theta_K$
- Table 3.2: In-phase and Quadrature-phase (I and Q) signals
- Table 3.3: Square error of each set of bits
- Table 3.4: LUT of $\theta_K - \theta_{K-1}$
- Table 3.5: SNR loss of ADC in different number of bits

## **Chapter 1 Introduction and Overview**

### 1.1 Introduction

The current design trend for System-on-Chip (SoC) is to integrate a large number of cores on a single die. Network-on-Chip (NoC) has emerged as the preferred method to interconnect a very high number of embedded cores on a single die. A conventional NoC uses switches/routers and links for communication between cores. Between widely separated cores, this gives rise to multi-hop communication, high latency and high energy dissipation, and these problems will become worse with technology scaling [1]. To alleviate the problems of high latency and energy dissipation, several novel NoC architectures have been proposed, including 3D NoCs, photonic NoCs and multi-band RF NoCs [2 - 4]. Though all of these NoC architectures improve latency and energy dissipation to some degree, further investigation is needed to determine their suitability for replacing and/or augmenting existing metal/dielectric-based planar multi-hop NoC architectures.
Another promising direction for addressing the limitations of the metal/dielectric-based planar multi-hop NoC is to replace the wired links with high-bandwidth, long-range, single-hop wireless links. This gives rise to the so-called Wireless Network-on-Chip (WiNoC) architectures [5]. By establishing long-range wireless links between distant cores and incorporating a small-world (SW) network architecture, the millimeter wave Wireless Network-on-Chip (mWNoC) proposed in [6] achieves significant performance improvement in terms of network throughput, latency and energy dissipation compared to a traditional NoC.

In that architecture, the whole system is first divided into multiple small clusters of neighboring cores, and these smaller networks are called subnets. Subnets consist of relatively few cores, giving increased flexibility in designing their architectures. These subnets have NoC switches and links as in a standard NoC. As subnets are smaller networks, intra-subnet communication has a shorter average path length than in a single NoC spanning the whole system. The cores are connected to a centrally located hub through direct links, and the hubs from all subnets are connected in a second-level network, forming a hierarchical structure. This upper hierarchical level is designed to have small-world graph characteristics and is constructed with both wired and wireless links.

The performance of the on-chip wireless links depends on the efficient design of the modulation/demodulation schemes at the wireless transceivers. In this thesis, the design of a digital Differential Quadrature Phase Shift Keying (DQPSK) modulator and demodulator for the millimeter wave Wireless Network-on-Chip (mWNoC) is presented. The current modulator and demodulator for mWNoC is predominantly On-Off Keying (OOK). OOK is a simple form of Amplitude Shift Keying (ASK) modulation that sends data by turning the carrier signal ON or OFF to represent binary '1' or '0'.
The choice of modulation scheme is based on data throughput for a given bandwidth, bit error rate (BER) and power efficiency. Quadrature Phase Shift Keying (QPSK) can achieve double the data throughput in the same bandwidth compared to binary modulation schemes such as BFSK and BPSK. QPSK has the same power efficiency and the same BER as BPSK. Power-efficient M-ary modulation such as coherent FSK has advantages in terms of power, but its data throughput is lower than that of QPSK. QPSK can also be transmitted and received differentially; this property simplifies the design of the demodulator circuit by eliminating the carrier recovery loop, which is the principal component for maintaining the same carrier frequency as the modulator. However, DQPSK requires 2.3 dB higher SNR to achieve the same BER as QPSK.

Figure 1.1: DQPSK modulator block diagram

Figure 1.2: DQPSK demodulator block diagram (baseband DQPSK demodulator)

The main contribution of this work is the all-digital implementation of a DQPSK modulator and demodulator. DQPSK transfers data by mapping each two-bit digital input to one of four modulated phase patterns. With the incoming data stream taken in groups of two bits, the modulator selects the phase from 0, $\pi/2$, $\pi$, $3\pi/2$ and adds the current phase to the previous phase to form the differentially encoded signal $\theta_K$. The In-phase and Quadrature-phase (I and Q) signals are obtained by taking the cosine and the negative sine of $\theta_K$. Both I and Q signals are passed through the pulse shaping filter $g_T(t)$ at the modulator and $g_R(t)$ at the demodulator to maximize the SNR. The original data is recovered at the phase comparator, which accepts the current and delayed I and Q signals as input.

The complete modeling and simulation of the DQPSK modem is carried out in the MATLAB SIMULINK toolbox. The design is finally implemented using VHDL and synthesized using 65nm CMOS technology libraries.

### 1.2 Overview

This thesis is organized in four chapters.
Chapter 1 introduces the wireless Network-on-Chip and the motivation for designing a DQPSK modulator for WiNoC to achieve higher data throughput with limited bandwidth. Chapter 2 covers the background and related work. Chapter 3 describes the design details of each block in the DQPSK modem, presents the simulation results and evaluates the performance of the synthesized circuit. Chapter 4 concludes this work and presents future research directions.

## Chapter 2 Background and Related work

### 2.1 Background

#### 2.1.1 DIFFERENTIAL QUADRATURE PHASE-SHIFT KEYING (DQPSK)

Modulation is widely used in communication systems to transfer data by changing the amplitude, frequency or phase of a carrier signal. With a digital data stream as input, digital modulation maps the digital data onto a corresponding modulated carrier waveform. For example, binary modulation takes a binary digit 1 or 0 as input, where binary 0 maps to the waveform $s_0(t)$ and binary 1 maps to $s_1(t)$ at the output. In general, the input data are grouped $b$ bits at a time to form a symbol, and each of the $M = 2^b$ symbols maps to a distinct modulated waveform. The modulation is called *M-ary modulation* for M > 2 [8].

Quadrature Phase-Shift Keying (QPSK) is one type of M-ary modulation with M = 4, in which each symbol maps to one of four distinct waveforms that differ in phase. QPSK can achieve double the data throughput in the same bandwidth, with the same bit error rate (BER) and power efficiency, compared with Binary PSK (BPSK). The trade-off of using QPSK over BPSK is the design complexity of the circuit. The constellation diagram that maps symbols to waveforms is shown in Fig 2.1. The four symbols are located at 0, $\pi/2$, $\pi$, $3\pi/2$ with equal spacing. The assignment of symbols to phases is based on Gray encoding, which minimizes the bit error rate because adjacent symbols differ in only one bit.
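The Gray-coding property mentioned above can be checked with a minimal Python sketch. The phase-to-bits assignment is taken from the labels used later in Table 3.1 (0° = 00, 90° = 01, 180° = 11, 270° = 10); the helper names and the natural-binary mapping used for contrast are illustrative assumptions, not part of the thesis.

```python
# Gray-coded QPSK labels as they appear in Table 3.1 (degrees -> dibit).
GRAY_QPSK = {0: "00", 90: "01", 180: "11", 270: "10"}
# Natural binary labels, shown only for comparison (hypothetical alternative).
BINARY_QPSK = {0: "00", 90: "01", 180: "10", 270: "11"}


def hamming(a, b):
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def adjacent_bit_changes(mapping):
    """Bit changes between each phase and its clockwise neighbour on the circle."""
    phases = sorted(mapping)
    return [
        hamming(mapping[p], mapping[phases[(i + 1) % len(phases)]])
        for i, p in enumerate(phases)
    ]


print(adjacent_bit_changes(GRAY_QPSK))    # one bit changes between every pair of neighbours
print(adjacent_bit_changes(BINARY_QPSK))  # some neighbours differ in two bits
```

With the Gray assignment, a decision error into a neighbouring phase region corrupts a single bit, which is the reason given above for choosing it.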
Figure 2.1: QPSK constellation diagram

PSK can be transmitted and received differentially, giving rise to the so-called DPSK modulation. The differential property simplifies demodulation by comparing the current ($K$th) differentially encoded signal with the preceding ($K-1$th) signal, so that a carrier recovery loop is not required. This property is illustrated mathematically following [8]. The differential $K$th and $(K-1)$th encoded signals are demodulated by multiplying with $\cos(2\pi f_C t)$ and $\sin(2\pi f_C t)$. The demodulated outputs are

$$r_k = \sqrt{\varepsilon_s} e^{j(\theta_k - \phi)} + n_k$$

$$r_{k-1} = \sqrt{\varepsilon_s} e^{j(\theta_{k-1} - \phi)} + n_{k-1}$$

where $\theta_k$ is the phase angle, $\phi$ is the carrier phase and $n_k$ is the noise vector. To recover the transmitted data, the phase detector needs a decision variable, which is the phase difference between $r_k$ and $r_{k-1}$. This is obtained by projecting $r_k$ onto $r_{k-1}$:

$$r_k r_{k-1}^* = \varepsilon_s e^{j(\theta_k - \theta_{k-1})} + \sqrt{\varepsilon_s} e^{j(\theta_k - \phi)} n_{k-1}^* + \sqrt{\varepsilon_s} e^{-j(\theta_{k-1} - \phi)} n_k + n_k n_{k-1}^*$$

Without the noise terms, only the first term remains. Its phase is the difference $\theta_k - \theta_{k-1}$, and the unknown carrier phase $\phi$ has cancelled.

DPSK has many advantages over PSK, but the SNR required for the same probability of bit error is higher than for the original PSK modulation. The performance of DQPSK is 2.3 dB poorer than that of QPSK.

#### 2.1.2 PULSE SHAPING FILTER

For high data throughput modulation, the design of the matched filter at the transceiver is crucial to avoid Inter Symbol Interference (ISI). ISI is a form of signal distortion in which a symbol spreads beyond its time interval and overlaps subsequent symbols, causing errors. It occurs when signals are transmitted through a band-limited channel or undergo multipath propagation through the wireless medium. Both problems can be addressed by using a pulse shaping filter.

Figure 2.2: Zero ISI with utilized of pulse shaping filter
Fig 2.2 shows that ISI can be avoided by using a pulse shaping filter. The pulse shaping filter also smooths sharp transitions, which reduces the transmission power. The Sinc function in the time domain is the ideal impulse response for a pulse shaping filter. It decays quickly away from its center, which avoids ISI. In the frequency domain, the Sinc function has a rectangular shape equivalent to the ideal low pass filter with cut-off frequency B Hz [8, 9]; Fig 2.3 shows the Sinc function in the frequency domain. Despite these advantages, the Sinc impulse response is not causal, and hence the Sinc function cannot be realized in a real system.

Figure 2.3: Sinc function in frequency domain

To approximate the Sinc function, a raised cosine function is commonly used. The raised cosine filter is a low pass Nyquist filter whose behavior in the frequency domain can be described as

$$H(f) = \begin{cases} T & \left(0 \le |f| \le \frac{1-\beta}{2T}\right) \\ \frac{T}{2} \left\{1 + \cos\left[\frac{\pi T}{\beta}\left(|f| - \frac{1-\beta}{2T}\right)\right]\right\} & \left(\frac{1-\beta}{2T} \le |f| \le \frac{1+\beta}{2T}\right) \\ 0 & \left(|f| > \frac{1+\beta}{2T}\right) \end{cases}$$

where $T$ is the symbol period and $\beta$ is the roll-off factor, which measures the excess bandwidth; for example, $\beta=0.5$ corresponds to 50% excess bandwidth. For the matched filter in the transceiver, Root Raised Cosine (RRC) filters are used as a pair, one at each end, to achieve the raised cosine response at the receiver side.

### 2.2 Related work

The pulse shaping filter in this work is implemented as a digital FIR filter, since it is more stable than an IIR filter [9]. The output of a conventional FIR filter can be described as the convolution of the input data with the impulse response.
In a discrete-time system, the output of the digital filter is the weighted sum of the current and previous inputs:

$$y(n) = \sum_{i=0}^{N} b_i x(n-i)$$

where $y(n)$ is the output signal, $x(n)$ is the input signal, $b_i$ are the filter tap coefficients and $N$ is the filter order. The FIR filter has to be sampled at a rate of at least twice the highest signal frequency (the Nyquist rate) to obtain a useful response. A digital FIR filter can be implemented easily with delay units, adders and multipliers, as shown in Fig 2.4 [9].

Figure 2.4: Block diagram of FIR filter

The number of filter taps determines the output frequency response and the duration of the impulse response. The trade-offs between different numbers of taps are the design complexity and the operating speed of the circuit. At higher clock frequencies the conventional FIR filter design is no longer adequate, because of the carry propagation delay of the adders and the need for high-speed multipliers and adders.

The look-up table (LUT) design approach proposed in [10] simplifies the pulse shaping filter design by eliminating adders and multipliers. The investigations of [10] were carried out for BPSK modulation with a 13-tap raised cosine filter and a roll-off factor of 0.25. This work adopts the LUT design technique for the pulse shaping filter at the DQPSK modulator, with the same number of filter taps and the same roll-off factor of 0.25. All possible filter outputs are calculated in advance by multiplying every possible input combination by the tap coefficients, and the results are stored in the LUT. To meet the Nyquist rate, the upsampling factor of the filter should be two; hence the LUT is divided into an even set and an odd set, and an output is produced at both edges of the clock. The LUT can be optimized further by identifying and removing the zero coefficients.
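The even/odd split described above can be illustrated with a small Python sketch: filtering a zero-stuffed (upsampled by two) symbol stream with the full 13-tap filter gives the same samples as running two shorter sub-filters on the symbols directly and interleaving their outputs. The tap values are the ones quoted from [10] in this section; the function names and the test symbols are illustrative assumptions, and which sub-filter the thesis calls "even" or "odd" is a naming convention that this sketch does not depend on.

```python
# 13-tap raised cosine coefficients quoted from [10] in this section.
H10 = [0, 0.0866, 0, -0.1856, 0, 0.6274, 1, 0.6274, 0, -0.1856, 0, 0.0866, 0]


def convolve(x, h):
    """Plain direct-form FIR convolution (full length)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def upsample2(symbols):
    """Insert one zero after every symbol (upsampling factor of two)."""
    out = []
    for s in symbols:
        out.extend([s, 0])
    return out


def polyphase_filter(symbols, h):
    """Same result as convolve(upsample2(symbols), h), using two sub-filters."""
    branch_a = convolve(symbols, h[0::2])  # taps at positions 0, 2, 4, ...
    branch_b = convolve(symbols, h[1::2])  # taps at positions 1, 3, 5, ...
    out = []
    for m in range(len(branch_a)):
        out.append(branch_a[m])
        out.append(branch_b[m] if m < len(branch_b) else 0.0)
    return out


symbols = [1, -1, 1, 1]  # hypothetical BPSK-style inputs
direct = convolve(upsample2(symbols), H10)
split = polyphase_filter(symbols, H10)
print(max(abs(a - b) for a, b in zip(direct, split)))
```

Because each sub-filter sees only symbol-rate inputs, its output depends on a small number of recent symbols, which is what makes a table of precomputed outputs practical.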
The tap coefficients, together with their even and odd sets, are

$$\text{Coefficients}: [0, 0.0866, 0, -0.1856, 0, 0.6274, 1, 0.6274, 0, -0.1856, 0, 0.0866, 0]$$

$$\text{Odd set}: [0(d0), 0(d1), 0(d2), 1(d3), 0(d4), 0(d5), 0(d6)]$$

$$\text{Even set}: [0.0866(d1), -0.1856(d2), 0.6274(d3), 0.6274(d4), -0.1856(d5), 0.0866(d6)]$$

The odd set of coefficients has only one nonzero term, and its value is one; hence the odd-set LUT has only two possible outputs. The even-set LUT is calculated by multiplying all possible inputs by the even coefficients. For example, if all the inputs are one, then the output sum is

$$0.0866 - 0.1856 + 0.6274 + 0.6274 - 0.1856 + 0.0866 = 1.0568.$$

The block diagram of the LUT based pulse shaping filter is shown in Fig 2.5. Data Flip-Flops (D-FF) are used as delay units, and only six delay units are required since the odd-set LUT has only two possible outcomes.

Figure 2.5: Block Diagram of LUT based pulse shaping filter

## **Chapter 3 DQPSK modem and Results**

The digital DQPSK modulator and demodulator are modeled and simulated using the MATLAB SIMULINK toolbox and implemented using VHDL. The design details of each component are elaborated in this chapter. The Digital to Analog Converter (DAC) and Analog to Digital Converter (ADC) for DQPSK are not designed in this work. However, ideal DACs and ADCs with different numbers of bits are simulated to illustrate the quantization error.

### 3.1 DQPSK Modulator

The DQPSK modulator accepts two bits as input. The components designed in this work are the phase selector, the differential signal encoder, the In-phase and Quadrature-phase (I and Q) signal generator and the pulse shaping filter $g_T(t)$.

#### 3.1.1 PHASE SELECTOR AND DIFFERENTIAL ENCODED SIGNAL $\theta_K$

The function of the phase selector is to choose one of the four phases $0^{\circ}$, $90^{\circ}$ ($\pi/2$), $180^{\circ}$ ($\pi$) or $270^{\circ}$ ($3\pi/2$) from the two-bit input. The selected phase is then encoded differentially by adding the current phase to the previous phase.
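The differential encoding step can be sketched in a few lines of Python as an addition modulo 360°. The dibit labels follow Table 3.1; the function name and the choice of 0° as the initial phase are assumptions made for this illustration.

```python
# Gray-coded dibit labels used in Table 3.1.
PHASE_TO_BITS = {0: "00", 90: "01", 180: "11", 270: "10"}
BITS_TO_PHASE = {bits: phase for phase, bits in PHASE_TO_BITS.items()}


def differential_encode(dibits, initial_phase=0):
    """Accumulate the selected phases modulo 360 and return theta_K as dibits.

    initial_phase is an assumption; the thesis does not state the reset value.
    """
    theta = initial_phase
    encoded = []
    for d in dibits:
        theta = (theta + BITS_TO_PHASE[d]) % 360
        encoded.append(PHASE_TO_BITS[theta])
    return encoded


print(differential_encode(["01", "01", "11", "10"]))
# ['01', '11', '00', '10']
```

Since the sum of two of the four phases is always again one of the four phases, the whole operation fits in a 4 x 4 table indexed by the current input and the previous output.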
Both functions can be implemented simultaneously by the look-up table shown in Table 3.1.

| Previous \ Current | 0°(00) | 90°(01) | 180°(11) | 270°(10) |
|----------|----------|----------|----------|----------|
| 0°(00) | 0°(00) | 90°(01) | 180°(11) | 270°(10) |
| 90°(01) | 90°(01) | 180°(11) | 270°(10) | 0°(00) |
| 180°(11) | 180°(11) | 270°(10) | 0°(00) | 90°(01) |
| 270°(10) | 270°(10) | 0°(00) | 90°(01) | 180°(11) |

Table 3.1: LUT of phase selector and $\theta_K$

The LUT contains all possible outcomes of adding the current and previous phases. Since the DQPSK modem is designed digitally, the two-bit input can be used to represent the phase directly. The result of adding the two phases is always one of the four points of the constellation diagram; hence the number of possible outputs is limited. The block diagram of the phase selector and $\theta_K$ is shown in Fig 3.1, where the $\theta_{K-1}$ signal is obtained by delaying $\theta_K$ by one unit.

Figure 3.1: Block diagram of phase selector and $\theta_K$

#### 3.1.2 IN-PHASE AND QUADRATURE-PHASE (I AND Q) SIGNALS

The In-phase and Quadrature-phase (I and Q) signals are formed by taking the cosine and the negative sine of $\theta_K$. Table 3.2 shows the I and Q outputs corresponding to each input. The three possible values of the I and Q signals are 0, 1 and -1; hence two bits are sufficient to represent each output.

| $\theta_K$ | $\cos(\theta_K)$ | $-\sin(\theta_K)$ |
|----------|-----------------|-------------------|
| 0°(00) | 1(01) | 0(00) |
| 90°(01) | 0(00) | -1(11) |
| 180°(11) | -1(11) | 0(00) |
| 270°(10) | 0(00) | 1(10) |

Table 3.2: In-phase and Quadrature-phase (I and Q) signals

#### 3.1.3 PULSE SHAPING FILTER $g_T(t)$

A Root Raised Cosine (RRC) matched filter is chosen as the pulse shaping filter to achieve the raised cosine response. The LUT design technique is applied to the RRC filter at the modulator to simplify the filter design.
According to [10], removing the zero coefficients can optimize the LUT, but the RRC filter in this work has only two zero coefficients, so the complete LUT calculation is required. MATLAB code is written to calculate the RRC coefficients based on a roll-off factor of 0.25 and the same number of taps as in [10]. The 13 tap coefficients of the RRC filter are

$$\text{coefficients}: \begin{bmatrix} 0.003, -0.015, 0.0424, -0.075, -0.1061, 0.5786, 1.1366, \\ 0.5786, -0.1061, -0.075, 0.0424, -0.015, 0.003 \end{bmatrix}$$

Before computing the LUT from the 13 tap coefficients, the number of bits needed to represent the coefficients is determined by comparing the square error for each word length. The square error of the original RRC coefficients is computed first and used as the reference against which the RRC coefficients at each word length are compared. Starting from 10 bits and removing one bit at a time, the square error increases markedly at 5 bits compared with the longer word lengths; hence 6 bits are chosen to represent the RRC coefficients. The square error for each word length is shown in Table 3.3, and the eye diagrams from simulation in Fig 3.2 illustrate the filter performance for different numbers of bits. The SNR with 10-bit down to 6-bit coefficients is the same as with the original coefficients, whereas the 5-bit RRC coefficients show a 0.23 dB SNR loss on the upper eye and a 0.28 dB SNR loss on the lower eye compared with the original coefficients.
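As a rough illustration of the coefficient quantization, the Python sketch below rounds the 13 RRC taps to the nearest multiple of $2^{-6}$. The rounding rule and the step size are assumptions chosen for this sketch; they happen to reproduce the quantized coefficient set listed in this section, but the thesis does not state its exact procedure, and the error measure here is a plain sum of squared differences that may differ from the metric behind Table 3.3.

```python
# Unquantized RRC taps (roll-off 0.25, 13 taps) as listed in the text.
RRC = [0.003, -0.015, 0.0424, -0.075, -0.1061, 0.5786, 1.1366,
       0.5786, -0.1061, -0.075, 0.0424, -0.015, 0.003]


def quantize(coeffs, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits (assumed rule)."""
    step = 2.0 ** -frac_bits
    return [round(c / step) * step for c in coeffs]


def squared_error(a, b):
    """Sum of squared differences between two coefficient sets."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


quantized = quantize(RRC, 6)
print(quantized)
for bits in (10, 8, 6, 5):
    print(bits, squared_error(RRC, quantize(RRC, bits)))
```

Sweeping the number of fractional bits in this way gives a quick feel for where the coefficient error starts to grow, before confirming the choice with an eye diagram as done in the thesis.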
Figure 3.2: Eye diagram of (a) original RRC coefficient (b) 7 bits RRC coefficient (c) 6 bits RRC coefficient (d) 5 bits RRC coefficient

| Number of bits to represent coefficient | Square error of each set of bits |
|----------------------|----------------------------------|
| Original | 1.013481142971824e-04 |
| 10 bits | 9.964663316498880e-05 |
| 9 bits | 9.466376236810374e-05 |
| 8 bits | 9.466376236810374e-05 |
| 7 bits | 4.742609529534075e-04 |
| 6 bits | 4.449447975087000e-04 |
| 5 bits | 2.843462134797815e-03 |

Table 3.3: Square error of each set of bits

The new RRC coefficients are

$$\text{coefficients}: \begin{bmatrix} 0, -0.015625, 0.046875, -0.078125, -0.109375, 0.578125, 1.140625, \\ 0.578125, -0.109375, -0.078125, 0.046875, -0.015625, 0 \end{bmatrix}$$

The LUT is calculated by multiplying the tap coefficients by all possible inputs, and it is divided into an EVEN set and an ODD set to meet the Nyquist rate, as explained in [10]. From Table 3.2, the possible inputs to the filter are 0, 1 and -1; hence the EVEN set of the LUT has 729 ($3^6$) possible outputs and the ODD set has 2187 ($3^7$). After all possible outputs have been calculated, the scaling and biasing from [10] are applied to every output, and the results are converted to binary. The outputs after scaling and biasing are positive integers, and 6 bits are sufficient to represent all of them. Another advantage of scaling and biasing the output is that it reduces the design complexity of the Digital-to-Analog Converter (DAC) that follows the pulse shaping filter.

Figure 3.3: SIMULINK model for LUT base RRC filter

Fig 3.3 shows the SIMULINK model of the LUT based RRC filter. Each input from the delay units is multiplied by the corresponding gain, and the sum of all these numbers is fed to the LUT. Since the SIMULINK look-up table function only accepts positive integers as input, these two extra steps are required to convert the binary bits to a positive integer.
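The LUT construction described in this section can be sketched in Python by enumerating every combination of the three input levels for each sub-filter. Assigning the 7 taps at positions 0, 2, ..., 12 to the ODD set and the 6 taps at positions 1, 3, ..., 11 to the EVEN set is an assumption made here because it matches the stated table sizes; the scaling and biasing step from [10] is not reproduced, so the stored values are the raw filter sums.

```python
from itertools import product

# Quantized RRC taps listed in this section.
COEFFS = [0, -0.015625, 0.046875, -0.078125, -0.109375, 0.578125, 1.140625,
          0.578125, -0.109375, -0.078125, 0.046875, -0.015625, 0]


def build_lut(taps, levels=(-1, 0, 1)):
    """Map every tuple of input levels to the weighted sum the filter would output."""
    return {
        inputs: sum(t * x for t, x in zip(taps, inputs))
        for inputs in product(levels, repeat=len(taps))
    }


odd_lut = build_lut(COEFFS[0::2])   # 7 taps (assumed ODD set)
even_lut = build_lut(COEFFS[1::2])  # 6 taps (assumed EVEN set)

print(len(even_lut), len(odd_lut))  # 729 2187
print(even_lut[(1,) * 6])           # 0.96875, all six inputs equal to 1
```

In hardware the tuple key becomes the concatenated bit pattern of the delay line, so a filter output is obtained with one table read instead of a chain of multiplications and additions.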
The VHDL implementation of the pulse shaping filter $g_T(t)$ requires two delay lines because the input is two bits wide. To perform the delay at both clock edges, a dual edge triggered flip flop is used. One simple and synthesizable design of a dual edge triggered FF uses a positive-edge triggered FF and a negative-edge triggered FF together with a multiplexer.

Figure 3.4: Dual edge triggered flip flop

Fig 3.4 shows the block diagram of the dual edge triggered flip flop. Unlike the SIMULINK look-up table, which accepts a positive integer as input, the input of the LUT implemented in VHDL is a string of binary bits whose format is defined by the designer. The input format in this work is the two bits from the first delay unit concatenated with the two bits from each subsequent delay unit. For example, if every value in the delay units is 1, then the input to the EVEN set of the table is [010101010101].

### 3.2 DQPSK Demodulator

The DQPSK demodulator has two main components, the pulse shaping filter and the phase comparator. The data received from the modulator is biased and scaled; hence the biasing and scaling must be removed to obtain the correct data for the demodulator to process.

#### 3.2.1 PULSE SHAPING FILTER $g_R(t)$

The look-up table technique used for the Root Raised Cosine (RRC) matched filter at the modulator is not applicable to the RRC filter at the demodulator because of the large number of possible inputs, so the conventional FIR filter design is applied at the demodulator. To reduce the circuit complexity, the folded design technique is chosen. For a linear phase FIR filter of even order M that satisfies the symmetry condition, the number of multiplications can be reduced from M + 1 to M/2 + 1 [9]. The SIMULINK model of the folded RRC filter is shown in Fig 3.5. Each symmetric pair of values from the delay units is first added and then multiplied by the corresponding tap coefficient. All the products are then added, and the sum is divided by two since the amplitude is doubled after convolution.
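The folding idea can be sketched in Python as follows: because the taps are symmetric, the two delay-line samples that share a coefficient are added first and multiplied once. The sketch checks that this gives the same value as the direct form for one snapshot of a 13-sample delay line. The function names and the sample values are illustrative assumptions, and the final divide-by-two mentioned above is a gain adjustment that is left out here.

```python
# Quantized, symmetric RRC taps from Section 3.1.3.
COEFFS = [0, -0.015625, 0.046875, -0.078125, -0.109375, 0.578125, 1.140625,
          0.578125, -0.109375, -0.078125, 0.046875, -0.015625, 0]


def fir_direct(window, taps):
    """One output sample of a direct-form FIR: one multiplication per tap."""
    return sum(t * x for t, x in zip(taps, window))


def fir_folded(window, taps):
    """Same output for symmetric taps, adding mirrored samples before multiplying."""
    n = len(taps)
    half = n // 2
    acc = 0.0
    for i in range(half):
        acc += taps[i] * (window[i] + window[n - 1 - i])
    if n % 2:  # odd tap count: the centre tap has no partner
        acc += taps[half] * window[half]
    return acc


window = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, -9]  # hypothetical delay-line contents
print(fir_direct(window, COEFFS), fir_folded(window, COEFFS))
```

For 13 taps the folded form needs 7 multiplications instead of 13, at the cost of 6 extra pre-additions, which is usually a good trade since multipliers are the more expensive resource.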
Figure 3.5: SIMULINK RRC filter at demodulator

The IEEE arithmetic library has a built-in adder and multiplier, and both functions support signed number operations; hence they are used when implementing the filter in VHDL. Since the filter operates at both clock edges, the adder and multiplier also need to operate at both clock edges. The final adder, which sums all the products of the tap coefficients, is split into two stages because adding six numbers cannot be done in a single clock cycle. All the numbers obtained after removing the biasing and scaling are fractional numbers, and they need to be in 2's complement format for the IEEE arithmetic functions to operate on them. The 9-bit format used to represent a fractional number in this work is

| Sign bit | $2^7$ | $2^6$ | $2^5$ | $2^4$ | $2^3$ | $2^2$ | $2^1$ | $2^0$ | $\times\, 2^{-7}$ |
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------------------|

where the first bit is the sign bit and the remaining bits hold the number in 2's complement. The multiplication by the power-of-two exponent is not part of the stored format; it is shown only to illustrate the actual fractional value. For example, the tap coefficient 0.578125 is 001001010, which equals

$$(2^6 + 2^3 + 2^1) \times 2^{-7} = 74 \times 0.0078125 = 0.578125$$

The exponent multiplication can be ignored since it does not affect the arithmetic result, and the exponent is finally cancelled when the numbers are compared in the phase comparator. The input number after removing the biasing and scaling is 8 bits, the tap coefficient is 7 bits and the result after multiplication is 17 bits.
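The number format above can be checked with a short Python sketch that converts a fractional value to a 9-bit 2's complement word with an implied scale of $2^{-7}$ and back. The function names, the rounding of inputs that are not exact multiples of $2^{-7}$, and the overflow check are assumptions of this illustration rather than details taken from the VHDL design.

```python
def to_fixed(value, frac_bits=7, width=9):
    """Return the 2's complement bit string of value scaled by 2**frac_bits."""
    scaled = round(value * (1 << frac_bits))
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= scaled <= hi:
        raise OverflowError(f"{value} does not fit in {width} bits")
    # Masking maps negative integers onto their 2's complement bit pattern.
    return format(scaled & ((1 << width) - 1), f"0{width}b")


def from_fixed(bits, frac_bits=7):
    """Inverse of to_fixed: interpret a bit string as a signed fractional number."""
    raw = int(bits, 2)
    if bits[0] == "1":  # sign bit set: subtract 2**width
        raw -= 1 << len(bits)
    return raw / (1 << frac_bits)


print(to_fixed(0.578125))       # 001001010
print(to_fixed(-0.109375))      # 111110010
print(from_fixed("111110010"))  # -0.109375
```

Keeping the scale factor implicit means the hardware only ever handles integers; the common factor of $2^{-7}$ drops out whenever two such numbers are compared.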
#### 3.2.2 PHASE COMPARATOR

Figure 3.6: Phase comparator block diagram

As Fig 3.6 shows, the phase comparator needs the current and previous I and Q signals. It computes the inverse tangent to obtain the differentially encoded signal with noise $(\theta_K + \phi)$, and then the original selected phase with noise, $\widehat{\Psi}_K$, is computed by subtracting the previous $(\theta_K + \phi)$ signal from the current one. Finally, the detector recovers the original input data depending on the location of $\widehat{\Psi}_K$. For example, if $\widehat{\Psi}_K$ is in the R01 region (45° to 135°), then the output is 90°(01).

Implementing the inverse tangent function in a digital system requires a hardware-efficient algorithm to achieve a higher operating speed and to reduce hardware resources. One method is the Coordinate Rotation Digital Computer (CORDIC), an iterative algorithm based on the rotation of vectors in a plane that avoids the use of multipliers and dividers [12]. Another method is to store a finite set of inverse tangent outputs in a look-up table (LUT) and search through the LUT to find the approximate output [13]. In this work, a simple and efficient approach is applied that combines the inverse tangent function and the detector function to recover the differentially encoded signal $\theta_K$. Since the I and Q signals are formed from the cosine and the negative sine at the modulator, and the trigonometric functions can be constructed geometrically on the unit circle as Fig 3.7 shows, the location of the tangent can be found directly on the constellation diagram from the I and Q signals, and the value of $\theta_K$ can be determined at the same time.

Figure 3.7: Location of tangent from sine and cosine

The location of the tangent is determined in two steps: the first decides the quadrant from the signs of the cosine and sine, and the second compares the absolute values of the two numbers to determine the value of $\theta_K$.
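The two-step decision described above can be sketched in Python without any inverse tangent: the larger of the two magnitudes selects the axis, and its sign selects the direction. The function names are illustrative, the inputs are taken to be the cosine and sine values (that is, after undoing the sign of the negative-sine branch), and sending ties on the 45° boundaries to the cosine axis is an assumption, since the thesis leaves that choice to the designer. The second function performs the phase subtraction modulo 360°.

```python
def detect_phase(cos_val, sin_val):
    """Nearest of 0/90/180/270 degrees using only sign tests and one comparison."""
    if abs(cos_val) >= abs(sin_val):  # closer to the horizontal axis (ties go here)
        return 0 if cos_val >= 0 else 180
    return 90 if sin_val > 0 else 270  # closer to the vertical axis


def differential_decode(current, previous):
    """Recover the selected phase as the difference of two detected phases."""
    return (current - previous) % 360


print(detect_phase(0.0161, 0.9688))  # 90
print(detect_phase(-0.97, 0.05))     # 180
print(differential_decode(0, 90))    # 270
```

A comparator and two sign bits are enough here because the four constellation points sit on the axes, so each decision region is bounded by the 45° diagonals where the two magnitudes are equal.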
For example, if the cosine is 0.0161 and the sine is 0.9688, then the tangent is in the first quadrant and the sine is greater than the cosine; hence the output is 90°(01). If the tangent is located on a decision boundary (45°, 135°, 225°, 315°), then the designer has to specify the output region. This method only works when the four points of the constellation diagram are equally spaced, because the comparator operation yields only two results from which to determine the output.

The differentially encoded signal $\theta_K$ is obtained by the method described above; hence the last step in recovering the original input data of the modulator is to take the difference between $\theta_K$ and $\theta_{K-1}$. The same look-up table approach used for the differentially encoded signal at the modulator can be used to find the difference between $\theta_K$ and $\theta_{K-1}$, as Table 3.4 shows, and the new phase comparator block diagram is shown in Fig 3.8.

Figure 3.8: New Phase Comparator block diagram

| Current \ Previous | 0°(00) | 90°(01) | 180°(11) | 270°(10) |
|----------|----------|----------|----------|----------|
| 0°(00) | 0°(00) | 270°(10) | 180°(11) | 90°(01) |
| 90°(01) | 90°(01) | 0°(00) | 270°(10) | 180°(11) |
| 180°(11) | 180°(11) | 90°(01) | 0°(00) | 270°(10) |
| 270°(10) | 270°(10) | 180°(11) | 90°(01) | 0°(00) |

Table 3.4: LUT of $\theta_K - \theta_{K-1}$

#### 3.2.3 DATA SAMPLER

The input data to the DQPSK modulator may not be continuous: the modulator may receive data for some period of time, receive nothing for a few clock cycles, and then receive data again. At the demodulator side, however, all the data are treated as continuous input, so demodulation would not be successful. To solve this problem, a data sampler is designed at the demodulator to find the correct sampling point and to stop demodulating when no data is being input to the modulator. The data sampler has two components, the enable hold and the data counter.
Figure 3.9: Demodulator with data sampler

Fig 3.9 shows the complete DQPSK demodulator with data sampler. The external enable signal comes from outside the demodulator and is triggered at the positive edge of the clock to indicate that the I and Q signals are received. The filter reset signal is controlled by the wireless receiver to reset the filter when no data is present at the modulator side. Since the pulse shaping filter needs a few clock cycles to process the received data, the enable signal should stay ON until the last data is output from the filter. The enable hold function holds the enable signal when the external enable is triggered and turns the enable signal OFF when the filter reset signal is ON.

The purpose of the data counter is to find the first sampling point when new data is received from the filter and to continue sampling the data at each positive clock edge while the enable signal is ON. The first sampling point is at the twelfth clock cycle, because the total processing time of the pulse shaping filter at the demodulator is six clock cycles and the center of the first pulse is at the sixth clock cycle. Finally, the enable delay function is designed for the delay unit $(\theta_{K-1})$ to synchronize the output to zero when enable is OFF.

### 3.3 DAC and ADC

The Digital to Analog Converter (DAC) and Analog to Digital Converter (ADC) are not designed in this work, but a SIMULINK model is built to determine the SNR loss of the ADC due to quantization error. An ideal unipolar DAC is used since the 6-bit data from the pulse shaping filter are all positive, and the DAC output can be described as

$$V_{out} = V_{ref}(b_1 2^{-1} + b_2 2^{-2} + b_3 2^{-3} + \dots + b_N 2^{-N}) = V_{ref}B_{in}$$

where $B_{in}$ is the input digital signal and $V_{ref}$ is the analog reference voltage signal [14]. The input to the DAC is 6 bits, hence the voltage change for each least significant bit (LSB) is

$$V_{LSB} = \frac{V_{ref}}{2^N} = \frac{3.3V}{2^6} = 0.0515625V$$

The ideal ADC accepts an analog input and a reference signal and outputs digital bits.
The equation of the ADC is

$$(b_1 2^{-1} + b_2 2^{-2} + b_3 2^{-3} + \dots + b_N 2^{-N}) = \frac{V_{in} + V_x}{V_{ref}}$$

where

$$-\frac{1}{2}V_{LSB} \le V_x < \frac{1}{2}V_{LSB}$$

Figure 3.10: 2-bit ADC input-output transfer curve

The 2-bit ADC input-output transfer curve is shown in Fig 3.10. From the transfer curve, we can see that the same output corresponds to a range of inputs. The signal ambiguity due to the same output is called quantization error. The number of ADC bits affects the SNR; the following equation gives the best possible SNR for an N-bit ADC [14]:

$$SNR = 6.02N + 1.76\ \text{dB}$$

The SIMULINK model of the DAC and ADC is shown in Fig 3.11. The ideal DAC model accepts the integer input from the pulse shaping filter and converts it to binary digits. The analog voltage is calculated by multiplying each bit by the corresponding power of two and by the reference voltage. The output of the DAC is connected directly to the ADC model, which first adds $V_x$, generated by a random number block. Finally, the signal is divided by $V_{LSB}$ and rounded to a positive integer.

Figure 3.11: DAC and ADC SIMULINK model

The number of bits for the DAC is determined by the length of the input word to the DAC, which is 6 bits in this work. Reducing the number of bits of the ADC to increase the operating speed of the circuit is possible if the resulting SNR loss is small. ADCs with three different numbers of bits (6, 5, and 4) are simulated and their performance is evaluated through the eye diagram. The SNR loss of the ADC for each number of bits is shown in Table 3.5; the SNR obtained from the pulse shaping filter without the ADC is used as the reference to calculate the SNR loss. From the SNR loss of the ADC and the eye diagrams shown in Fig 3.12, the 4-bit ADC has a larger SNR loss than the 6-bit and 5-bit ADCs. Since the SNR losses for 6 bits and 5 bits are almost the same, the 5-bit ADC is chosen for the future ADC design specification.
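As a rough illustration of the ideal DAC and ADC equations in this section, the following Python sketch mirrors the description of the model: each bit is weighted by a power of two and by the reference voltage, and the ADC adds $V_x$, divides by $V_{LSB}$, and rounds. It is not the SIMULINK model itself. The function names are hypothetical, $V_x$ is passed as an argument instead of being drawn from a random number block, and the round-half-up rule and the clamping of the output code to the valid range are assumptions made for the example.

```python
import math

V_REF = 3.3  # reference voltage in volts, as used in this section


def v_lsb(n_bits, v_ref=V_REF):
    """Voltage step of one LSB: V_ref / 2^N."""
    return v_ref / (2 ** n_bits)


def ideal_dac(code, n_bits, v_ref=V_REF):
    """Ideal unipolar DAC: V_out = V_ref * (b1*2^-1 + ... + bN*2^-N)."""
    # b1 is the most significant bit of the integer input code.
    bits = [(code >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]
    return v_ref * sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))


def ideal_adc(v_in, n_bits, v_x=0.0, v_ref=V_REF):
    """Ideal ADC: add V_x, divide by V_LSB, round, then clamp to the code range."""
    code = math.floor((v_in + v_x) / v_lsb(n_bits, v_ref) + 0.5)
    return max(0, min(2 ** n_bits - 1, code))


def best_snr_db(n_bits):
    """Best possible SNR of an N-bit ADC: 6.02*N + 1.76 dB."""
    return 6.02 * n_bits + 1.76


if __name__ == "__main__":
    v = ideal_dac(32, 6)          # mid-scale 6-bit code, about 1.65 V
    print(ideal_adc(v, 6))        # 32
    print(ideal_adc(v, 5))        # 16
```

For N = 6, 5, and 4 the SNR formula gives 37.88 dB, 31.86 dB, and 25.84 dB respectively. These are theoretical upper bounds for an ideal converter and are not the same quantity as the SNR loss measured from the eye diagram.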
|           | 6 bits     | 5 bits    | 4 bits    |
|-----------|------------|-----------|-----------|
| Upper eye | 1.08 dB    | 1.059 dB  | 4.1497 dB |
| Lower eye | 0.71187 dB | 1.1125 dB | 3.3805 dB |

Table 3.5: SNR loss of ADC for different numbers of bits

Figure 3.12: (a) 5-bit ADC (b) 4-bit ADC

## 3.4 Simulation and Synthesized result

The digital DQPSK modem design is implemented using VHDL, and the Synopsys simulation tool is used to simulate the VHDL design. The simulation result from the modulator is verified by comparing the output of the pulse shaping filter to the output of the SIMULINK pulse shaping filter model. Since errors due to wireless transmission are not considered in this work, the demodulator output data should be the same as the input data to the modulator. A non-continuous data pattern is used as the test input to verify the demodulation. Fig 3.13 and Fig 3.14 show the simulation waveforms for the DQPSK modulator and demodulator; the waveforms are divided into multiple figures to give a clear view of each signal. From the demodulator waveform, the outputs are the same as the input data to the modulator, hence we confirm that the demodulation for a non-continuous input pattern works satisfactorily.

Figure 3.13: Simulation result for DQPSK modulator

Figure 3.14: Simulation result for DQPSK demodulator

The synthesis tool from Synopsys is used to generate the netlist of the design and to report the timing and total power consumption. The 65nm CMOS technology libraries are used to synthesize the design, and the low power low VT (LPLVT) CMOS transistor model is chosen to reduce the power consumption. According to the timing report generated by the synthesis tool, the DQPSK modulator meets the expected design timing (zero slack) with a clock period of 0.5 ns. However, the DQPSK demodulator requires a clock period of 1 ns; hence the data throughput in this work is limited to 2 Gbps. The total power consumption is 2.446 mW for the modulator and 23.008 mW for the demodulator.
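As a quick check of the figures in this section, the throughput and the combined power can be worked out as follows. The calculation assumes that the demodulator processes one DQPSK symbol (2 bits) per 1 ns clock cycle, which is an interpretation of the timing result rather than something stated explicitly; the symbols $R_b$, $T_{clk}$, and $P_{total}$ are introduced only for this illustration.

$$R_b = \frac{2\ \text{bits/symbol}}{T_{clk}} = \frac{2\ \text{bits}}{1\ \text{ns}} = 2\ \text{Gbps}$$

$$P_{total} = 2.446\ \text{mW} + 23.008\ \text{mW} = 25.454\ \text{mW}$$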
## **Chapter 4 Conclusions and Future Works**

#### 4.1 Conclusions

In this work, an all-digital implementation of a DQPSK modulator and demodulator for the millimeter wave Wireless Network-on-Chip (mWNoC) was designed to achieve higher data throughput and lower power consumption. Several design techniques are applied to the modulator and demodulator to reduce the circuit complexity. The look-up table (LUT) design approach is applied to the pulse shaping filter at the modulator to eliminate the adders and multipliers by storing all possible filter outcomes in the LUT. Since a large number of possible data values are input to the demodulator pulse shaping filter, the LUT technique cannot be applied to that filter. Instead, the folded filter structure is used to remove half of the multipliers. The method of finding the tangent directly on the constellation diagram is used in the phase comparator to avoid the calculation of the inverse tangent. The scenario of non-continuous input at the modulator is considered, and the data sampler is designed to sample the data at the proper time at the demodulator. The Digital to Analog Converter (DAC) and Analog to Digital Converter (ADC) are not designed in this work, but a SIMULINK model is built to determine the SNR loss of the ADC due to quantization error, and a 5-bit ADC is chosen for the future ADC design specification. The data throughput of the DQPSK modem in this work is 2 Gbps and the total power consumption is 25.454 mW.

#### 4.2 Future work

The pulse shaping filter at the demodulator is the bottleneck in this work. Both the adders and the multipliers in the filter need a longer execution time, and the overall operating speed of the circuit is reduced to half of that of the modulator. Since the filter accepts data on both clock edges to meet the Nyquist rate, the total number of adders and multipliers is doubled and the power consumption increases as well.
To alleviate the problems of the pulse shaping filter, a custom analog pulse shaping filter and more efficient adders and multipliers should be designed in future work. The design of a 6-bit DAC and a 5-bit ADC that meet the same data throughput as the DQPSK modem is also considered as future work. Also, in future higher-bandwidth versions, M-ary FSK will be employed to save transmission power, as it allows a tradeoff of increased bandwidth for decreased SNR per bit.

#### REFERENCE

- [1] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance evaluation and design trade-offs for network-on-chip interconnect architectures," IEEE Transactions on Computers, vol. 54, pp. 1025-1040, Aug 2005.
- [2] V. F. Pavlidis and E. G. Friedman, "3-D Topologies for Networks-on-Chip", IEEE Transactions on Very Large Scale Integration (VLSI), Vol. 15, Issue 10, October 2007, pp. 1081-1090.
- [3] A. Shacham et al., "Photonic Network-on-Chip for Future Generations of Chip Multi-Processors", IEEE Transactions on Computers, Vol. 57, no. 9, 2008, pp. 1246-1260.
- [4] M. F. Chang et al., "CMP Network-on-Chip Overlaid With Multi-Band RF-Interconnect", Proceedings of IEEE International Symposium on High-Performance Computer Architecture (HPCA), 16-20 February, 2008, pp. 191-202.
- [5] P. P. Pande, A. Ganguly, K. Chang, and C. Teuscher, "Hybrid Wireless Network on Chip: A New Paradigm in Multi-Core Design", NoCArc 2009, pp. 71-76.
- [6] Sujay Deb et al., "Enhancing Performance of Network-on-Chip Architectures with Millimeter-Wave Wireless Interconnects", Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference, pp. 73-80.
- [7] J. Lee et al., "A low-power fully integrated 60GHz transceiver system with OOK modulation and on-board Antenna assembly," Proceedings of IEEE Solid-State Circuits Conference, ISSCC 2009, pp. 316-317, 317a.
- [8] John G. Proakis, "Digital Communications", Fourth Edition, McGraw-Hill, 2000.
- [9] Robert J. Schilling, Sandra L. Harris, "Fundamentals of Digital Signal Processing Using Matlab", first edition, Thomson, 2005.
- [10] Arun Rachamadugu, "Digital implementation of high speed pulse shaping filters and address based serial peripheral interface design", Master's thesis, Georgia Institute of Technology, 2008.
- [11] Simon Haykin, "Communication Systems", 4th edition, Wiley, 2001.
- [12] Volder, Jack E, "The CORDIC Trigonometric Computing Technique", Electronic Computers, IRE Transactions on, Sept. 1959, pp. 330.
- [13] William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery, "Numerical Recipes in C: The Art of Scientific Computing", 2nd edition, Cambridge University Press, October 30, 1992.
- [14] David A. Johns, Ken Martin, "Analog Integrated Circuit Design", 1st edition, Wiley, 1996.
- [15] H. Furukawa, K. Matsuyama et al., "A π/4-SHIFTED DQPSK DEMODULATOR FOR A PERSONAL MOBILE COMMUNICATIONS SYSTEM", Personal, Indoor and Mobile Radio Communications, 1992. Proceedings, PIMRC '92., Third IEEE International Symposium on, 1992.
- [16] M. Saber, Yutaka Jitsumatsu and T. Kohda, "A Low-Power Implementation of arctangent function for", Signal Design and its Applications in Communications. IWSDA '09. Fourth, pp. 60-63, 2009.
- [17] Wenmiao Song, Qiongqiong Yao, "Design and implement of QPSK modem based on FPGA", Computer Science and Information Technology (ICCSIT), pp. 599-601, 2010.
- [18] Rajan, S., Sichun Wang, Inkol, R., Joyal, A., "Efficient approximations for the arctangent function", Signal Processing Magazine, IEEE, pp. 108-111, 2006.
- [19] Munoz, D.M., Sanchez, D.F., Llanos, C.H., Ayala-Rincon, M., "FPGA based floating-point library for CORDIC algorithms", Programmable Logic Conference (SPL), pp. 55-60, 2010.