# A 17 Gbps 156 fJ/bit Two-Channel Optical Receiver with Optical-Input Split and Delay in 65 nm CMOS

Mohammad Taherzadeh-Sani, Bahaa Radi, Mohammadreza Sanadgol Nezami, Michaël Ménard, Member, IEEE, Odile Liboiron-Ladouceur, Senior Member, IEEE, and Frederic Nabki, Member, IEEE

Abstract- Multi-channel optical receivers, in which each channel is clocked at a fraction of the data rate using a different clock phase, are widely used in high data-rate applications. Here, we demonstrate a two-channel optical receiver where both channels share the same clock operating at half of the data-rate frequency. However, instead of using different phases for each channel, the optical input is split and the input to the 2<sup>nd</sup> channel is optically delayed with respect to the 1<sup>st</sup> channel. Moreover, the duty cycle of the clock becomes a design parameter that can be used to improve performance. Each channel consists of only one high-bandwidth gain-improved transimpedance amplifier with a pseudo-differential output and a comparator with offset nulling. The receiver is fabricated in 65 nm CMOS. To demonstrate the concept, the fabricated die and two photodetectors are bonded inside a QFN80 package. Measurement results of the receiver show that each channel can sample the input at 8.5 Gbps, resulting in a 17 Gbps total data rate, with a total energy efficiency of 156 fJ/bit, and an input optical modulation amplitude (OMA) sensitivity of -7 dBm without any equalization.

*Index Terms*— Multi-channel optical receivers, transimpedance amplifier, photo-detector, optical delay, high data-rate.

## I. INTRODUCTION

Electrical interconnects for chip-to-chip or board-to-board communications suffer from well-known issues such as losses at high frequencies, wave reflections, and crosstalk. Even for on-chip applications with shorter and more abundant interconnects, the design of high-speed global interconnects is becoming challenging [1]. Optical interconnects offer, arguably, a solution to the problems encountered by electrical interconnects at the chip-to-chip and board-to-board communication levels. Consequently, developing energy-efficient systems that use sub-rate sampling techniques to process high-speed data is necessary to take full advantage of optical interconnects. In the electronic domain, this sub-rate sampling approach reduces signal losses, crosstalk, and wave reflection, while exhibiting relatively lower power consumption, which is required in highly parallelized short-reach point-to-point optical interconnects.

B. Radi, M. S. Nezami, and O. Liboiron-Ladouceur are with the Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada (e-mail: bahaa.radi@mail.mcgill.ca).

In sub-rate sampling receivers, a full-rate input is sampled with a sub-rate clock, conventionally at half or a quarter of the data rate. There are several ways to generate the clock phases required for the operation of such receivers. In [2], two complementary clock phases are supplied externally to the receiver and an on-chip CML-to-CMOS converter is used to generate the four full-swing clock phases. In [3], an external hybrid coupler is used to generate the two clock phases needed by that receiver. In [4], a multi-phase phase-locked loop (PLL) is used to generate five clock phases. Finally, the work presented in [5] generates two clock phases fully on-chip. These receivers rely either on accurate off-chip generation blocks [2, 3] or on power-hungry on-chip blocks [4, 5].

In this work, a novel energy-efficient 17 Gbps two-channel optical receiver architecture is demonstrated. In contrast with conventional multi-channel receivers using multi-phase clocking with the sampling scheme shown in Fig. 1(a), both channels are clocked with the same signal and phase. However, here, the optical input is split and the input to the 2<sup>nd</sup> channel is delayed optically with respect to the 1<sup>st</sup> channel. This sampling scheme is illustrated in Fig. 1(b). Thus, the system is simplified, resulting in an improvement in energy efficiency as compared to the state-of-the-art [2-7]. To achieve this improvement, the clock-related functions are moved to the optical domain. To further save power, only one transimpedance amplifier (TIA) with gain enhancement as well as a dynamic comparator with offset nulling is utilized for each channel.

Manuscript received Month XX, 2019; revised Month XX; accepted Month XX, 2019. Date of publication Month XX, 2020; date of current version Month XX, 2019. This work was supported by the Natural Sciences and Engineering Research Council of Canada through the Idea to Innovation (I2I) Grants Program. This paper was approved by Associate Editor NAME. (Corresponding author: Mohammad Taherzadeh-Sani).

M. Taherzadeh-Sani is with the Department of Electrical Engineering, Ferdowsi University of Mashhad, Mashhad, Iran (e-mail: taherzadeh@um.ac.ir).

M. Ménard is with the Department of Computer Science, Université du Québec à Montréal, H3C 3P8, Canada.

F. Nabki is with the Department of Electrical and Computer Engineering, École de technologie supérieure, QC H3C 1K3, Canada.



Fig. 1. (a) A conventional two-channel receiver architecture that uses two clock phases to sample data; (b) proposed two-channel receiver designed to use one clock phase to sample two versions of the data delayed by one bit.

The article is organized as follows. Section II reviews the conventional and recently reported multi-channel optical receiver architectures. Section III presents the details of the novel two-channel optical receiver, and Section IV shows the experimental results. Lastly, the conclusions are summarized in Section V.

## II. OVERVIEW OF MULTI-CHANNEL OPTICAL RECEIVERS FOR HIGH-DATA-RATE APPLICATIONS

### A. Multi-Phase Clock Receivers

Multi-channel optical receivers are widely used to achieve high data rates [2-5]. In such receivers, each channel runs at a clock frequency of  $f_S/N$ , where  $f_S$  is the data rate of the full receiver and N is the number of channels. This relaxes the design constraints of each electrical channel by allowing the comparators and the following digital circuitry to operate at a lower speed. However, such a receiver requires N clock phases for its N channels, with  $2\pi/N$  phase delay between each two consecutive channels. Each clock phase samples the data in one of the channels and the received signal is then regenerated by serializing the output of all channels, if needed.
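The relation between the data rate, the number of channels, and the required clock phases can be sketched in a few lines of Python. This is only an illustration of the $f_S/N$ and $2\pi/N$ relations stated above; the function name `channel_clocks` is hypothetical and not part of the reported design.

```python
import math

def channel_clocks(data_rate_bps: float, n_channels: int):
    """Return the per-channel clock frequency (Hz) and the list of
    clock phase offsets (radians) for an N-channel sub-rate receiver."""
    f_clk = data_rate_bps / n_channels                      # f_S / N
    phases = [2 * math.pi * k / n_channels for k in range(n_channels)]
    return f_clk, phases

# Conventional two-channel receiver at the 17 Gbps rate used in this work.
f_clk, phases = channel_clocks(17e9, 2)
print(f"clock = {f_clk / 1e9:.1f} GHz")
print("phases (deg) =", [round(math.degrees(p)) for p in phases])
```

For two channels this gives an 8.5 GHz clock with phases of 0° and 180°, which is the Clock and Clock 180° pair discussed next.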

Figure 2(a) illustrates a conventional two-channel receiver architecture [3]. Here the optical input is applied to a photodetector (PD) and the resulting photocurrent is then passed to a TIA. The output of the TIA is then split into channel 1 and channel 2. To digitize the TIA output signal, this signal is compared with a reference signal at  $f_S/2$  clock speed in each channel. The final output results from serializing the outputs of the two channels. As shown, this architecture requires two clock phases at  $f_S/2$ : the clock and the clock signal shifted in phase by 180° (labelled Clock 180° in Fig. 2(a)). This architecture exhibits crosstalk between the paths resulting in inter-symbol interference, as well as the clock feedthrough from one channel clock on the other channel signal at the output of the TIA. Thus, before splitting the signal into two paths, the TIA is usually followed by gain stages to improve the signal-to-noise ratio of the receiver.

Fig. 2. (a) A conventional two-channel receiver architecture that splits the paths after the TIA, and requires the Clock and Clock\_180° (i.e., the clock signal that is shifted by 180°) phases; (b) A two-channel receiver architecture that splits the paths before the PD, and requires the Clock and Clock\_180° phases; (c) The proposed two-channel receiver architecture that splits the paths before the PD and only requires one clock phase for its comparators.

To mitigate the crosstalk and clock-feedthrough noise, Fig. 2(b) shows an architecture that splits the input signal earlier in the signal path, i.e., before the PD. The TIA gain requirement can be relaxed, but two PDs and two TIAs are needed. This architecture also requires two clock phases at $f_S/2$: the Clock and Clock\_180° phases. Here, the optical input is divided into two identical optical paths and then applied to two PDs. Consequently, this architecture requires at least 3 dB more optical input power to compensate for the optical splitter. It should be noted that to increase the data rate, the number of paths can also be increased. For instance, if 4 paths are used, then 4-phase clocking must be adopted. One of the trade-offs in that case is that the optical input must be split into 4 paths, resulting in a theoretical 6 dB optical insertion loss.
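The 3 dB and 6 dB figures quoted above are the ideal power-division losses of an equal 1-to-N split, $10\log_{10}N$. A minimal sketch follows; the function names are illustrative, and the 3.3 dB value is the measured splitter insertion loss reported in Section IV.

```python
import math

def ideal_split_loss_db(n_paths: int) -> float:
    """Ideal power-division loss (dB) seen by each output of an equal 1:N splitter."""
    return 10 * math.log10(n_paths)

def excess_loss_db(measured_loss_db: float, n_paths: int) -> float:
    """Loss beyond the ideal power division for a measured splitter."""
    return measured_loss_db - ideal_split_loss_db(n_paths)

for n in (2, 4):
    print(f"1:{n} split -> {ideal_split_loss_db(n):.2f} dB per path")

# Measured 1:2 splitter in this work: 3.3 dB insertion loss.
print(f"excess loss = {excess_loss_db(3.3, 2):.2f} dB")
```

The two-way and four-way cases evaluate to roughly 3.01 dB and 6.02 dB, so the measured discrete splitter adds only about 0.3 dB beyond the unavoidable division loss.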

### B. Optical-Input Split and Delay Receiver

In this two-channel optical receiver, instead of delaying the phase of the main clock by $180^{\circ}$ and passing it to the 2<sup>nd</sup> channel, the optical input of the 2<sup>nd</sup> channel is delayed. This concept simplifies the receiver by removing the need for the Clock 180° phase. As illustrated in Fig. 2(c), the input can be passively delayed in its optical form, before converting it to an electrical signal in the PD. To implement this concept, the optical signal is split into two optical signals. One signal is passed directly to the PD of channel 1, and the 2<sup>nd</sup> signal is delayed by one bit period $(T_D)$ and passed to the PD of channel 2. The development of silicon photonics (SiP), which enables the fabrication of optical circuits with the mass-production tools developed for CMOS circuits, makes the implementation of simple processing functions in the optical domain straightforward and economically viable [8, 9]. The main drawback of this concept is that the optical power received by each PD is at least 3 dB less than the total power at the input of the receiver, which reduces the sensitivity of the optical receiver and can limit its reach.
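The sampling behaviour described above can be illustrated with an idealized bit-level model. The sketch below assumes ideal sampling at integer multiples of the bit period, a delay of exactly one bit period, and no noise; the function names are hypothetical and the code is not a model of the fabricated circuit.

```python
def sample_two_channels(bits):
    """Sample a full-rate bit sequence with a single half-rate clock phase.

    Channel 1 sees the undelayed stream; channel 2 sees the same stream
    delayed by one bit period T_D. Both channels sample at t = 0, 2T, 4T, ...
    """
    # Channel 1: at time t the undelayed stream carries bit t.
    ch1 = [bits[t] for t in range(0, len(bits), 2)]          # b0, b2, b4, ...
    # Channel 2: at time t the delayed stream carries bit t-1.
    # Nothing has arrived yet at t = 0, so sampling starts at t = 2.
    ch2 = [bits[t - 1] for t in range(2, len(bits) + 1, 2)]  # b1, b3, b5, ...
    return ch1, ch2

def serialize(ch1, ch2):
    """Interleave the two half-rate outputs back into the full-rate sequence."""
    out = []
    for k, bit in enumerate(ch1):
        out.append(bit)
        if k < len(ch2):
            out.append(ch2[k])  # taken one clock cycle later than ch1[k]
    return out

data = [1, 0, 1, 1, 0, 0, 1, 0]
ch1, ch2 = sample_two_channels(data)
print(ch1, ch2, serialize(ch1, ch2) == data)
```

In this model channel 1 recovers the even-indexed bits and channel 2 the odd-indexed bits using the same clock edges, which is the role played by the second clock phase in Fig. 2(b).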

The proposed architecture has the following advantages over the structure presented in Fig. 2(b). It only requires one clock phase to sample the signal in the comparators of both channels, such that Clock 180° is not required. It should be noted that Clock 180° is usually available in a receiver since it can be generated by inverting the main clock to produce $\overline{clock}$. However, generating a clock signal that is exactly 180° phase shifted from the clock signal requires additional circuitry. Indeed, this additional circuitry is needed to adjust the phase difference between Clock and $\overline{clock}$ to be exactly 180° and to ensure that the duty cycles of both Clock and $\overline{clock}$ are precisely 50%. The architecture proposed in Fig. 2(c) does not require such an accurate $\overline{clock}$, as will be explained in Section III. Another advantage of this structure is that the duty cycle of the clock can be tuned to improve the comparator performance, as validated by the measurement results presented in Section III.

The area and cost overhead of the additional optical elements required by this architecture is minimal, especially when implemented with silicon photonics [10]. The split-delay structure, shown in Fig. 3, can be built using a directional coupler followed by a delay line. The coupling ratio of the directional coupler can be adjusted to compensate for the optical propagation loss in the delay line such that the power at each PD is the same. The delay line loss for a silicon-on-insulator optical waveguide with a cross-section of 220 nm $\times$ 3 µm ranges between 0.1 and 0.2 dB/cm [8]. As a result, the coupling ratio needed is r = 0.49:0.51. The benefit of carefully tuning the coupling ratio is that it eliminates the need for gain-control stages in the TIAs to compensate for unequal optical power at the photodetectors. As such, the TIA stages of the two sub-receivers can be identical. Another benefit of the integration of photonic elements is their compact size and low cost, since this technology leverages the infrastructure of existing CMOS foundries. For example, a 50 ps (20 Gb/s) delay line along with the directional couplers and the photodiodes occupy only 0.43 mm<sup>2</sup> [8] on the SiP die. The cost of this process per fabrication area is below that of modern CMOS processes, since the latter require several masks with small critical dimensions. Note that for higher transmission speeds, the cost to fabricate the SiP chips is even lower since the delay lines needed are shorter. The delay line length can be finely controlled to achieve accurate delays. In [8], the delay offset between the fabricated devices and the design value is approximately 3 ps at 20 Gb/s. It is possible to use electronically tunable optical delay lines such as [11, 12] that can provide tunable delays of up to 1 ns. The temperature dependency of the delay in silicon delay lines is only 0.01 % per degree Celsius [13]. For a 17 Gbps input with a required delay of 59 ps, the change in delay due to a temperature shift of 100 °C is only 0.6 ps. The devices presented in [8] were designed for 20 Gb/s links and not for 17 Gbps. Consequently, a discrete optical splitter and a mechanically tunable optical delay line were used instead in this work.
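The delay-line numbers above can be tied together with a short back-of-the-envelope calculation. The sketch below is illustrative only: the group index of 3.7 is an assumed value that is not given in this paper, and the coupler is treated as lossless. The loss per unit length and the 0.01 %/°C drift are the values quoted above.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum

def delay_line(bit_rate_bps, group_index, loss_db_per_cm):
    """Return (delay in s, length in cm, loss in dB, coupler fraction sent to
    the delayed arm so that both photodetectors receive equal power)."""
    t_d = 1.0 / bit_rate_bps                          # one bit period
    length_cm = C_M_PER_S * t_d / group_index * 100.0
    loss_db = loss_db_per_cm * length_cm
    transmission = 10 ** (-loss_db / 10)
    # Equal power: k * transmission = 1 - k  ->  k = 1 / (1 + transmission)
    k_delayed = 1.0 / (1.0 + transmission)
    return t_d, length_cm, loss_db, k_delayed

def thermal_drift_ps(t_d_s, drift_pct_per_degc, delta_t_degc):
    """Change in delay (ps) for a given temperature shift."""
    return t_d_s * 1e12 * drift_pct_per_degc / 100.0 * delta_t_degc

N_G_ASSUMED = 3.7  # assumption: group index of the SOI waveguide (not from the paper)
t_d, length_cm, loss_db, k = delay_line(17e9, N_G_ASSUMED, 0.2)
print(f"T_D = {t_d * 1e12:.1f} ps, length = {length_cm:.2f} cm, loss = {loss_db:.3f} dB")
print(f"coupling ratio = {1 - k:.3f}:{k:.3f}")
print(f"drift over 100 degC = {thermal_drift_ps(t_d, 0.01, 100):.2f} ps")
```

Under these assumptions the delay line is a few millimetres long, its loss is on the order of 0.1 dB, and the split that equalizes the two photodetector powers stays within about one percent of 50:50, in line with the ratio quoted above. The thermal drift evaluates to roughly 0.6 ps.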

## III. DESIGN OF THE TWO-CHANNEL ELECTRONIC RECEIVER WITH OPTICAL-INPUT SPLIT AND DELAY

In this work, a two-channel optical receiver with an optical-input split and delay structure prior to photodetection, shown in Fig. 4, is implemented. This section details the electronic design of the receiver.

### A. Electronic Receiver Architecture

The receiver of each channel is connected to a PD and consists of a TIA, a comparator, and a latch, as shown in Fig. 4. The latch output is then passed to a current-mode buffer (Output driver in Fig. 4) to transmit the output bits off-chip. Since the output swing of the buffer is not large enough to drive the input of the error detector (ED) of the bit-error-rate tester (BERT), an external high-bandwidth amplifier is used between the chip output and the ED input. This amplifier does not impact the performance of the chip as it is only used to amplify the digital output of the chip.



Fig. 3. Silicon Photonics (SiP) split-delay structure schematic with envisioned integration with the electronic IC chip.



Fig. 4. The system-level details of the implemented two-channel receiver.

As mentioned in Section II.B, the proposed architecture reduces crosstalk and clock-feedthrough by splitting the signal in the optical domain and thus relaxing the SNR requirements of the receiver. Hence, in each channel, only one TIA without additional gain stages is used to amplify the signal before the comparator, resulting in a substantial reduction of the total power consumption. Gain improvement is also proposed for the TIA to partially compensate for the lack of multiple cascaded gain stages. Furthermore, a dynamic comparator and latch are used instead of a static counterpart to significantly decrease the total power consumption.

### B. Transimpedance Amplifier with Single-Ended-Input and Differential Output

Figure 5 shows the TIA and its connections to the PD and the comparator. Here, the PD is modeled as a current source with a parallel capacitance  $C_{PD}$ , representing the junction capacitance of the photodiode, and a series resistance  $R_{PD}$ . Moreover,  $L_B$ ,  $C_P$ , and  $C_L$  are the bondwire inductance, pad capacitance, and TIA load capacitance, respectively. The TIA consists of an inverter as an amplifier [14] that has a resistor in series with an inductor in its feedback. The inductor is used to improve the TIA bandwidth by introducing a zero in the TIA transfer function. Resistor  $R_{LF}$  is used to damp the high frequency peaking induced by the inductor  $L_F$ . Here, the difference between  $V_{OUT}$  and  $V_{IN}$  is passed to the comparator to make the bit decision. Although these two signals are not differential, they have different polarities and hence, here, they are named pseudo-differential signals. The resulting input-output transfer function of this TIA, when only  $V_{OUT}$  is used as the output, has a low-frequency gain of:

$$\left|\frac{V_{OUT}}{I_{IN}}\right| = \frac{R_{F} - \frac{1}{g_{m}}}{1 + \frac{1}{1 + g_{m}R_{L}}} \tag{1}$$

where

$$g_m = g_{m1} + g_{m2}, \quad R_L = r_{o1} \parallel r_{o2}. \tag{2}$$

Here,  $R_F$ ,  $g_{m1}$  and  $g_{m2}$  are the feedback resistor, the transconductance of transistor M1 and the transconductance of transistor M2, in Fig. 5, respectively.  $r_{o1}$  and  $r_{o2}$  are the output resistances of M1 and M2.

The resulting input-output transfer function of this TIA, when  $V_{OUT}$ - $V_{IN}$  is used as the output, has a low-frequency gain of  $R_F$ :

$$\left|\frac{V_{OUT} - V_{IN}}{I_{IN}}\right| = R_F \tag{3}$$

Thus, the pseudo-differential output shows a higher low-frequency gain. By writing the KCL equations for the circuit in Fig. 5, it can be shown that both transfer functions have five poles and two zeros. The two zeroes of the input-output transfer function (i.e.,  $V_{OUT}/I_{IN}$ ) are:

$$\omega_{Z1a} = \frac{R_F \parallel R_{LF}}{L_F}, \quad \omega_{Z2a} = \frac{g_m}{C_L} \tag{4}$$

The two zeroes of the input-output transfer function when  $(V_{OUT} - V_{IN})$  is used as the output instead of  $V_{OUT}$  (i.e.,  $(V_{OUT} - V_{IN})/I_{IN}$ ) are:

$$\omega_{Z1b} = \frac{R_F \parallel R_{LF}}{L_F}, \quad \omega_{Z2b} = \frac{g_m}{C_{GD}} \tag{5}$$

where  $C_{GD}$  is the sum of the gate-drain capacitance of both NMOS and PMOS transistors. The second zero of both transfer functions is at very high frequencies, and the first zero can be used to extend the bandwidth of the TIA. Assuming that the denominator of both transfer functions can be simplified as:

$$Den(s) = 1 + a \, s + b \, s^2 + c \, s^3 + d \, s^4 + e \, s^5, \tag{6}$$

and considering a dominant-pole transfer function, where the first pole can be approximated by 1/a, both transfer functions have the same first pole. Thus, the main advantage of using pseudo-differential output signaling is that it achieves higher gain without a detrimental effect on the frequency behavior of the TIA.
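To give a sense of scale for the zeros in (4), they can be evaluated with the component values of Table 1. The short sketch below does only that; $C_{GD}$ is not listed in Table 1, so $\omega_{Z2b}$ from (5) is not evaluated, and the variable names are chosen here for illustration.

```python
import math

# Component values from Table 1.
R_F = 320.0     # feedback resistor (ohm)
R_LF = 1e3      # damping resistor (ohm)
L_F = 3e-9      # feedback inductor (H)
G_M = 20e-3     # g_m1 + g_m2 (S)
C_L = 15e-15    # TIA load capacitance (F)

def parallel(a: float, b: float) -> float:
    """Parallel combination of two resistances."""
    return a * b / (a + b)

w_z1 = parallel(R_F, R_LF) / L_F   # first zero of (4) and (5), rad/s
w_z2a = G_M / C_L                  # second zero of (4), rad/s

f_z1 = w_z1 / (2 * math.pi)
f_z2a = w_z2a / (2 * math.pi)
print(f"f_Z1  = {f_z1 / 1e9:.1f} GHz")
print(f"f_Z2a = {f_z2a / 1e9:.0f} GHz")
```

With these values the first zero falls at roughly 13 GHz, inside the signal band where it can extend the bandwidth, while the second zero of (4) lies at roughly 212 GHz, far above the frequencies of interest.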

In this design, the TIA bandwidth is set to 24 GHz, which is intentionally higher than the input data rate (17 Gbps) such that the signal coming from the PD is not limited by the bandwidth of the TIA.

Figure 6(a) shows the Bode diagram of the transfer functions of (1) and (3) using the component values listed in Table 1. As shown, the TIA gain is improved by 1.9 dB with the pseudo-differential signaling whereas the TIA bandwidth is reduced by only 5%. This additional gain relaxes the need for additional signal amplification before the comparator. Figure 6(b) shows the effect of the damping resistor $R_{LF}$ on the gain Bode diagram. If no damping resistor is used, the substantial peaking of the gain at high frequencies results in a noticeable ringing of the output pulse response of the TIA. As shown in Fig. 6(b), this peaking is removed by using the resistor $R_{LF}$.
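The low-frequency gain improvement can be checked numerically from (1) and (3). The sketch below evaluates (1) as written, with $R_F$ and $g_m$ from Table 1. The inverter output resistance $R_L$ is not listed in Table 1, so the 1 kΩ value used here is purely an assumption for illustration.

```python
import math

R_F = 320.0   # ohm, Table 1
G_M = 20e-3   # S, Table 1 (g_m1 + g_m2)

def single_ended_gain(r_f: float, g_m: float, r_l: float) -> float:
    """Low-frequency |V_OUT / I_IN| from (1), in ohms."""
    return (r_f - 1.0 / g_m) / (1.0 + 1.0 / (1.0 + g_m * r_l))

def pseudo_diff_gain(r_f: float) -> float:
    """Low-frequency |(V_OUT - V_IN) / I_IN| from (3), in ohms."""
    return r_f

def improvement_db(r_f: float, g_m: float, r_l: float) -> float:
    """Gain advantage of the pseudo-differential output, in dB."""
    return 20 * math.log10(pseudo_diff_gain(r_f) / single_ended_gain(r_f, g_m, r_l))

R_L_ASSUMED = 1e3  # ohm, hypothetical output resistance (not given in the paper)
print(f"improvement, R_L -> infinity: {improvement_db(R_F, G_M, 1e12):.2f} dB")
print(f"improvement, R_L = 1 kOhm   : {improvement_db(R_F, G_M, R_L_ASSUMED):.2f} dB")
```

Even with an ideal (infinite) $R_L$ the improvement is about 1.5 dB, and a finite $R_L$ on the order of 1 kΩ brings it close to the 1.9 dB reported from Fig. 6(a).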



Fig. 5. The TIA circuit and its connections to the PD and comparator.

TABLE 1. COMPONENT VALUES USED IN FIG. 5 AND TO PLOT FIG. 6.

| Parameter | Value | Parameter | Value | Parameter         | Value |
|-----------|-------|-----------|-------|-------------------|-------|
| $C_{PD}$  | 80 fF | $C_P$     | 90 fF | $R_F$             | 320 Ω |
| $R_{PD}$  | 80 Ω  | $L_F$     | 3 nH  | $g_{m1} + g_{m2}$ | 20 mS |
| $L_B$     | 1 nH  | $R_{LF}$  | 1 kΩ  | $C_L$             | 15 fF |

### C. High-Speed Comparator with Offset Nulling and Latch

Figure 7 shows the dynamic comparator and latch. Only one of the two latches is shown for clarity. As compared to static comparators, dynamic comparators have a lower power consumption but they suffer from kickback error and feedthrough of the clock. These two effects can generate an offset at the input of the comparator as well as noise. However, the splitting at the input relaxes the kickback from the dynamic comparator to its inputs.

As mentioned in Section III.B, the input to the comparator is a pseudo-differential signal and, hence, is self-referenced. Thus, the comparator does not need a reference voltage at its input. To reduce the loading effect of the comparator, the input transistors are small and can have a noticeable offset. To compensate for this offset as well as the offset due to the kickback error and clock feedthrough, the bias voltages of the bulk of the input transistors (i.e., $V_{BP}$ and $V_{BN}$ in Fig. 7) are controlled off-chip to adjust their threshold voltages [15]. In our experimental test setup, the $V_{BN}$ of both channels are connected to the same bias voltage (0.3 V), and only the $V_{BP}$ of each channel is tuned. Thus, in total, only one bias voltage per channel needs to be tuned to compensate for the offset error. This offset cancellation is also used to cancel the DC component of the photocurrent.

As shown in Fig. 7, the comparator only requires one clock phase. The proposed architecture has the advantage of not requiring $\overline{\text{clock}}$ for its comparators, as both channels can be clocked using the same phase. Moreover, to improve the comparator performance, the duty cycle of this clock can be tuned. For instance, in this design, due to the low mobility of the PMOS transistors, the PMOS transistors (MP) that are used to reset the comparator outputs must be large enough to perform their function. By increasing the off-time of the clock signal, they have more time to reset the output, and hence their size can be reduced. Thus, the comparator speed can be improved by treating the duty cycle as a design parameter. Furthermore, at the same speed, it is possible to obtain better signal detection due to the improved output resetting of the comparator. In this design, by using a clock that has a 45 % duty cycle (i.e., 55 % off-time), the measured input optical modulation amplitude (OMA) sensitivity of the receiver is improved by 1.1 dB, from -5.9 dBm to -7 dBm. Adjustable duty-cycle circuits [16, 17] can be used to realize such a duty cycle.

Fig. 6. The Bode diagram for (a)  $(V_{OUT}-V_{IN})/I_{IN}$  and  $V_{OUT}/I_{IN}$ ; (b)  $(V_{OUT}-V_{IN})/I_{IN}$  with and without the damping resistor  $R_{LF}$.

Fig. 7. The dynamic comparator and latch with offset-nulling signals  $V_{BP}$  and  $V_{BN}$.

The latch is implemented using a simple transmission gate (TGATE) switch that consists of both NMOS and PMOS switches. It is only ON during the ON-time of the comparator, when the outputs are valid. To compensate for the comparator delay, the latch clock signal CLK<sub>D</sub> is delayed accordingly. The PMOS and the NMOS transistors of the switch are sized through simulations such that the charge injection and clock feedthrough of the switch are minimized. The latch requires the inverse of the input clock (Fig. 7). Here, $\overline{\text{CLK}_{D}}$ is a delayed version of the $\overline{\text{clock}}$ signal. Although $\overline{\text{CLK}_{D}}$ could be implemented on-chip, here, it is provided off-chip to study its accuracy requirement. In our measurement, it is observed that $\overline{\text{CLK}_{D}}$ does not need to be precise in time and can have up to 10 ps of delay as compared to a true 180° phase clock, without affecting the system performance. In a conventional receiver, such an inaccuracy in the inverted clock, when it is used as the 180° phase clock, leads to bit errors since this clock samples the 2<sup>nd</sup> channel and the eye width of the signal at the input of the comparator is limited. For instance, at 8.5 Gbps, a 10 ps timing error corresponds to a 0.09 UI reduction in the width of the eye diagram opening of a typical two-channel receiver. Note that a 10 ps timing error in 65 nm CMOS technology, with a typical digital-gate rise/fall time of 20 ps to 30 ps, is a relatively small value. For two chains of only three inverters with aspect ratios of 8 and 16 for the NMOS and PMOS transistors of all inverters, respectively, the delay difference between the two chains can vary by 4.6 ps ($3\sigma$). Usually, the clock path requires a longer chain to distribute the clock signals at high frequencies, and hence, can have a timing error larger than 4.6 ps. Back-to-back inverters or bigger inverters can be used to improve matching but they increase power consumption. Moreover, back-to-back inverters are not very effective for small delays, due to the limited rise/fall times of the inverters.
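The conversion between a timing error and the corresponding fraction of a unit interval (UI) used above is a simple product of the error and the per-channel rate. The helper below is an illustrative sketch; the function name is hypothetical.

```python
def timing_error_ui(error_s: float, channel_rate_bps: float) -> float:
    """Express a timing error as a fraction of the unit interval (UI)
    of a channel running at the given bit rate."""
    unit_interval_s = 1.0 / channel_rate_bps
    return error_s / unit_interval_s

CHANNEL_RATE = 8.5e9  # per-channel rate in this work (bit/s)

# Timing errors mentioned in the text: tolerated CLK_D error and the
# 3-sigma mismatch of two three-inverter chains.
for label, err in (("10 ps tolerance", 10e-12), ("4.6 ps mismatch", 4.6e-12)):
    print(f"{label}: {timing_error_ui(err, CHANNEL_RATE):.3f} UI")
```

At 8.5 Gbps the unit interval is about 118 ps, so a 10 ps error is about 0.085 UI, which rounds to the 0.09 UI quoted above.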

The insensitivity of the proposed receiver to the inverted clock timing error is mainly due to the fact that the TGATE switch also has an NMOS transistor to pass the signal at the right time. Moreover, the TGATE clock signals are always designed to tolerate some timing error, i.e. here, they turn off the TGATE 10 ps before the signal at the input of the TGATE resets. Also, the circuit utilized to generate  $\overline{\text{CLK}_{\text{D}}}$  has only 3 ps (3 sigma) of delay variation due to mismatches, providing sufficient margin for correct operation of the receiver.

## IV. MEASUREMENT RESULTS

The proposed two-channel receiver was implemented in a 65 nm CMOS technology and mounted in a QFN80 package. To emulate the optical splitter, the delay, and the PD functionality presented in [8, 9], a discrete optical fiber splitter and a mechanically tunable optical delay-line are used to generate two optical signals, where one signal is the delayed version of the other signal. Then, both of these optical signals are coupled to two photodetectors. The 30 GHz InGaAs photodetectors from Global Communication Semiconductors (P/N: DO309\_20um\_C3) have a responsivity of 0.7 A/W. The photodetectors are mounted in the QFN80 package next to the receiver die. The photodetectors are bonded to the receiver inputs using bondwires with a length of 1 mm. Their estimated inductance of 1 nH matches the model in Fig. 5. Figure 8 shows a micrograph of the electronic receiver chip and the connections of the photodetectors to the receiver inputs. The CMOS chip active area is 300  $\mu$ m × 300  $\mu$ m per channel.

Both channels use the same clock signal. Careful symmetric layout techniques ensure that the off-chip clock is distributed similarly to both channels. The optical delay line is manually tuned to generate the required delay of $T_D$. An integrated splitter and delay in SiP were demonstrated in [8, 9]. Tunable photonic delay lines are also available and can generate a wide range of delays, relaxing the accuracy necessary in the fabrication of fixed delay lines [11, 12]. It should be noted that tuning the delay is only required if the delay error is comparable to the width of the eye opening. In such a case, the power consumption of the optical delay must be included in the total power consumption.

Figure 9 illustrates the experimental test setup. Here, the 1550 nm light from the laser is coupled to a fiber that is connected to a polarization controller and is then modulated with an electrical 17 Gbps PRBS10 signal. Then, the modulated optical signal is passed to the optical splitter, which has a measured insertion loss of 3.3 dB. Since the data rate is 17 Gbps, the clock frequency for both receiver channels is 8.5 GHz. Figure 10 shows the bit error rate (BER) versus the input optical signal power of the two-channel receiver before the splitter, at 17 Gbps. This BER measurement is performed using a Centellax TG1B1-A BERT. As shown, to achieve a BER of 10<sup>-12</sup>, the optical input sensitivity of the receiver is -7 dBm OMA. This sensitivity is achieved without using any equalization technique. Implementing an equalization technique, such as decision feedback equalization (DFE), would improve the sensitivity [3]. Optimal energy efficiency is targeted and thus no equalization is implemented here. Note that the input sensitivity of each path is 3.3 dB lower than the sensitivity of the full receiver. Figure 11 shows the bathtub curve of the receiver for an OMA input of -6 dBm with respect to the sampling clock. The receiver tolerates up to 105° of eye opening (equal to 0.3 UI) at a BER of 10<sup>-12</sup>.
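The reported sensitivity can be translated into an approximate signal level at the comparator input. The sketch below combines the -7 dBm OMA sensitivity, the 3.3 dB splitter loss, the 0.7 A/W photodetector responsivity, and the low-frequency pseudo-differential gain $R_F$ = 320 Ω from (3) and Table 1. It is a rough estimate under idealized assumptions: coupling losses, bandwidth limits, and noise are ignored, and the variable names are chosen for illustration.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert optical power from dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)

OMA_IN_DBM = -7.0        # receiver sensitivity at the splitter input
SPLITTER_LOSS_DB = 3.3   # measured splitter insertion loss
RESPONSIVITY = 0.7       # photodetector responsivity (A/W)
R_F = 320.0              # low-frequency pseudo-differential gain (ohm)

oma_pd_dbm = OMA_IN_DBM - SPLITTER_LOSS_DB       # OMA reaching each photodetector
i_pp = RESPONSIVITY * dbm_to_watts(oma_pd_dbm)   # peak-to-peak photocurrent (A)
v_pp = i_pp * R_F                                # swing at the comparator input (V)

print(f"OMA per PD   = {oma_pd_dbm:.1f} dBm")
print(f"photocurrent = {i_pp * 1e6:.1f} uA peak-to-peak")
print(f"TIA output   = {v_pp * 1e3:.1f} mV peak-to-peak")
```

Under these assumptions each photodetector receives about -10.3 dBm OMA, which corresponds to a photocurrent swing of roughly 65 µA and about 21 mV at the comparator input.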

Figure 12 shows the eye diagram of the signal at the output of the optical modulator along with the one at the output of the receiver. Since the output signal amplitude ($25 \text{ mV}_{pp}$) is smaller than the input sensitivity of the error detector (ED), which is 100 mV<sub>pp</sub> in our case, it is amplified using a wideband amplifier with a 20 dB gain before the ED. The ringing on the eye diagram is due to the fact that the output is single-ended with a low amplitude and that there are a few millivolts of clock feedthrough through the PCB or package bondwires.



Fig. 8. Packaged chip micrograph of the receiver and its connections to photodetectors with 1 mm bondwires.



Fig. 9. Experimental test setup used to validate the optical receiver.

TABLE 2. COMPARISON TO THE STATE-OF-THE-ART

|                            | THIS WORK | [5]   | [2]              | [3]    | [4]                 | [6]   | [7]               |
|----------------------------|-----------|-------|------------------|--------|---------------------|-------|-------------------|
| CMOS technology node (nm)  | 65        | 40    | 65               | 65     | 90                  | 28    | 65                |
| Data-rate (Gb/s)           | 17        | 25    | 24               | 20     | 16                  | 25    | 25                |
| Sensitivity (dBm)          | -7.0 OMA  | -10.8 | -4.7             | -5 OMA | -5.4                | -14.9 | -8                |
| Power consumption (mW)     | 2.66      | 27.6  | 9.6              | 14.2   | 23                  | 4.25  | 17                |
| Energy efficiency (pJ/bit) | 0.156     | 1.13  | 0.4<sup>1</sup>  | 0.71   | 1.4375<sup>2</sup>  | 0.17  | 0.68<sup>3</sup>  |

1. Without clock generation and SR latches.

2. Power consumption of front-end only.

3. Includes a clock receiver.

In each channel, the TIA power consumption is 0.95 mW, and the total power consumption of the comparator, the latch, and the clock distribution is 0.38 mW. Thus, for the full two-channel receiver running at 17 Gbps, the power consumption is 2.66 mW, resulting in an energy efficiency of 156 fJ/bit for a BER of 10<sup>-12</sup>. Here, the power consumed by the output drivers is excluded. Table 2 compares this work with the state-of-the-art. Overall, this novel receiver has a superior energy efficiency as compared to the ones previously reported in the literature [2-7]. Whereas [6] achieves a similar energy efficiency, it is implemented in a smaller technology node that contributes to lowering the dynamic power consumption of the clock generation blocks. The work presented here also achieves good sensitivity despite the 3.3 dB splitting loss. This highlights the feasibility of the proposed receiver approach. An implementation in a more advanced technology node would allow for higher speed, leading to better energy efficiency.
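The power budget and the resulting energy per bit can be reproduced directly from the numbers above. The following sketch is only a worked version of that arithmetic; the function and variable names are illustrative. The power and data-rate pairs for [3] and [6] are taken from Table 2.

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ/bit (mW divided by Gb/s equals pJ/bit)."""
    return power_mw / data_rate_gbps

# Per-channel power figures reported above (mW).
P_TIA_MW = 0.95
P_COMP_LATCH_CLK_MW = 0.38
N_CHANNELS = 2

total_power_mw = N_CHANNELS * (P_TIA_MW + P_COMP_LATCH_CLK_MW)
this_work = energy_per_bit_pj(total_power_mw, 17.0)
print(f"total power = {total_power_mw:.2f} mW")
print(f"this work   = {this_work * 1e3:.0f} fJ/bit")

# Two rows of Table 2 for comparison: (power in mW, data rate in Gb/s).
for ref, (p_mw, rate) in {"[3]": (14.2, 20.0), "[6]": (4.25, 25.0)}.items():
    print(f"{ref}: {energy_per_bit_pj(p_mw, rate):.2f} pJ/bit")
```

This reproduces the 2.66 mW total and the 156 fJ/bit figure, as well as the 0.71 pJ/bit and 0.17 pJ/bit entries listed for [3] and [6].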

## V. CONCLUSION

A 17 Gbps two-channel optical receiver with an energy consumption of 156 fJ/bit was presented. The combination of simplified clocking, signal amplification with only one gain-improved TIA, and a dynamic comparator results in superior energy efficiency. The full receiver was implemented in 65 nm CMOS. The receiver die and the photodetectors were mounted in a QFN80 package and connected together using bondwires. An input sensitivity of -7 dBm OMA was achieved for this receiver without using any equalization technique.

This architecture is suitable for integration with SiP circuits, which can be used to achieve the required optical function at the input, allowing for a high degree of integration. The architecture exhibits a performance that compares favorably to the state-of-the-art.

#### REFERENCES

- [1] D. A. B. Miller, "Rationale and challenges for optical interconnects to electronic chips," *Proceedings of the IEEE*, vol. 88, no. 6, pp. 728-749, June 2000.
- [2] M. H. Nazari and A. Emami-Neyestanak, "A 24-Gb/s Double-Sampling Receiver for Ultra-Low-Power Optical Communication," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 2, pp. 344-357, Feb. 2013.
- [3] A. Sharif-Bakhtiar and A. Chan Carusone, "A 20 Gb/s CMOS Optical Receiver with Limited-Bandwidth Front End and Local Feedback IIR-DFE," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 11, pp. 2679-2689, Nov. 2016.



Fig. 10. Bit error rate (BER) of the full receiver for a 17 Gbps PRBS10 optical input signal versus the OMA at the input of the splitter, taking the 3.3 dB splitter loss into account.



Fig. 11. Full receiver bathtub curve at a 17 Gbps input.



Fig. 12. The eye diagram of the input and output signals. Since the output signal is single-ended and has a small amplitude, it is slightly distorted by some common-mode noise.

- [4] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, "A 90 nm CMOS 16 Gb/s transceiver for optical interconnects," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 5, pp. 1235–1246, May 2008.
- [5] S. Huang and W. Chen, "A 25 Gb/s 1.13 pJ/b -10.8 dBm Input Sensitivity Optical Receiver in 40 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 3, pp. 747-756, March 2017.
- [6] S. Saeedi, S. Menezo, G. Pares, and A. Emami, "A 25 Gb/s 3D-Integrated CMOS/Silicon-Photonic Receiver for Low-Power High-Sensitivity Optical Communication," *Journal of Lightwave Technology*, vol. 34, no. 12, pp. 2924-2933, June 15, 2016.
- [7] K. Yu et al., "A 25 Gb/s Hybrid-Integrated Silicon Photonic Source-Synchronous Receiver With Microring Wavelength Stabilization," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 9, pp. 2129-2141, Sept. 2016.
- [8] M. Hai, M. Ménard, and O. Liboiron-Ladouceur, "Integrated optical deserialiser time sampling based SiGe photoreceiver," *Optics Express*, vol. 23, pp. 31736-31754, Dec. 2015.
- [9] M. S. Hai, M. Ménard and O. Liboiron-Ladouceur, "A 20 Gb/s SiGe photoreceiver based on optical time sampling," 2015 European Conference on Optical Communication (ECOC), Valencia, 2015, pp. 1-3.
- [10] CMC Microsystems, "Fabrication and Pricing," 2019. Accessed: Sep. 9, 2019. [Online]. Available: https://account.cmc.ca/WhatWeOffer/Make/FabPricing.aspx
- [11] X. Wang, L. Zhou, R. Li, J. Xie, L. Lu and J. Chen, "Nanosecond-range continuously tunable silicon optical delay line using ultra-thin silicon waveguides," 2016 Conference on Lasers and Electro-Optics (CLEO), San Jose, CA, 2016, pp. 1-2.
- [12] X. Wang, L. Zhou, R. Li, J. Xie, L. Lu, K. Wu, and J. Chen, "Continuously tunable ultra-thin silicon waveguide optical delay line," *Optica*, vol. 4, pp. 507-515, May 2017.
- [13] J. Komma, C. Schwarz, G. Hofmann, D. Heinert, and R. Nawrodt, "Thermo-optic coefficient of silicon at 1550 nm and cryogenic temperatures," *Applied Physics Letters*, vol. 101, no. 4, 04190, 2012.
- [14] B. Nauta, "A CMOS transconductance-C filter technique for very high frequencies," *IEEE J. Solid-State Circuits*, vol. 27, no. 2, pp. 142–153, Feb. 1992.
- [15] Y. Sinangil and A. P. Chandrakasan, "A 128 kbit SRAM with an embedded energy monitoring circuit and sense amplifier offset compensation using body biasing," *IEEE J. Solid-State Circuits*, vol. 49, no. 1, pp. 2730–2739, Nov. 2014.
- [16] J. H. R. Schrader, E. A. M. Klumperink, J. L. Visschers, and B. Nauta, "Pulse-Width Modulation Pre-Emphasis Applied in a Wireline Transmitter, Achieving 33 dB Loss Compensation at 5-Gb/s in 0.13-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 990–999, Apr. 2006.
- [17] J. Kim et al., "A 112 Gb/s PAM-4 56 Gb/s NRZ Reconfigurable Transmitter With Three Tap FFE in 10-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 1, pp. 29-42, Jan. 2019.



**Mohammad Taherzadeh-Sani** received the B.Sc. degree from the Ferdowsi University of Mashhad, Iran, in 2001, the M.Sc. degree from the University of Tehran, Iran, in 2004, and the Ph.D. degree from McGill University, Montreal, Canada, in 2011. He was a recipient of a J. W. McConnell Memorial Fellowship from McGill University in 2007 and 2008 for his doctoral research, and a Post-Doctoral Fellowship from Le Fonds Québécois de la Recherche sur la Nature et les Technologies for 2012 and 2013 (declined). In 2012, he joined Ferdowsi University of Mashhad as an Assistant Professor. He has authored several publications in distinguished journals (e.g., JSSC, TCAS-I, TCAS-II, and T-VLSI) and many conference papers (e.g., ESSCIRC, A-SSCC, ICCAD, and ISCAS). His research interests focus on biomedical circuits and systems, high-quality and high-speed data converters, and radio frequency integrated circuits. He has fabricated ICs and publications on these subjects, with several integrated circuits fabricated in technologies ranging from 65- to 180-nm CMOS.



**Bahaa Radi** received the B.S. degree in electrical engineering from The Hashemite University, Zarqa, Jordan, in 2012 and the M.S. degree in microsystems engineering from Masdar Institute (now Khalifa University), Abu Dhabi, UAE, in 2015. He is currently pursuing the Ph.D. degree in electrical engineering with the Photonic Systems Group, McGill University, Montreal, QC, Canada. His current research interests include power-efficient optical receivers, energy-efficient optical systems for short-reach applications, and electronic and photonic integrated circuits.



**Mohammadreza Sanadgol Nezami** received the M.Sc. degree in electrical engineering from Iran University of Science and Technology, Tehran, Iran, in 2006 and the Ph.D. degree in electrical engineering from the University of Victoria, Victoria, BC, Canada, in 2016. He is currently a postdoctoral researcher at the Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada. His current research is mainly focused on silicon photonics, optical interconnects, and optoelectronics.



**Michaël Ménard** (S'98–M'09) was born in Québec City, QC, Canada. He received the B.Eng. and Ph.D. degrees in electrical engineering from McGill University, Montreal, QC, Canada, in 2002 and 2009, respectively. At McGill, he worked on the design and implementation of novel devices for optical telecommunication applications, including spatial formatting in dense wavelength division multiplexers and broadband high-density electro-optical space switches in III-V waveguides. From 2009 to 2011, he was a post-doctoral fellow with the Cornell Nanophotonics Group under the supervision of Prof. Michal Lipson. At Cornell, he investigated broadband wavelength conversion with silicon waveguides for fiber and free space telecommunication.

In June 2011, Professor Ménard joined the microelectronic program at UQAM. He is an active member of NanoQAM, the research center on nanomaterials and energy. He jointly manages the Microtechnology and Microsystems Laboratory. In 2019, he was a visiting researcher in the Department of Applied Physics at The University of Campinas, SP, Brazil. His research interests include integrated optics, silicon photonics, nonlinear optics, micro-opto-electro-mechanical systems (MOEMS), optomechanics, and microfabrication.

Dr. Ménard is a member of the Quebec Order of Engineers.

He has published over forty publications, and holds 3 issued patents and 3 pending patent applications. He holds or has held financial support from the Microsystems Strategic Alliance of Quebec (ReSMiQ), the Center for Optics, Photonics, and Lasers (COPL), the Quebec Fund for Research in Nature and Technology (FRQNT), Prompt Québec, PRIMA Québec, and the Natural Sciences and Engineering Research Council of Canada (NSERC).



**Odile Liboiron-Ladouceur** (M'95–SM'14) received the B.Eng. degree in electrical engineering from McGill University, Montreal, QC, Canada, in 1999, and the M.S. and Ph.D. degrees in electrical engineering from Columbia University, New York, NY, USA, in 2003 and 2007, respectively. From 1999 to 2000, she worked at Teradyne Inc. as an Applications Engineer in the mass storage business unit. She then joined Texas Instruments Incorporated in 2000 and spent two years working in the fiber optic business unit as a test and design engineer. She joined the Department of Electrical and Computer Engineering at McGill University in 2008, and is currently an Associate Professor and Canada Research Chair in Photonics Interconnect. From 2009 to 2016, she was an associate editor for the IEEE Photonics Technology Letters. She was an elected member of the IEEE Photonics Society Board of Governors from 2016 to 2018. She was the general co-chair of Photonics in Switching and Computing (PSC) in 2017, 2019, and 2020. She holds six granted U.S. patents and has coauthored over 60 peer-reviewed journal papers and more than 100 papers in conference proceedings. She has published four book chapters and given over 15 presentations as an invited speaker at international conferences. Her research interests include optical systems, photonic-integrated circuits, and photonic interconnects. She is the 2018 recipient of the McGill Principal's Prize for Outstanding Emerging Researcher. She manages the Photonic DataCom Research Team at McGill University.



**Frederic Nabki** (S'99–M'10) received the B.Eng. degree (Hons.) in electrical engineering and the Ph.D. degree in electrical engineering from McGill University, Montreal, QC, Canada, in 2003 and 2010, respectively. In 2008, he joined the Université du Québec à Montréal (UQAM), Montreal, QC, Canada, where he was an Associate Professor in microelectronics engineering. In 2016, he joined the École de technologie supérieure, Montreal, QC, Canada, a constituent of the University of Quebec, as an Associate Professor with the Department of Electrical Engineering. He has published two book chapters and over a hundred publications, and holds 11 issued patents and 21 pending patent applications related to MEMS and CMOS/MEMS monolithic integration. His research interests include MEMS and RF/analog microelectronics. He was a recipient of the Governor General of Canada's Academic Bronze Medal, the J.J. Archambault IEEE Canada Medal, and the UQAM Faculty of Science Early Career Research Award. He holds or has held financial support from the Microsystems Strategic Alliance of Quebec (ReSMiQ), the Quebec Fund for Research in Nature and Technology (FRQNT), the Ministry of Economy, Science and Innovation (MESI) of Quebec, the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Canada Foundation for Innovation (CFI).