

# Multiport Remote JTAG Over Optical Fibers Under Radiation Environment

M. Nakao, Y. Nakazawa, H. Sudo, R. Honda, and N. Taniguchi

**Abstract**—The Joint Test Action Group (JTAG) protocol is a popular method to program field-programmable gate array (FPGA) devices where a more intelligent technology is not applicable. However, the original JTAG protocol is designed for a short-distance connection and is not necessarily suitable when the FPGA device is located in a remote radiation area. We developed a custom optical transmission technique for the JTAG protocol. A small receiver test board is developed based on discrete devices, and a multiport distributor is implemented on an FPGA board. We also developed a technique to overcome the latency due to serialization and cable length. We present the evaluation results and future applications.

**Index Terms**—Circuit emulation, data acquisition systems, field-programmable gate arrays (FPGAs), front-end electronics, high-energy physics (HEP), optical signal processing, radiation effects.

## I. INTRODUCTION

The Joint Test Action Group (JTAG) protocol defined by the IEEE standard 1149.1 [1] is a popular method to program field-programmable gate array (FPGA) devices. Modern high-energy physics (HEP) experiments often use a large number of FPGAs to process a huge number of detector signal channels in real time. The Belle II experiment [2] at the SuperKEKB $e^+e^-$ collider [3] in Tsukuba, Japan, is one such experiment.

The JTAG protocol is a low-level serial protocol to access a device via three input signals (TCK, TMS, and TDI) and one output signal (TDO). The optional reset signal (TRST) is not used for FPGA programming. The protocol is popular because of its simplicity. A typical FPGA has these four pins, and a set of tools is provided. In the case of Xilinx [4] FPGAs, software tools to program FPGAs (Impact, ChipScope, and Vivado programs) and a universal serial bus (USB) cable to generate and capture the JTAG signals are provided.

Received 20 May 2024; revised 27 August 2024 and 6 October 2024; accepted 26 December 2024. Date of publication 8 January 2025; date of current version 17 March 2025. This work was supported in part by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research C under Grant 21K03595. (Corresponding author: M. Nakao.)

M. Nakao, Y. Nakazawa, R. Honda, and N. Taniguchi are with High Energy Accelerator Research Organization (KEK), Tsukuba 305-0801, Japan (e-mail: mikihiko.nakao@kek.jp; nakancyo@post.kek.jp; rhonda@post.kek.jp; nanae@post.kek.jp).

H. Sudo is with The University of Tokyo, Tokyo 113-8654, Japan (e-mail: hiroto.sudo@hep.phys.s.u-tokyo.ac.jp).

Color versions of one or more figures in this article are available at <https://doi.org/10.1109/TNS.2025.3526952>.

Digital Object Identifier 10.1109/TNS.2025.3526952

Fig. 1. JTAG state transition.

Fig. 2. JTAG chain.

The JTAG protocol can be implemented in a finite-state machine synchronous to the TCK signal. There are 16 states, and the next state upon TCK is defined by the TMS signal as shown in Fig. 1. These states are used to write and read an instruction register (IR) and data registers (DRs); TDI carries the input data, and TDO carries the output data. In the Xilinx tools, the TCK signal, despite its name, is not a fixed frequency clock signal but is a trigger to make a transition from the current state to the next with a predefined minimal interval. The TDO signal has to arrive within the TCK interval, and this constraint limits the TCK interval and, hence, the maximum JTAG cycle frequency, or the length of the JTAG cable.
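The 16-state machine can be written down as a small transition table. The following is a minimal sketch in Python of the standard IEEE 1149.1 TAP controller; the state names follow the standard, while the helper names `step` and `walk` are only illustrative.

```python
# TMS-driven transitions of the 16-state JTAG TAP controller (IEEE 1149.1).
# Each entry maps a state to (next state if TMS=0, next state if TMS=1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Advance the state machine by one TCK with the given TMS value (0 or 1)."""
    return TAP_NEXT[state][tms]


def walk(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK, and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


# From reset, TMS = 0,1,0,0 reaches Shift-DR; TMS = 0,1,1,0,0 reaches Shift-IR.
shift_dr = walk("Test-Logic-Reset", [0, 1, 0, 0])
shift_ir = walk("Test-Logic-Reset", [0, 1, 1, 0, 0])
```

A useful property of this table is that five consecutive TCK transitions with TMS held high return the controller to Test-Logic-Reset from any state, which is why a spurious TCK burst with uncontrolled TMS can leave a device in an unintended state.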

Multiple devices can be connected in a chain as shown in Fig. 2. The TDO output should be identical for the same programming sequence applied to the same device and firmware. If there is no error, multiple devices can therefore be programmed in parallel, reducing the programming time, simply by distributing the same TDI signal and taking the logical AND or logical OR of the TDO signals. An error, however, may go undetected in the combined signal; hence, an additional validation mechanism is necessary.
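The reason an error can go undetected is easy to see with a toy model. The sketch below is illustrative only (the function names and the three-device example are assumptions, not part of the described system): it merges the TDO bits of devices programmed in parallel and compares the result with the expected bit.

```python
def combine(tdo_bits, mode):
    """Merge the TDO bits of devices programmed in parallel."""
    if mode == "and":
        return int(all(tdo_bits))
    if mode == "or":
        return int(any(tdo_bits))
    raise ValueError("mode must be 'and' or 'or'")


def error_seen(expected, tdo_bits, mode):
    """True if the merged TDO differs from the expected bit."""
    return combine(tdo_bits, mode) != expected


# Three devices, expected TDO = 0, one device wrongly returns 1:
# the AND still reads 0 (error masked), the OR reads 1 (error visible).
masked_by_and = not error_seen(0, [0, 0, 1], "and")
caught_by_or = error_seen(0, [0, 0, 1], "or")

# Three devices, expected TDO = 1, one device wrongly returns 0:
# the OR still reads 1 (error masked), the AND reads 0 (error visible).
masked_by_or = not error_seen(1, [1, 1, 0], "or")
caught_by_and = error_seen(1, [1, 1, 0], "and")
```

Each combination alone is blind to one direction of bit error, which is consistent with the need for an additional validation step.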

In an HEP experiment, the FPGA devices are often located at a remote place, far from the PC where the programming software runs. A typical way is to convert the signal into low-voltage differential signaling (LVDS) or other differential signaling. Since JTAG signals are not direct current (DC) balanced, alternating current (AC) coupling cannot be used, and the electrical ground has to be connected over a long distance. It is not possible to use small form-factor pluggable (SFP) optical transceiver modules or other commodity modules that are popular for high-speed serial data transmission, as they are internally AC coupled. In addition, the latency due to the long distance requires a larger TCK interval and, hence, lower programming speed. Typically, for a 10-m LVDS extension setup, the default 6-MHz programming of the Xilinx Impact program does not work, and the frequency has to be reduced to 3 MHz, hence the FPGA programming takes twice as much time.

We describe the current situation and problems in Section II, then present a new solution in Sections III–VI in steps, and conclude in Section VII.

## II. PRESENT JTAG AT BELLE II

The Belle II experiment is a flavor physics experiment to search for physics beyond the Standard Model from an unprecedentedly large sample of decays of bottom mesons, charm hadrons, and tau leptons. The Belle II detector consists of seven subdetectors, three of which, the central drift chamber (CDC), the time-of-propagation counter (TOP), and the aerogel ring-image Cherenkov counter (ARICH), use FPGA boards inside the detector volume, where space is very limited, cabling is very tight, and the radiation due to gamma rays and neutrons is very harsh. The CDC uses 299 FPGA boards each with a Xilinx Virtex 5 FPGA, the TOP uses 64 FPGA boards with Xilinx ZYNQ, and the ARICH uses 72 FPGA boards with Xilinx Virtex 5.

These FPGA boards are located up to 10 m away from areas where radiation is not critical for FPGAs. The current FPGA boards use JTAG signals translated to LVDS for long-distance DC-coupled transport over category-7 Ethernet cables. We also use similar category-7 Ethernet cables to distribute the clock, trigger, and other timing signals to these FPGA boards using a custom *b2tt* serial protocol [5], which is DC-balanced and capable of AC-coupled connections. The source of JTAG signals is the frontend-timing-switch (FTSW) module [5], which has an FPGA connected to 20 RJ-45 ports via general-purpose DC-coupled LVDS lines. Optionally, eight RJ-45 ports can be replaced by eight SFP transceivers.

The JTAG sequences are distributed over the *b2tt* protocol as a part of the timing distribution. Functions corresponding to *initialize chain*, *get idcode*, *program*, and *verify* are implemented in a custom program, *jtagft*, which runs on a single-board computer and directly manipulates an FTSW module. The sequences are then forwarded over the *b2tt* protocol to another FTSW, which generates the JTAG signals for the target FPGA. In the case of the TOP subdetector, the FTSW module is used only as a switch for the JTAG signals provided by the Vivado software and a USB cable with a long-distance extension, since the *jtagft* program does not support the programming of the processor part of the ZYNQ FPGA.

Fig. 3. Conceptual design of the *optjtag* system using a backend FPGA board and a circuit on a remote FPGA board.

Belle II is now in the phase of designing the next-generation FPGA boards, to increase the data bandwidth, processing capability, and radiation tolerance. One of the concerns is the directly connected metal Ethernet cables, which would be a source of electrical disturbance to the stable operation of the FPGA boards, as observed in some of the Belle II subdetectors [6]. A major obstacle is the lack of a technique to program the FPGA using an optical-fiber-only connection.

## III. OPTICAL JTAG UPGRADE FOR BELLE II

To realize the JTAG protocol over optical fibers, we designed a circuit made of a small number of discrete devices to receive a custom-encoded JTAG protocol (*optjtag*) from an AC-coupled connection, which is suitable for optical fiber transport over a long distance.

Our design is inspired by Deng et al. [7], who used a high-speed serial link in a similar application. The idea is simple: 3-bit parallel signals of TCK, TMS, and TDI are serialized into a DC-balanced serial stream at the sender, and deserialized by a receiver. The returned signal, TDO, is similarly encoded, serialized, and deserialized. The TCK signal requires additional care to avoid the occurrence of spurious JTAG sequences that may put the target FPGA in an error state that can only be recovered by power cycling.
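To make the idea of a DC-balanced word concrete, the sketch below shows one possible mapping of the three JTAG inputs onto a 10-bit parallel word. This mapping is purely hypothetical and is not the *optjtag* encoding, which is not specified here: each bit is sent together with its complement, and a fixed `1,0,1,0` pad fills the remaining positions, so every word contains exactly five ones.

```python
from itertools import product

PAD = (1, 0, 1, 0)  # hypothetical filler that keeps the word balanced


def encode(tck, tms, tdi):
    """Map (TCK, TMS, TDI) to a 10-bit word with exactly five ones.

    Hypothetical layout: 3 data bits, their 3 complements, 4 pad bits.
    """
    bits = (tck, tms, tdi)
    inverted = tuple(1 - b for b in bits)
    return bits + inverted + PAD


def decode(word):
    """Recover (TCK, TMS, TDI); reject words that violate the layout."""
    bits, inverted, pad = word[:3], word[3:6], word[6:]
    if pad != PAD or any(b == i for b, i in zip(bits, inverted)):
        raise ValueError("corrupted word")
    return bits


# Every one of the eight input combinations yields a balanced word.
ones_per_word = {sum(encode(*bits)) for bits in product((0, 1), repeat=3)}
```

A side benefit of sending complements is that a word that does not follow the layout can be rejected instead of being turned into a TCK edge, although whether the actual protocol does this is not stated.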

To be used in a radiation area with a tight space constraint, our circuit includes no programmable devices and occupies only a small area on the remote FPGA board. The *optjtag* protocol can be easily implemented at the backend on an inexpensive FPGA, without requiring high-speed transceivers. At the target FPGA board end, the physical layer of the communication link uses a 10-bit LVDS serializer and deserializer (TI DS92LV1023 / DS92LV1224), operated at a parallel data rate in the range of 10–66 MHz. To avoid propagating spurious TCK signals to the target FPGA, the TCK signal is passed on only while the lock signal of the deserializer is asserted. However, the lock signal alone is not sufficient: we found that a short lock signal is also generated from spurious serial data when the optical fiber is not connected to the SFP module. A supervisor chip (ROHM BD46272G) is therefore added to ensure that the lock signal is sustained for more than 200 ms. Additional discrete 74-series AND/NAND gates are used to enable/disable the TCK signal. A conceptual diagram is shown in Fig. 3.
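The gating logic can be modeled in a few lines. The following is a behavioral sketch only, not a description of the supervisor chip: the 1-ms sampling period and the function names are assumptions, and the only number taken from the text is the 200-ms hold time.

```python
HOLD_MS = 200  # lock must be sustained for more than this (value from the text)


def tck_enable(lock_samples, sample_period_ms=1, hold_ms=HOLD_MS):
    """Return, for each lock sample, whether TCK would be passed to the target.

    The enable goes high only after the lock has been continuously asserted
    for more than hold_ms; any dropout restarts the wait.
    """
    held_ms = 0
    enabled = []
    for locked in lock_samples:
        held_ms = held_ms + sample_period_ms if locked else 0
        enabled.append(held_ms > hold_ms)
    return enabled


def gated_tck(tck, enable):
    """AND gate: TCK reaches the target FPGA only while enabled."""
    return tck & int(enable)


# A 50-ms spurious lock, a 10-ms dropout, then a stable lock for 300 ms.
lock = [1] * 50 + [0] * 10 + [1] * 300
enable = tck_enable(lock)
first_enabled_ms = enable.index(True)
```

In this model the short spurious lock never enables TCK, and the stable lock enables it only after it has been held for more than 200 ms.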

Fig. 4. First (top) and second (bottom) prototype receiver boards for the *optjtag* evaluation.

The backend is implemented in an FTSW module for the Belle II application. The FTSW board can easily handle multiple *optjtag* ports in applications where many remote FPGA boards have to be programmed. We use a 31.8-MHz parallel data rate (381.6-Mb/s serial data rate) so that the clock can be derived from the system clock of 127.2 MHz. The JTAG signals are therefore quantized at 31.8 MHz, which is fast enough because a typical TCK rate is at most about 10 MHz. A 190.8-MHz clock is used inside the FTSW module to generate the 381.6-Mb/s double-data-rate serial output.
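The quoted rates are mutually consistent, as the short calculation below shows. It assumes that the 31.8-MHz clock is the system clock divided by four and that each 10-bit parallel word is framed into 12 serial bits (10 data bits plus two embedded clock bits); both are inferences from the numbers rather than statements in the text.

```python
from fractions import Fraction

SYSTEM_CLOCK_MHZ = Fraction("127.2")  # system clock quoted in the text
CLOCK_DIVIDER = 4                     # assumption: 127.2 MHz / 4 = 31.8 MHz
BITS_PER_WORD = 12                    # assumption: 10 data bits + 2 framing bits

# Parallel word rate at which the JTAG signals are sampled.
parallel_rate_mhz = SYSTEM_CLOCK_MHZ / CLOCK_DIVIDER

# Serial line rate after framing each word into 12 bits.
serial_rate_mbps = parallel_rate_mhz * BITS_PER_WORD

# A double-data-rate output sends two bits per clock cycle.
ddr_clock_mhz = serial_rate_mbps / 2

# Samples per TCK period for a TCK rate of about 10 MHz.
samples_per_tck = parallel_rate_mhz / 10

print(float(parallel_rate_mhz), float(serial_rate_mbps), float(ddr_clock_mhz))
```

With these assumptions, a 10-MHz TCK is sampled a little more than three times per period, and the 190.8-MHz clock is 3/2 of the system clock.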

The deserializer of the remote optical JTAG circuit recovers the clock from the serial data but still requires a reference clock, which is taken from a local clock source. The serializer on the remote side uses the same local clock source, whereas the FPGA of the FTSW has no clock recovery function. We therefore encode the TDO signal as a long series of 0s or 1s, which the FTSW can decode even if the serial data boundary is ambiguous.
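Why a long run of identical bits tolerates an unknown word boundary can be illustrated with a majority vote. The sketch below is an assumption-laden model, not the FTSW decoder: it assumes that each serial frame consists of a high start bit, ten data bits all equal to TDO, and a low stop bit, and that the receiver looks at an arbitrary 12-bit window.

```python
FRAME_BITS = 12  # assumed frame: start bit (1) + 10 data bits + stop bit (0)


def serialize(tdo_values):
    """Build the serial stream for a sequence of TDO values, one frame each."""
    stream = []
    for tdo in tdo_values:
        stream += [1] + [tdo] * 10 + [0]
    return stream


def decode_window(stream, offset):
    """Majority vote over a 12-bit window starting at an arbitrary offset."""
    window = stream[offset:offset + FRAME_BITS]
    return int(2 * sum(window) > len(window))


# For a steady TDO, every possible alignment gives the same answer.
steady_high = serialize([1] * 4)
steady_low = serialize([0] * 4)
decoded_high = {decode_window(steady_high, k) for k in range(FRAME_BITS)}
decoded_low = {decode_window(steady_low, k) for k in range(FRAME_BITS)}
```

Because TDO changes far more slowly than the serial bit rate, a window that straddles a TDO transition occurs only briefly, and the decoded value settles once the window contains a single TDO level.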

## IV. EVALUATION WITH A TEST BOARD

We developed two types of small prototype test boards to demonstrate the *optjtag* functions. These boards serve as an adapter to translate the *optjtag* signal received by an SFP optical transceiver module into a Xilinx-standard JTAG 14-pin ribbon cable. The first board [see Fig. 4 (top)] was designed to minimize the area and measures $21 \times 60$ mm, including connectors to mount on a small carrier board for the SFP transceiver. This board was also designed to mount on a prototype CDC readout board, which we discuss later. The second board [see Fig. 4 (bottom)] has relaxed area constraints and includes a few more devices to solve the minor issues described below.

The backend of the *optjtag* protocol is implemented on the FTSW module, which receives JTAG signals from a USB JTAG program cable and transmits them via an SFP transceiver. The target device of the JTAG protocol is another FTSW module. The test setup is shown in Fig. 5.

With these prototype boards, we found a few minor issues. First, the output of the 4-Gb/s SFP+ transceiver we used is not compatible with LVDS and cannot stably drive the deserializer. Second, the lock signal of the deserializer is not reliable, as it is spuriously asserted even when the SFP transceiver input is open. The former just requires a translation buffer, and the latter is solved by adding a supervisor chip to wait for a stable lock signal. Apart from these issues, the correct operation of the JTAG port was confirmed, although the speed was limited to 1.5 MHz because of the overhead of the serializer and deserializer.

We irradiated one of the test boards up to 2 kGy with a $^{60}\text{Co}$ $\gamma$-ray source and found no loss of functionality. Our requirement is to survive 1 kGy, which corresponds to ten years of operation of Belle II at the most radiation-harsh FPGA-board area.



Fig. 5. Test setup for the evaluation of the test board.

## V. OVERCOMING THE JTAG LATENCY

A potential issue with a long JTAG line is the round-trip time of the signal, which limits the JTAG operation frequency. This can be overcome by introducing an artificial FPGA device (emulator) at the end of the JTAG chain, implemented in logic in the backend FPGA.

Rate limitation is mainly caused by the fact that, in a JTAG chain that contains only one FPGA, the TDO signal from the target has to be received by the controller in one TCK period ( $\sim 160$  ns at 6 MHz). However, in a JTAG chain that contains  $N$  FPGAs, the controller only expects feedback on the TDO line after  $N$  TCK periods. The method that we propose to overcome rate limitation over long JTAG lines is to add a sufficient number of emulated FPGAs in the chain to make the controller sample the TDO line after an adequately adjusted number of TCK periods. Emulated FPGAs are implemented in the FPGA logic in the controller and have a TDI to TDO propagation delay much smaller than one TCK period, which allows us to absorb the latency of the remote FPGA.
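The shift of the expected TDO by one TCK period per device can be seen in the simplest case, where every device in the chain has its 1-bit bypass register selected. The sketch below is a generic model of such a chain (the function name and the test pattern are illustrative); it is not the emulator firmware.

```python
def shift_through_chain(tdi_bits, n_bypass):
    """Shift bits through n_bypass 1-bit bypass registers connected in series.

    On each TCK, TDO presents the content of the last register, and every
    register takes the value of its upstream neighbor.
    """
    chain = [0] * n_bypass
    tdo_bits = []
    for tdi in tdi_bits:
        tdo_bits.append(chain[-1] if chain else tdi)
        if chain:
            chain = [tdi] + chain[:-1]
    return tdo_bits


pattern = [1, 0, 1, 1, 0, 0, 0, 0]
delayed = shift_through_chain(pattern, 3)  # the pattern appears 3 TCKs later
```

Each added device moves the point at which the controller expects valid data by one more TCK, which is the slack that the emulated devices provide for the late-arriving TDO of the remote FPGA.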

The emulator does not need to support all JTAG functions. The minimum set of JTAG functions is: 1) *initialize chain*, in which the identification (ID) code of the device is sent out; 2) *IR cycle*; and 3) *DR cycles for bypass, get idcode, and monitor*. The IR is 10 bits long for the Virtex 5 FPGA, and the DR size depends on the IR: the *bypass* register is 1 bit long, and the *idcode* and *monitor* registers are 32 bits long. Therefore, the amount of delay added by the emulator has to be adjusted accordingly.

We developed a JTAG latency absorber firmware, which includes two FPGA emulators as shown in Fig. 6. The delay has to correspond to the transmission of 20 bits (10 bits  $\times$  2) for the IR cycle, 64 bits (32 bits  $\times$  2) for the initialize chain where both emulators have to generate their ID codes, 2 bits (1 bit  $\times$  2) if both are in the bypass DR cycle, or 33 bits (32 bits + 1 bit) for other DR cycles. We use a ring buffer, from which the delayed output of the target device is fetched with a proper delay. The timing to write into the ring buffer is the rising edge of the TCK signal delayed by the physical latency due to the remote FPGA, but since JTAG is a slow signal, the timing requirement is not severe.
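The delay selection can be sketched as a lookup table feeding a ring buffer. The bit counts below are the ones quoted above; the class, the key names, and the buffer depth are hypothetical, and the exact write and read timing of the firmware is not modeled.

```python
from collections import deque

# Delay in TCK cycles (bits) for each type of JTAG cycle, as quoted in the text.
DELAY_BITS = {
    "ir": 20,              # 10-bit IR x 2 emulators
    "initialize": 64,      # 32-bit ID code x 2 emulators
    "dr_bypass_both": 2,   # 1-bit bypass register x 2 emulators
    "dr_other": 33,        # 32-bit register + 1-bit bypass
}


class DelayLine:
    """Ring buffer holding the most recent TDO bits from the remote FPGA."""

    def __init__(self, depth=128):
        # depth is an arbitrary choice larger than the longest delay (64)
        self.bits = deque([0] * depth, maxlen=depth)

    def push(self, bit):
        """Store one TDO bit; the oldest bit is discarded automatically."""
        self.bits.append(bit)

    def read(self, delay):
        """Return the bit that was pushed `delay` pushes before the latest one."""
        return self.bits[-1 - delay]


# Push a single 1 followed by zeros and fetch it back with the IR-cycle delay.
line = DelayLine()
line.push(1)
for _ in range(DELAY_BITS["ir"]):
    line.push(0)
ir_bit = line.read(DELAY_BITS["ir"])
```

Selecting the read position from the current JTAG cycle type is what lets a single buffer serve all four delays.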

Fig. 6. Conceptual design to absorb the JTAG latency with two FPGA emulators.

In this latency absorber logic, the first FPGA emulator works as an emulator of the JTAG state machine of the remote FPGA. It also bypasses the TDI input signal for the succeeding FPGA emulators in the chain, for the JTAG cycle of those emulators to work as expected. The amount of necessary delay is known from the JTAG state of the first emulator, and, hence, the proper position in the ring buffer from the remote FPGA is known. The overall TDO output is either the delayed TDO from the remote FPGA or the output of the FPGA emulator chain.

In our test setup, we measured that inserting two FPGA emulators in the JTAG chain compensates for 230 ns of external latency when using the Xilinx Impact tool at the default JTAG rate of 6 MHz. We successfully operated the JTAG chain with a 20-m category-7 Ethernet cable in this setup at the default speed, whereas with the nominal JTAG connection, we could operate only at 3 MHz. Screenshots of the Xilinx Impact program and ChipScope program are shown in Fig. 7. Here, the target remote FPGA is an XC5VLX30 Virtex 5 FPGA, while the two emulators were arbitrarily chosen to appear as an XC5VLX20T and an XC5VLX85, simply by changing the ID code that each emulator returns.

The *optjtag* part of the JTAG transport is yet to be combined with this FPGA emulator, to test an even longer distance, for example, 50 m, using optical fibers. One can add multiple dummy devices to absorb any latency of the long JTAG line and additional devices, but it has yet to be tested.

## VI. APPLICATIONS AT BELLE II

The main target application is the next version of the FPGA board for the Belle II CDC readout. This board will use an optical link for clock and trigger reception as well as for JTAG. As the board is operated in the radiation area, reprogramming the FPGA several times a day is unavoidable, in order to recover from otherwise unrecoverable single-event upset errors due to the background neutrons. We avoid using onboard memory devices to store the configuration data, since data in such devices would also be affected by single-event upsets. As the FPGA size and, hence, the configuration memory size increase with respect to the current board, a faster JTAG programming solution is necessary.

Fig. 7. Example of the successful JTAG operation with two FPGA emulators, shown with the Xilinx Impact program (top) and ChipScope program (bottom).

A new version of the FTSW module with a larger number of optical ports for clock, trigger, and JTAG distribution is also planned. The module will be equipped with 14 QSFP ports; with 12 of them used for general purposes, up to 48 optical connections to the FPGA boards will be possible. This board will be the timing and JTAG distributor board for the upgraded CDC FPGA boards.

By combining these two developments, we will be able to program a large number of FPGAs in the radiation area of the Belle II experiment.

## VII. CONCLUSION

We presented a new circuit to implement the JTAG protocol over optical fibers and showed its operation. The part to be added at the remote end is composed only of radiation-tolerant discrete devices and fits in a small area.

We also presented a new technique based on emulated FPGA devices to overcome the rate limitation of the JTAG protocol when signal delays are significant. We demonstrated the successful programming of a real target FPGA via JTAG at 6 MHz over a 20-m-long cable.

These solutions will be used in the planned upgrade of the Belle II readout system, which requires a large number of FPGA boards and, hence, JTAG programming ports.

## REFERENCES

- [1] *IEEE 1149.1 Working Group*. Accessed: Oct. 6, 2024. [Online]. Available: <https://grouper.ieee.org/groups/1149/1>
- [2] T. Abe et al., “Belle II technical design report,” 2010, *arXiv:1011.0352*.
- [3] K. Akai, K. Furukawa, and H. Koiso, “SuperKEKB collider,” *Nucl. Instrum. Methods Phys. Res. A, Accel. Spectrom. Detect. Assoc. Equip.*, vol. 907, pp. 188–199, Nov. 2018.
- [4] *AMD Xilinx*. Accessed: Oct. 6, 2024. [Online]. Available: <https://www.xilinx.com/products/silicon-devices/fpga.html>
- [5] M. Nakao, “Timing distribution for the Belle II data acquisition system,” *J. Instrum.*, vol. 7, no. 1, Jan. 2012, Art. no. C01028.
- [6] M. Nakao et al., “Performance of the unified readout system of Belle II,” *IEEE Trans. Nucl. Sci.*, vol. 68, no. 8, pp. 1826–1832, Aug. 2021.
- [7] B. Deng et al., “JTAG-based remote configuration of FPGAs over optical fibers,” *J. Instrum.*, vol. 10, no. 1, Jan. 2015, Art. no. C01050.