

# Continuous timing measurement using a data-streaming DAQ system

Ryotaro Honda<sup>1,\*</sup>, Takashi Aramaki<sup>2</sup>, Hidemitsu Asano<sup>3</sup>, Takaya Akaishi<sup>4</sup>, W. C. Chang<sup>5</sup>, Youichi Igarashi<sup>1</sup>, Takatsugu Ishikawa<sup>6</sup>, Shunsuke Kajikawa<sup>2</sup>, Yue Ma<sup>3</sup>, Kei Nagai<sup>5,7</sup>, Hiroyuki Noumi<sup>8</sup>, Hiroyuki Sako<sup>9</sup>, Kotaro Shirotori<sup>8</sup>, and Tomonori Takahashi<sup>3</sup>

<sup>1</sup>*Institute of Particle and Nuclear Studies, High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki 305-0801, Japan*

<sup>2</sup>*Department of Physics, Tohoku University, Sendai, Miyagi 980-8578, Japan*

<sup>3</sup>*RIKEN, Hirosawa, Wako, Saitama 351-0198, Japan*

<sup>4</sup>*Department of Physics, Osaka University, Toyonaka, Osaka 560-0043, Japan*

<sup>5</sup>*Institute of Physics, Academia Sinica, Taipei 11529, Taiwan*

<sup>6</sup>*Research Center for Electron Photon Science (ELPH), Tohoku University, Sendai, Miyagi 982-0826, Japan*

<sup>7</sup>*High Energy Nuclear Physics, Los Alamos National Laboratory, Los Alamos, NM 87545, USA*

<sup>8</sup>*Research Center for Nuclear Physics (RCNP), Ibaraki, Osaka 567-0047, Japan*

<sup>9</sup>*Japan Atomic Energy Agency (JAEA), Tokai, Ibaraki 319-1195, Japan*

\*E-mail: [rhonda@post.kek.jp](mailto:rhonda@post.kek.jp)

Received July 14, 2021; Revised September 30, 2021; Accepted October 8, 2021; Published October 22, 2021

We have investigated a prototype triggerless data acquisition (DAQ) system for charmed-baryon spectroscopy at J-PARC. A special time-to-digital converter (TDC) dedicated to continuous timing measurement without any external trigger was developed. A high dynamic range of the order of  $10^{10}$  for the timing measurement was realized by introducing the heartbeat method, which periodically generates a delimiter datum. DAQ software receiving and filtering the data stream from the TDCs was also developed based on a message-queue application programming interface (API), FairMQ. We tested the prototype DAQ system with detectors using electrons and positrons converted from a bremsstrahlung photon beam with a rate condition of up to 1.9 MHz/1-mm detector segment. The DAQ received a 5.4 Gbps data stream from six TDCs and merged them to a complete data set. After sorting a certain portion of the data, the software searched for a set of corresponding data fragments in response to the penetration of a positron or electron using time separation between detector hits in the time axis and removed unnecessary hits. Around 25 GB RAM and 25 CPU threads were required for the online computing power under the test conditions. No silent data drop in the DAQ software was observed.

Subject Index H33, H34, H40

## 1. Introduction

We plan to conduct charmed-baryon spectroscopy at the Japan Proton Accelerator Research Complex (J-PARC) in the J-PARC E50 experiment [1], where we use a high-intensity, unseparated 20 GeV/c pion beam at the high-momentum beam line in the Hadron Experimental Facility [2]. Charmed baryons ($Y_c^{*+}$) will be produced in the $\pi^- p \rightarrow D^{*-} Y_c^{*+}$ reaction. We measure the $Y_c^{*+}$ mass spectra reconstructed by the missing-mass method from the four-momenta of an incident pion, a target proton, and a scattered $D^{*-}$ meson. Since the production cross section of $Y_c^{*+}$ is expected to be as small as 1 nb [3], the high-intensity $\pi^-$ beam of $6 \times 10^7$ per spill (spill duration 2 s) will be delivered to a 4-g/cm<sup>2</sup> liquid hydrogen target. We construct a spectrometer complex [4], which covers a solid angle of approximately 65%, to detect a charged-hadron decay chain of $D^{*-} \rightarrow \bar{D}^0\pi^- \rightarrow K^+\pi^-\pi^-$. Beam-line and scattered-particle trackers are adopted for momentum measurements of the incident pion and scattered $D^{*-}$ meson, respectively. These trackers are required to have high rate capabilities of 1 MHz/mm because the high-intensity beam of 30 MHz is concentrated on an area of 10 cm in diameter at the target; a hadronic reaction rate of 1.5 MHz with a charged-particle multiplicity of around 4 is expected on average. Thus, tracking devices composed of scintillating fibers of 500 $\mu\text{m}$ diameter have been developed. There are three different scintillating-fiber trackers, and the number of their readout channels is around 8000 in total. In addition, five sets of drift chambers are placed to measure the scattered particle trajectories. For better particle identification of scattered kaons, pions, and protons, time-of-flight counters of acrylic-based Cherenkov time-zero detectors [5] and resistive plate chambers [6] need to be employed in the momentum range below 2 GeV/c; ring imaging Cherenkov counters (RICH) with aerogel radiators of index $n = 1.04$ and C<sub>4</sub>F<sub>10</sub> gaseous radiators of index $n = 1.00137$ are installed in the momentum range up to 16 GeV/c [7].

In total, the number of readout channels is around 26 000. Under the 1.5 MHz reaction rate, the size of the data flow from the detector system is expected to be as high as 25 GB per spill. In the E50 experiment, the cross section of the background reactions containing  $K^+\pi^-\pi^-$  in the final state is 2.4 mb [8]. The cross section of the background reaction is 10<sup>6</sup> times greater than that of charmed baryon production, even though the triggered background reactions contribute to only 10% of the total pion interaction cross section with the proton at 20 GeV/c [9]. Thus, the event trigger for data acquisition (DAQ) is an essential issue in an experiment. A simple particle identification, although not very simple for RICH, may not be sufficient to reduce the trigger rate to a few tens of kHz, which is a reasonably achievable rate in a conventional DAQ system. One can employ a field-programmable gate array (FPGA)-based trigger system with a complicated and intelligent event reduction scheme such as  $D^0$  or  $D^{*-}$  mass selections after the quick momentum reconstruction of scattered particles, e.g., a neural network trigger for the Belle II experiment [10]. To select the  $D^{*-}$  produced events, it is effective to identify the sequential  $D^{*-} \rightarrow \bar{D}^0\pi^- \rightarrow K^+\pi^-\pi^-$  decay online. The required trigger rate of a few tens of kHz may be achieved by introducing a dedicated trigger, which takes the condition for the momenta of the  $D^{*-}$  decay products. Instead, we propose introducing a triggerless, data-streaming-type DAQ system that employs fully software-based event selection. In the triggerless DAQ system, front-end electronics digitize all detector signals and continuously transfer digitized data to personal computers (PCs) without any hardware trigger. Event selection, from simple timing coincidence to complicated momentum analysis, is performed by software on the PC. 
Such a DAQ system is considered for large-scale experiments [11,12]. A triggerless data-streaming-type DAQ has the great advantages of flexibility and scalability. Medium-scale experiments using high-intensity beams at J-PARC would benefit from the triggerless DAQ system while ensuring high efficiency. Simultaneous measurements of several byproduct channels such as strange-baryon production via the  $(\pi^-, K^*)$ ,  $(K^-, K^{(*)+})$ , and  $(K^-, K^+K^{(*)0})$  reactions as well as charmed-baryon production can be easily accommodated in the E50 experiment.

To this end, we developed and demonstrated a data-streaming-type DAQ system with detectors under the high-rate condition at a test experiment using electrons and positrons from a bremsstrahlung photon beam. A special time-to-digital converter (TDC) for the continuous measurement was developed and evaluated. The design of the TDC and the DAQ software are described in Section 2. The experimental conditions and results are reported in Sections 3 and 4, respectively.

## 2. Prototype streaming DAQ system

The data size from front-end electronics on a triggerless DAQ system is considerably larger than that of a conventional DAQ system because all detector signals are digitized and then transferred to a PC without filtering. Therefore, the DAQ system for the E50 experiment is dedicated to reading timing information only, which reduces the data size; a waveform digitizer is not used. Thus, we developed a TDC that digitizes the timing of the detector signal without any hardware trigger, a streaming TDC (StrTDC), and software to collect data from these TDCs. There is a specific requirement for the StrTDC, i.e., a high dynamic range of the order of $10^{10}$. As a proton beam is extracted using a slow extraction method, secondary beam particles arrive randomly over 2 s within a 5.2-s cycle. Further, we require a timing resolution better than 1 ns in $\sigma$ for the scintillating-fiber tracker to measure the beam trajectories under high-intensity conditions. Thus, a continuous timing measurement over 2 s with a precision of around 1 ns is necessary; it corresponds to a dynamic range of the order of $10^9$.

### 2.1 Data-streaming-type TDC

We adopted a general-purpose FPGA module, a hadron universal logic (HUL) module [13], as a platform to implement an StrTDC. The HUL module is a VME 6U size board with a Xilinx FPGA, XC7K-160T-1 [14]. This module has 64 digital signal inputs and has two mezzanine card slots for function expansion. These signal input ports and the mezzanine card slots are connected to the FPGA. Gigabit Ethernet (1000BASE-T) is used as the communication standard. The HUL module communicates with the PC using the user datagram protocol (UDP) or the transmission control protocol (TCP) realized by silicon TCP (SiTCP) [15], which is implemented in the FPGA. When the SiTCP and the PC were connected one-to-one without a network switch, the measured transmission speed of the SiTCP was 949 Mbps for the TCP communication [15]. The StrTDC was synchronized by a common clock, a 50-MHz master clock. A mezzanine card, a master clock distributor (MCD) mounted on the HUL module, provides a receiver port for the master clock signal. Further, the MCD card has four transmitter ports distributing the master clock signal to the downstream HUL modules. Figure 1 shows a photograph of the HUL module with the MCD card.

A block diagram of the StrTDC implemented in the FPGA on the HUL module is shown in Fig. 2. The number of input channels for the StrTDC is 32. The StrTDC comprises an online data-processing (ODP) block and a vital block. The 130-MHz base clock and the 260-MHz clock signals are generated from the master clock signal via the mixed-mode clock manager (MMCM) in the FPGA, and they are distributed to the ODP and vital blocks. The ODP block has four functional units: 1) the TDC unit measures the timings of both the leading and trailing edges of incoming signals; 2) the pairing unit pairs the leading and trailing edges of a signal and calculates the time-over-threshold (TOT); 3) the time-walk corrector corrects the pulse-height time-walk of the leading-edge timing using the TOT information; and finally, 4) the filter unit filters out hits with a small TOT. A fixed latency in each unit of the ODP block ensures the real-time processing of signals. The vital block generates a special datum in response to the periodically distributed signals; this datum and the periodic signal are called the heartbeat data and the heartbeat signal, respectively. We need the elapsed time from the beginning of a spill. The number of heartbeat data yields the elapsed time with a precision of the heartbeat period. Generation of the heartbeat data is one of the most important functions of the StrTDC.

**Fig. 1.** Photograph of the HUL module with the MCD card. The center of the photograph is the HUL module. The part surrounded by the white square at the bottom left is the MCD card.

**Fig. 2.** Block diagram of the StrTDC. It comprises an online data-processing block and a vital block. The heartbeat generator is used to provide a coarse count for the TDC data and to generate heartbeat data in the vital block. We employ the single-system clock domain for all blocks with a frequency of 130 MHz, which is generated from the master clock signal.

The timings are independently measured for both the leading and trailing edges of incoming signals. Four multi-phase 260-MHz clock signals with phases of 0, 90, 180, and 270 degrees are used to determine the timing with a precision of 0.97 ns. Binary TDC data are fed into the shift registers inside the pairing unit. If a leading edge is matched with a trailing edge while it is moving inside the shift register, the pairing unit treats them as a pair and provides the TOT. The length of the shift register, which corresponds to the maximum measurable TOT value, is set to 150 ns because the typical TOT of the fiber trackers in the hadron experiments at J-PARC is less than 100 ns. A TOT value of 0 is given if the leading edge does not find the corresponding trailing edge. The leading edge with the TOT is sent to the time-walk corrector and the trailing edge is discarded. The time-walk of the timing signal is corrected by the time-walk corrector unit to reduce the CPU load. The correction applied to the leading-edge timing is a step function of the TOT, i.e., a constant value assigned to each TOT region. Finally, the TOT filter at the end of the ODP block rejects small-TOT data such as dark noise.
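The processing chain described above (pairing, step-function time-walk correction, and TOT filtering) can be sketched in software as follows. This is only an illustrative model, not the FPGA logic: the function names, the TOT region boundaries, the correction offsets, the sign of the correction, and the filter threshold are all hypothetical placeholders; only the 150-ns maximum TOT and the convention that an unpaired leading edge gets a TOT of 0 come from the text.

```python
import bisect
from dataclasses import dataclass
from typing import List

MAX_TOT_NS = 150.0  # maximum measurable TOT quoted in the text

# Hypothetical TOT region boundaries and per-region offsets (placeholders).
TOT_BOUNDARIES_NS = [20.0, 30.0, 40.0, 50.0]  # four boundaries -> five regions
CORRECTION_NS = [3.0, 2.0, 1.0, 0.5, 0.0]     # one constant per region


@dataclass
class Hit:
    leading_ns: float  # leading-edge time
    tot_ns: float      # time-over-threshold; 0.0 means no trailing edge found


def pair_edges(leading: List[float], trailing: List[float],
               max_tot_ns: float = MAX_TOT_NS) -> List[Hit]:
    """Pair each leading edge with the next trailing edge within max_tot_ns."""
    trailing = sorted(trailing)
    hits = []
    j = 0
    for lead in sorted(leading):
        # Skip trailing edges that do not come after this leading edge.
        while j < len(trailing) and trailing[j] <= lead:
            j += 1
        if j < len(trailing) and trailing[j] - lead <= max_tot_ns:
            hits.append(Hit(lead, trailing[j] - lead))
            j += 1
        else:
            hits.append(Hit(lead, 0.0))  # unpaired leading edge
    return hits


def correct_time_walk(hit: Hit) -> Hit:
    """Apply a constant offset chosen by the TOT region (step function)."""
    region = bisect.bisect_right(TOT_BOUNDARIES_NS, hit.tot_ns)
    return Hit(hit.leading_ns - CORRECTION_NS[region], hit.tot_ns)


def tot_filter(hits: List[Hit], min_tot_ns: float) -> List[Hit]:
    """Drop hits whose TOT is below the threshold (e.g., dark noise)."""
    return [h for h in hits if h.tot_ns >= min_tot_ns]


if __name__ == "__main__":
    raw = pair_edges([100.0, 400.0], [135.0])
    kept = tot_filter([correct_time_walk(h) for h in raw], min_tot_ns=10.0)
    print(kept)
```

In this sketch a hit with a TOT of 0 is also removed by the filter; whether the actual filter unit treats unpaired leading edges this way is an assumption.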

The heartbeat signal is generated by the carry-out bit of the 16-bit local counter, i.e., the heartbeat generator, as shown in Fig. 2. The interval between two heartbeats is defined as the heartbeat frame. Since the heartbeat generator is driven by the 130-MHz clock signal, the heartbeat period is 504  $\mu$ s. This heartbeat frame is used as a time unit in the analysis stage.

The vital block comprises two merger units and a heartbeat unit. The merger unit combines multiple data paths from the ODP block into one and transfers the data to the heartbeat unit. The configuration of a merger unit is shown in Fig. 3(a). The merger unit comprises small 2-to-1 mergers arranged in a six-layer tree. The 2-to-1 merger has two first-in–first-out (FIFO)-type buffers and combines two data input paths into one output path. The signal path for reading from these two buffers is switched at every clock cycle. The buffer depth from the first to the fourth layers is 128, while it is 4096 in the fifth and sixth layers. The total buffer size in the merger unit is larger than that corresponding to the number of expected incoming hits in a heartbeat frame when the detector hit rate is 1 MHz/channel. The heartbeat unit inserts the heartbeat data into a series of TDC data as a delimiter to indicate the end of the time frame. To determine the end of the frame, the vital block uses a double-buffer structure, as shown in Fig. 3(b). The signal path for writing to the merger unit is switched from one to the other in response to the heartbeat signal. The heartbeat data are generated when all data stored in the buffers of the merger unit currently being read have been sent to the buffer of the heartbeat unit. We refer to this condition as the merger unit being empty. After inserting the heartbeat data, the heartbeat unit switches the signal path for reading to the merger unit that is receiving data from the ODP block. The 16-bit frame number is included in each heartbeat data fragment to identify the number of the current frame. Accordingly, the StrTDC is able to measure the elapsed time from the spill start up to 33 s because the heartbeat period is 504 $\mu$s. As the timing precision of the least significant bit is 0.97 ns, a high dynamic range of the order of $10^{10}$ is realized. The heartbeat unit has a FIFO-type buffer with a depth of 4096.
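The numbers quoted above can be checked with a short calculation. The sketch below derives the heartbeat period from the 16-bit counter and the 130-MHz clock, the maximum elapsed time from the 16-bit frame number, and the resulting dynamic range using the quoted 0.97-ns precision. The helper `elapsed_time_s` is hypothetical: it assumes that the leading-edge timing value counts 0.97-ns units from the start of the current heartbeat frame, which the text does not state explicitly.

```python
BASE_CLOCK_HZ = 130e6    # clock driving the heartbeat generator
COUNTER_BITS = 16        # width of the local counter (heartbeat generator)
FRAME_NUMBER_BITS = 16   # width of the frame number in the heartbeat data
TIMING_BITS = 19         # width of the leading-edge timing value
LSB_NS = 0.97            # timing precision of the least significant bit (quoted)

# One heartbeat per counter roll-over.
heartbeat_period_s = 2 ** COUNTER_BITS / BASE_CLOCK_HZ

# Longest time span that the frame number can label without wrapping.
max_elapsed_s = 2 ** FRAME_NUMBER_BITS * heartbeat_period_s

# Ratio of the longest measurable time to the timing precision.
dynamic_range = max_elapsed_s / (LSB_NS * 1e-9)


def elapsed_time_s(frame_number: int, timing_value: int) -> float:
    """Elapsed time since the spill start (assumed data layout, see lead-in)."""
    if not 0 <= frame_number < 2 ** FRAME_NUMBER_BITS:
        raise ValueError("frame_number out of range")
    if not 0 <= timing_value < 2 ** TIMING_BITS:
        raise ValueError("timing_value out of range")
    return frame_number * heartbeat_period_s + timing_value * LSB_NS * 1e-9


if __name__ == "__main__":
    print(f"heartbeat period : {heartbeat_period_s * 1e6:.1f} us")
    print(f"max elapsed time : {max_elapsed_s:.1f} s")
    print(f"dynamic range    : {dynamic_range:.1e}")
```

The calculation gives a heartbeat period of about 504 $\mu$s, a maximum elapsed time of about 33 s, and a dynamic range of a few times $10^{10}$, in agreement with the values in the text.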

The data length of an internal data fragment in the FPGA is 64 bits; however, 24 bits are dummies. Thus, the dummy bit ripper removes the unnecessary bits before sending data to the SiTCP. The 40-bit data word includes 6 bits for the data type, 6 bits for the channel number, 8 bits for the TOT value, 19 bits for the leading-edge timing value, and 1 additional reserved bit. Finally, the SiTCP sends data to the PC. The SiTCP buffer size is 32 kilobits. The designed data-processing rates of the vital block and dummy bit ripper are 5.2 Gbps (130 MHz × 40 bits) and 1.0 Gbps, respectively. Thus, the StrTDC throughput is limited by the SiTCP with a transfer speed of 949 Mbps. To confirm the StrTDC throughput, we reproduced the behavior of the vital block, dummy bit ripper, and SiTCP in a Monte Carlo simulation written in C++. In this simulation, we assumed that the data come randomly to each channel of the merger unit. The StrTDC throughput of 940 Mbps was reproduced in the simulation. We found that the prepared buffer sizes are sufficient for the randomly incoming data and there is no data drop because of the buffer being full.

**Fig. 3.** (a) Configuration of a merger unit. The small white box represents a primitive 2-to-1 merger, which combines two signal paths into one. The gray box in the 2-to-1 merger indicates a FIFO-type buffer. (b) Block diagram of a vital block. The arrows represent signal paths for writing and reading. The heartbeat unit also has a FIFO-type buffer.

If the incoming data rate for the vital block exceeds the transfer speed of the SiTCP, the buffers fill up, starting from that in the SiTCP. Finally, the buffers in the merger units become full. Since the merger unit currently being read needs to be empty to insert the heartbeat data, the heartbeat data are not generated, and the function of the heartbeat unit breaks down. In this case, the state of the heartbeat unit switches from normal operation mode to busy mode. Figure 4 illustrates the difference in the heartbeat data insertion timing between the normal and busy modes. The signal path change timing is delayed from the heartbeat signal because the heartbeat unit reads the remaining data from the merger unit. The signal path for reading is switched after inserting the heartbeat data.

**Fig. 4.** Time chart of the heartbeat unit operation for both normal and busy modes. The heart symbols indicate the timing of heartbeat signal generation. Left and right represent the signal paths from the left and right merger units to the heartbeat unit, respectively. HB and busy HB data denote the normal heartbeat data and busy heartbeat data inserted under the busy mode, respectively.

When the buffer in the heartbeat unit becomes full, the currently read merger unit is not empty, and the heartbeat data are not generated. If the time until the next heartbeat signal is less than 500 ns, i.e., 64 clock cycles of the base clock signal, the operation mode is switched from normal mode to busy mode without generating the heartbeat data. The signal path between the merger units and heartbeat unit is disconnected and the remaining data in both merger units are discarded. At this timing, the busy heartbeat data are generated, which indicates that some of the data collected during heartbeat frame A in Fig. 4 are not sent to the PC. The data from the ODP block during frame B are stored in the right merger; however, a data drop can occur because the buffer is full inside the right merger. Therefore, data in both merger units are discarded. Simultaneously, writing data to the merger units is blocked under the busy mode. When both merger units become empty, the second busy heartbeat data are inserted to indicate that all data collected during frame B are discarded. This occurs during frame C, and writing to the merger units is still blocked. Thus, the third busy heartbeat data are necessary before returning to the normal mode. If the heartbeat unit is ready to transition to the normal mode, the write block is released by the heartbeat signal. Normal operation is restarted from frame D. As mentioned above, once the heartbeat unit transitions to the busy mode, it requires three frames to return to the normal mode.
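As an illustration of the 40-bit data word described in this section, the sketch below packs and unpacks the five fields with the stated widths (6, 6, 8, 19, and 1 bits). The bit ordering (data type in the most significant bits, reserved bit last), the field names, and the example values are assumptions made for this example; the text gives only the field widths.

```python
# Field names and widths in bits, in the ASSUMED order from MSB to LSB.
FIELDS = (
    ("data_type", 6),
    ("channel", 6),
    ("tot", 8),
    ("timing", 19),
    ("reserved", 1),
)
WORD_BITS = sum(width for _, width in FIELDS)  # 40 bits in total


def pack_word(**values: int) -> int:
    """Pack field values into one 40-bit integer; missing fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_word(word: int) -> dict:
    """Split a 40-bit integer back into its fields."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 40 bits")
    fields = {}
    shift = WORD_BITS
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


if __name__ == "__main__":
    # Hypothetical hit: data type code 42, channel 31, TOT 100, timing 123456.
    word = pack_word(data_type=42, channel=31, tot=100, timing=123456)
    print(word.to_bytes(5, "big").hex(), unpack_word(word))
```

With 19 bits and a 0.97-ns unit, the timing field spans slightly more than one 504-$\mu$s heartbeat frame, which is consistent with the heartbeat data serving as the coarse time reference.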

### 2.2 DAQ software

The key to achieving a high-throughput streaming DAQ is to have a large high-speed buffer and use it effectively, both in hardware and software, as discussed in Ref. [16]. A streaming DAQ is expected to be a triggerless and high-speed DAQ system. Recent commercial PCs, with many CPU cores (e.g., the Intel Xeon and AMD EPYC series), a high-speed bus (PCI Express), and large-capacity RAM, make it possible to implement the components of a streaming DAQ. Software for a prototype streaming DAQ was developed based on FairMQ [17], which is a C++ library comprising an abstract messaging application programming interface (API), to collect data from the StrTDC. FairMQ provides an efficient data transport service realizing high throughput based on message-queuing technologies. Further, FairMQ provides basic building blocks called “devices”, which are used for constructing a data collecting and processing framework. A schematic configuration of the DAQ is shown in Fig. 5. This system is composed of five processes: sampler, sub-time-frame builder, time-frame builder, filter, and file sink. Each HUL module is bound to one dedicated process called the sampler process provided by the FairMQ library; each sampler process is connected to one sub-time-frame builder; all data from the sub-time-frame builders are merged in the time-frame builder process to pack the complete data from all HUL modules for online analysis and data storage. The sub-time-frame builder and time-frame builder processes can serve effectively as the data buffer to improve the throughput of data collection. The filter process groups the data corresponding to track candidates and filters out random noise. The file sink writes the data to the disk.

**Fig. 5.** (a) Configuration used in the performance evaluation for the StrTDC. Data from six HUL modules are collected and saved independently. (b) Full configuration to evaluate the software performance. Multiple time-frame builders, which gather data from six sub-time-frame builders, are arranged for load balance.

In the test experiment, two different configurations were used, as shown in Fig. 5. The simple configuration shown in Fig. 5(a) was used for the performance evaluation of the StrTDC. To evaluate the throughput of the StrTDC, the software configuration was simplified by removing the time-frame builder and filter, which can be a bottleneck in this system. However, the time-frame builder and filter are the most important software components to be tested. Thus, the full configuration shown in Fig. 5(b) was also tested.

**Fig. 6.** Schematic of the experimental setup. A prototype Cherenkov timing detector, prototype scintillating-fiber tracker, drift chamber, and reference counters upstream and downstream of the scintillating-fiber tracker are used in the experiment.

The length of a sub-time frame is defined as the number of heartbeat frames ($10^3$ heartbeat frames = 504 ms in this work). We used six HUL modules in the test experiment. Consequently, six sub-time frames were generated in parallel, one for each StrTDC. We merged and synchronized these six sub-time frames to obtain a complete data set for event construction. A time-frame builder process was provided by FairMQ for this task. During the beam time, each time-frame builder process was followed by one online filter and one file sink to save data to the HDD. The number of time-frame builder processes can be defined by the user in the configuration file. The multiple sub-time-frame builder processes and time-frame builder processes were connected in a round-robin fashion: all six sub-time-frame builder processes dumped one sub-time frame synchronously into the same time-frame builder process and then moved on to the next time-frame builder process to repeat the same action. We used 12 time-frame builder processes to hold data from the six sub-time-frame builder processes. At the beginning of the data collection, all six sub-time-frame builder processes dumped the first sub-time frame into the first time-frame builder process, the second sub-time frame into the second time-frame builder process, and so on. A total of 12 binary data files were generated from the file sink processes coupled with each time-frame builder process in the configuration shown in Fig. 5(b).
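The round-robin connection between the sub-time-frame builders and the time-frame builders can be sketched as follows. This is a plain illustration of the dispatch rule described above and does not use the FairMQ API; the function names and the zero-based numbering of sub-time frames, modules, and builders are choices made for this example.

```python
from typing import Dict, List, Tuple


def builder_index(sub_time_frame_id: int, n_builders: int = 12) -> int:
    """Time-frame builder that receives the sub-time frame with this ID."""
    return sub_time_frame_id % n_builders


def dispatch(n_sub_time_frames: int, n_modules: int = 6,
             n_builders: int = 12) -> Dict[int, List[Tuple[int, int]]]:
    """Map each builder to the (sub_time_frame_id, module) fragments it gets.

    All modules send the fragment with the same ID to the same builder, so
    every builder can assemble complete time frames.
    """
    assignment: Dict[int, List[Tuple[int, int]]] = {
        b: [] for b in range(n_builders)
    }
    for stf_id in range(n_sub_time_frames):
        target = builder_index(stf_id, n_builders)
        for module in range(n_modules):
            assignment[target].append((stf_id, module))
    return assignment


if __name__ == "__main__":
    table = dispatch(n_sub_time_frames=24)
    for builder, fragments in table.items():
        frame_ids = sorted({stf_id for stf_id, _ in fragments})
        print(f"builder {builder:2d} receives sub-time frames {frame_ids}")
```

With 12 builders and sub-time frames of about 504 ms, each builder in this sketch receives a new time frame roughly every 6 s, which gives the downstream filter and file sink time to process the previous one.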

## 3. Test experiment

### 3.1 Detector setup

We installed a prototype Cherenkov timing detector and sets of a scintillating-fiber telescope, drift chamber, and reference counters upstream and downstream of the scintillating-fiber telescopes to test the prototype streaming DAQ system. All detectors were installed along the photon beam direction ($z$ axis) as shown in Fig. 6. The prototype Cherenkov timing detector was made of an acrylic Cherenkov radiator with an X shape, in which two acrylic bars crossed at the center [5]. Each bar had a cross section of $3 \times 3 \text{ mm}^2$ and a length of 150 mm. A multi-pixel photon counter (MPPC), Hamamatsu Photonics S13360-3050, was connected to the downstream end of each bar. The MPPCs were implemented on a shaping amplifier board with a fast operational amplifier. The shaping amplifier circuit cut the tail of the output signal from the MPPC to narrow the signal width to $\sim 4 \text{ ns}$ in $\sigma$. A timing resolution of $\sim 40 \text{ ps}$ (rms) was obtained in a test experiment [5]. The scintillating-fiber telescope comprised 1-mm diameter scintillating fibers, Kuraray SCSF-78M, arranged in a staggered configuration to form one layer [18]. The fibers in the layer were precisely aligned and fixed with an epoxy-type glue. We constructed four sets of three scintillating-fiber telescopes as a prototype scintillating-fiber tracker. In each telescope, the fibers were placed at tilt angles of $0^\circ$ ($X$), $+30^\circ$ ($U$), and $-30^\circ$ ($V$) with respect to the vertical ($y$) direction, and all of them were placed to intersect the photon beam direction ($z$ axis). Each fiber was attached to a 1.3-mm size MPPC, Hamamatsu Photonics S13360-1350, with air contact. A photograph and drawing of the fiber tracker are shown in Fig. 7. The drift chamber consisted of three layers of multi-wires in the vertical ($y$) direction with a 10-mm size hexagonal cell.
The output signal was processed using an amplifier–shaper–discriminator (ASD) card. Each reference timing counter consisted of a plastic scintillator, Eljen Technology EJ-228, and metal-package photomultiplier tube (PMT), Hamamatsu Photonics R9880U-113, installed upstream and downstream of the prototype fiber tracker. The reference counters covered an area of the same size as the sensitive area of the fiber tracker.



**Fig. 7.** Photograph and drawing of the prototype scintillating-fiber tracker. A photograph of the detector is shown on the left. Each layer comprises 1-mm diameter scintillating fibers in a staggered configuration, as shown in the image on the right. In each layer, fibers are placed at tilt angles with respect to the vertical ($y$) direction. The beam detection area is the central region where the layers cross, with a coverage of $6 \times 6$ mm.

### 3.2 Experimental conditions

We tested the prototype DAQ system with a set of prototype detectors using charged particles converted from bremsstrahlung photons on the second photon beam line [19] at the Research Center for Electron Photon Science (ELPH), Tohoku University [20]. In the test, it was necessary to ensure that the experimental environment had the same counting rate for each detector as that of the actual E50 experiment conditions of  $\sim 1$  MHz per 1-mm detector segment, wherein the detector segment corresponded to the readout block such as the Cherenkov radiator and scintillating fiber. A high counting-rate condition of 1 MHz/mm segment was realized using electrons and positrons converted from the bremsstrahlung photon beam. The converter was a 1-mm thick aluminum exit window in the flange of the vacuum duct and an additional 1-mm thick aluminum plate. The electron energy in the accelerator was 1.3 GeV; therefore, the energies of electrons and positrons converted from the extracted bremsstrahlung photon beam were distributed up to the energy of 1.3 GeV. The detectors were irradiated with electrons and positrons over 10 s within a 17-s beam cycle. We gradually increased the counting rate by adjusting the circulating current of the accelerator; then, we evaluated the performance under the counting-rate condition up to the 1.9 MHz/mm segment.

### 3.3 Configuration of the DAQ system

The schematic configuration of the DAQ system evaluated in this test experiment is illustrated in Fig. 8. The StrTDCs running on the HUL modules digitized the timing of the leading and trailing edges of the incoming logic signals generated by the front-end electronics reading the detectors. The StrTDCs were synchronized using the 50-MHz master clock signals. Six HUL modules sent the StrTDC data to the PC where the DAQ software was implemented. The discriminator for the timing detector was one developed for the LEPS2 experiment at SPring-8. Signals from the drift chamber were read by an ASD card, which included an ASIC chip for the thin-gap chambers in the ATLAS experiment [21]. The NIM-EASIROC module [22] was a NIM standard module for the multi-pixelated photon detector (multi-PPD) readout using an ASIC, extended analogue SiPM readout chip (EASIROC) [23]. We employed a NIM-EASIROC module as an ASD to feed the logic signals to the HUL module. The discriminator, ASD, and NIM-EASIROC provided the logic signals of each detector. The logic signals were sent into the StrTDC. The 50-MHz master clock signals distributed by the MCD cards synchronized the six HUL modules. The data link speed was up-converted to 10 Gigabit Ethernet by a network switch, CISCO 6120xp and Fiberstore S3800-24F4S. The software components were implemented on a PC with two Intel Xeon E5-2630 v4 processors and 256 GB main memory (DDR4-2400).

**Fig. 8.** Configuration of the DAQ system. The gray box on each HUL module denotes the MCD card. The dashed lines represent the CAT5 cables connecting the MCD cards. The logic signals from the discriminator, ASD, and NIM-EASIROC module are sent to StrTDC running on the HUL module. The data link speed is up-converted from Gigabit Ethernet to 10 Gigabit Ethernet by a network switch.

## 4. Results

We derived the performance of StrTDC, the TOF timing resolution between the scintillating-fiber tracker and reference counter, noise rejection by the TOT filter, and throughput via an offline analysis. The data used in the offline analysis for evaluating these items were collected using the DAQ configuration shown in Fig. 5(a). We confirmed the extent to which the time-walk corrector improved the timing resolution and its rate dependence. Further, the timing resolutions were compared between the time-walk corrector in the FPGA and the offline analysis. Next, we determined how efficiently the TOT filter reduced the data rate. Finally, we estimated the throughput of the StrTDC from a recorded data size and the data transfer efficiency. The throughput of the StrTDC was determined based on the speed of the data link. The obtained throughput indicates the data transfer speed with the TCP communication under the DAQ configuration in the test experiment.



**Fig. 9.** (a) Correlation between the TOT and the leading-edge timing of a fiber in the fiber tracker. (b) Correlation after correction by the time-walk corrector. The vertical lines in (b) represent the boundaries of the TOT regions.

### 4.1 Performance of StrTDC

**4.1.1 Timing resolution.** Coincidence between reference counters was required in the offline analysis. A software trigger was generated if all reference counters were fired within a time window of 46 ns, which corresponds to  $\pm 3$  clock cycles of the 130-MHz base clock. The start timing of an event was determined by the most upstream reference counter. The fiber tracker data falling in a  $\pm 77$  ns range from the start timing were collected, and an event data set was built. The TOF for each fiber was obtained here. The correlation between the TOT and the timing of the leading edge for the fiber is shown in Fig. 9(a) at the low rate, 70 kHz/fiber on average. We describe a correction method used in the time-walk corrector. The TOT was divided into five regions and the TDC values were corrected by constant values, which were set in each region as shown in Fig. 9(b). The width of each TOT region was determined such that the standard deviation of the raw timings for regions 1–3 were the same when using beta-rays from the  $^{90}\text{Sr}$  source before the test experiment. We set the correction values such that the TOF peak position in each region would be the same during the experiment. The correction values were common to all channels in the StrTDC. Figure 9(b) shows the correlation plot when the TDC values were corrected by the time-walk corrector under the same rate condition as in Fig. 9(a). The obtained timing resolutions without and with correction provided by the time-walk corrector were  $1.11 \pm 0.04$  ns and  $0.94 \pm 0.04$  ns in  $\sigma$ , respectively. These results included the contribution from the reference counter, which was estimated to be  $0.42 \pm 0.00$  ns from the TOF distribution between the reference counters. 
As the intrinsic timing resolution of the actual timing counter in the E50 experiment will be better than 0.1 ns, we subtracted the reference-counter contribution of 0.42 ns in quadrature from the TOF timing resolution and defined the value after subtraction as the timing resolution of the fiber tracker. Thus, the timing resolutions without and with the correction provided by the time-walk corrector were  $1.03 \pm 0.04$  ns and  $0.84 \pm 0.04$  ns, respectively.
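The quoted values are consistent with removing the reference-counter contribution in quadrature, which assumes that the two contributions are independent. The following minimal sketch reproduces the arithmetic; the function name is illustrative and not taken from the analysis code.

```python
import math

def subtract_in_quadrature(total_ns, reference_ns):
    """Remove an independent contribution from a combined resolution (sigma)."""
    return math.sqrt(total_ns**2 - reference_ns**2)

REFERENCE_NS = 0.42  # reference-counter contribution quoted in the text

print(round(subtract_in_quadrature(1.11, REFERENCE_NS), 2))  # 1.03 (without correction)
print(round(subtract_in_quadrature(0.94, REFERENCE_NS), 2))  # 0.84 (with correction)
```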

**Fig. 10.** Averaged timing resolution as a function of the signal rate. The vertical axis is the timing resolution in  $\sigma$ . The horizontal and vertical dashed lines indicate the required timing resolution of 1 ns and the expected rate in the E50 experiment, respectively. The data points indicated by the filled squares and open triangles are the timing resolutions obtained with and without the correction by the time-walk corrector, respectively. The filled triangles (red) and open circles (black) represent the results when correcting the time-walk by the software-emulated time-walk corrector with seven TOT regions using an integer number and a 2-bit fixed-point number in the offline analysis, respectively. The open squares (blue) represent the timing resolution when using a continuous and smooth function for the correction.

Further, we studied the rate dependence of the timing resolution. We obtained the timing resolutions fiber-by-fiber and calculated the averaged resolution using fibers under the same signal rate condition because the signal rates per fiber differed by a factor of four in the plane depending on the converted electron and positron profile. The averaged timing resolution as a function of the signal rate is plotted in Fig. 10. The horizontal axis is the rate per TDC channel, which is equal to the rate per 1-mm segment in this test experiment, because the fiber diameter is 1 mm. The timing resolution satisfies the requirement of 1 ns at the low rate by using the time-walk corrector; however, it exceeds 1 ns at the expected rate of 1 MHz/channel. The signal pile-up in the EASIROC chip would change the effective threshold and cause deterioration of the timing resolution when the signal rate is high. The timing resolution is also estimated by correcting the time-walk using the following function:

$$f(\text{TOT}) = a \times \text{TOT} + b/\text{TOT} + c/\sqrt{\text{TOT}} + d. \quad (1)$$

The difference between the filled and open squares indicates that there is room for improvement in the time-walk corrector. We modified the time-walk corrector and emulated it in software in the offline analysis. First, the number of TOT regions was increased from five to seven by splitting the fourth and fifth regions at their centers. Here, we took the center of the fifth region to be 70 because the maximum TOT value was around 90. The timing resolution when correcting using the modified time-walk corrector is represented by the filled triangles (red) in Fig. 10. The timing resolution is slightly improved; however, it is still consistent with the results obtained with the original time-walk corrector within the error bars. Next, we tested a 2-bit fixed-point number for the correction values instead of an integer, keeping seven TOT regions. The results are represented by open circles (black) in Fig. 10. The obtained timing resolution is close to the resolution obtained using Eq. (1). Thus, the TOT should be divided into seven regions and the leading-edge timing should be corrected using a 2-bit fixed-point number to satisfy the required timing resolution at 1 MHz per 1-mm segment.
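A region-based correction of this kind can be sketched as follows. This is only an illustration: the region boundaries, the correction values, the sign convention, and the reading of "2-bit fixed-point" as two fractional bits are all assumptions made for the example, not values or definitions taken from the StrTDC firmware.

```python
from bisect import bisect_right

# Hypothetical TOT region boundaries (TDC units); seven regions need six edges.
REGION_EDGES = [20, 30, 40, 50, 60, 70]
# Hypothetical per-region corrections in TDC units (not the values used in the test).
CORRECTIONS = [3.30, 2.10, 1.20, 0.60, 0.10, -0.35, -0.80]

def quantize(value, frac_bits):
    """Round value to a fixed-point number with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def corrected_time(tdc, tot, frac_bits=2):
    """Subtract the constant correction of the TOT region that contains tot."""
    region = bisect_right(REGION_EDGES, tot)
    return tdc - quantize(CORRECTIONS[region], frac_bits)

# frac_bits=0 emulates integer corrections; frac_bits=2 allows steps of 0.25.
print(corrected_time(100, 45, frac_bits=0))  # 99.0
print(corrected_time(100, 45, frac_bits=2))  # 99.5
```

With integer corrections the smallest step is one TDC unit, whereas two fractional bits allow steps of a quarter unit, so the piecewise-constant correction can follow a smooth curve such as Eq. (1) more closely.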

**4.1.2 TOT filter efficiency.** The open histogram shown in Fig. 11 illustrates the TOT distribution of a fiber in the third  $U$  layer located at the center of the detector for all hits including noise. The peak around a TOT value of 45 corresponds to signals from the converted electrons and positrons. The shaded (red) histogram is the TOT distribution for the same fiber when a hit is required in each of the other 11 layers. This requirement selects events in which the converted electrons and positrons pass through the fiber tracker. To reject accidental hits, the  $\pm 3\sigma$  range of the TOF distribution was selected. The two histograms were scaled so that their heights were the same. One can find an excess in the TOT region lower than 40 ns, which is probably caused by noise. The excess around the TOT value of 60 seems to come from events where two charged particles penetrate a fiber. In the test experiment, we tested the TOT filter by setting the TOT threshold to 30, which corresponds to 29.1 ns. This was studied under the low-rate condition. By applying the TOT filter, the data rate was reduced from 2.29 Mbps to 2.14 Mbps, a drop of 0.15 Mbps (about 7%). As we used new MPPCs and set the discriminator threshold to 3.5 p.e. in EASIROC, the noise rate derived from the dark noise was not high. However, as the MPPCs will be damaged by radiation during the E50 experiment, the TOT filter may be important during the actual beam time.

**Fig. 11.** TOT distribution for a fiber. The open histogram indicates the TOT distribution for all the hits. The shaded red histogram represents the TOT distribution for events requiring a hit in each of the other layers.
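The effect of a TOT threshold can be illustrated with a short software sketch. The hit representation (a dictionary with a `tot` field) and the choice to keep hits exactly at the threshold are assumptions for the example; the actual filter runs inside the StrTDC.

```python
def tot_filter(hits, tot_threshold=30):
    """Keep hits whose TOT is at or above the threshold (inclusive by assumption)."""
    return [hit for hit in hits if hit["tot"] >= tot_threshold]

# Hypothetical hits: one short noise-like pulse and two signal-like pulses.
hits = [
    {"channel": 3, "tdc": 1200, "tot": 12},
    {"channel": 3, "tdc": 5400, "tot": 30},
    {"channel": 7, "tdc": 5410, "tot": 45},
]
kept = tot_filter(hits)
print(len(kept))  # 2
```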

**4.1.3 Throughput.** The throughput of the StrTDC was obtained by dividing the amount of data  $D$  recorded in the PC by the measurement time  $\Delta t$ . The amount of data that was going to be transmitted was estimated from the recorded data to determine the relationship between the throughput and the data rate inside the StrTDC. We call this the expected data rate and define it as

$$\frac{D(\Delta t)}{\Delta t \cdot \epsilon}, \quad (2)$$

where  $\epsilon$  denotes the DAQ efficiency. As the DAQ efficiency depends on the rate of the incoming detector signals, the beam intensity should be flat within the time span. Since the beam intensity changed gradually during a beam-on period of 10 s, we divided the period into 10 spans and estimated  $\epsilon$  in each span. Under these conditions, the maximum variation of the beam rate in  $\Delta t$  was  $\pm 5\%$ . Two different DAQ efficiencies can be defined. One is the normal mode fraction, which is the ratio of the number of heartbeat frames in the normal mode to the total number of heartbeat frames. However, this approach always underestimates the DAQ efficiency because some data are transferred to the PC in the first heartbeat frame after transitioning to the busy mode. Thus, we calculated the live time of the StrTDC by counting the time for which data are not transferred. The DAQ efficiency,  $\epsilon$ , is defined as the ratio of the live time to the measurement time. On the other hand, the throughput is defined as  $D/\Delta t$ .

**Fig. 12.** (a) DAQ efficiency (filled squares) and normal mode fraction (open circles) as a function of expected data rate. (b) Throughput of the StrTDC as a function of the expected data rate.
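The three quantities defined here (throughput, DAQ efficiency, and the expected data rate of Eq. (2)) are related by simple ratios. The sketch below spells them out for one time span; the function names and the numbers in the example are hypothetical and serve only to show how the definitions combine.

```python
def throughput_bps(recorded_bits, dt_s):
    """Throughput D / dt: recorded data divided by the measurement time."""
    return recorded_bits / dt_s

def daq_efficiency(live_time_s, dt_s):
    """DAQ efficiency: ratio of live time to measurement time."""
    return live_time_s / dt_s

def expected_data_rate_bps(recorded_bits, dt_s, efficiency):
    """Eq. (2): D / (dt * efficiency), the rate that would have been sent without loss."""
    return recorded_bits / (dt_s * efficiency)

# Hypothetical 1-s span: 800 Mbit recorded while the StrTDC was live for 0.8 s.
eff = daq_efficiency(live_time_s=0.8, dt_s=1.0)
print(throughput_bps(8.0e8, 1.0))               # 800 Mbps recorded
print(expected_data_rate_bps(8.0e8, 1.0, eff))  # about 1 Gbps expected
```

When the efficiency is 1 the expected data rate equals the throughput; the two separate only once the StrTDC starts entering the busy mode.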

The DAQ efficiency and normal mode fraction are plotted as a function of the expected data rate in Fig. 12(a). As mentioned above, the normal mode fraction is always smaller than the DAQ efficiency. Further, the DAQ efficiency starts to drop at around 900 Mbps, which indicates that the data transfer speed of the TCP communication in this test experiment is limited to about 900 Mbps. Figure 12(b) shows the throughput as a function of the expected data rate. The throughput reaches its maximum when the expected data rate is 900 Mbps and then gradually decreases. This tendency is caused by the recovery procedure from the busy mode: in the busy mode, the heartbeat unit discards the remaining data in the merger units and blocks the writing of data to the vital block. If the heartbeat unit frequently transitions into the busy mode, the throughput becomes lower than the data link speed.

#### 4.2 DAQ software

The beam intensity was adjusted to be around 1 MHz/mm on the scintillating-fiber tracker during the beam time. Each HUL module transfers data with a data transfer speed of around 900 Mbps because the expected data rate from the HUL exceeds the data link speed under these conditions. Thus, the total data rate from six HUL modules was around 5.4 Gbps.

We employed `std::inplace_merge`, an efficient algorithm in the C++ standard library, to merge and sort the TDC data collected in each filter process. The CPU consumption for on-line computing is shown in Fig. 13, and the RAM consumption is shown in Fig. 14. The highest curve of Fig. 13 corresponds to the total working load that includes five tasks; it shows that around 25 CPU threads are used to handle the 5.4 Gbps throughput. Among them, the online filter and file sink processes are the most demanding, and they are indicated by the second and third highest curves in Fig. 13. The remaining tasks, which are dedicated to the TCP/IP connection with the HUL modules and to data bridging between different tasks, have marginal CPU consumption. A similar observation is found in Fig. 14, where the highest curve indicates the total RAM consumption; the filter and file sink processes consume about 6% and 2% of the 256 GB RAM, respectively. The highest curve of Fig. 14 shows that around 25 GB RAM is consumed in total. This is reasonable because the merging and sorting algorithm used by the filter process requires a large RAM space to manipulate the TDC stream; further, the file sink has to store the data temporarily in the RAM before dumping it into the HDD. The recorded RAM consumption for the sub-time-frame and time-frame builder processes was zero because their memory usage was too low to be recorded.

**Fig. 13.** CPU consumption for each DAQ process. The solid (black), dashed (red), dotted (green), dash-dotted (blue), long-dashed (violet), and long-dash-dotted (magenta) lines represent the CPU consumption for all DAQ, sampler, sub-time-frame builder, time-frame builder, filter, and file sink processes, respectively.

**Fig. 14.** RAM consumption for each DAQ process. The solid (black), dashed (red), long-dashed (violet), and long-dash-dotted (magenta) lines represent the memory consumption for all DAQ, sampler, filter, and file sink processes, respectively.

#### 4.3 Data verification

To verify the data quality, we checked the heartbeat frame numbers received from the HUL module reading signals from the reference counters. We did not find any missing heartbeat frame number, which would have indicated a silent drop-out of the heartbeat data. During the beam time, we confirmed that all six HUL modules sent their synchronized data to the same time frame. Further, no TCP connection between the HUL modules and the PC failed. The DAQ framework successfully accumulated the data.
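A check of this kind amounts to scanning the received heartbeat frame numbers for gaps. The sketch below is a simplified illustration that assumes the frame numbers increase by one per frame and do not wrap around within the scanned range; the function name is hypothetical.

```python
def find_missing_frames(frame_numbers):
    """Return heartbeat frame numbers absent between consecutive received frames."""
    missing = []
    for prev, cur in zip(frame_numbers, frame_numbers[1:]):
        # Any jump larger than one means frames prev+1 .. cur-1 never arrived.
        missing.extend(range(prev + 1, cur))
    return missing

print(find_missing_frames([0, 1, 2, 3]))      # []
print(find_missing_frames([10, 11, 13, 16]))  # [12, 14, 15]
```

An empty result for every module corresponds to the situation reported here, in which no silent data drop was observed.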

After the test beam, a cosmic-ray test with the streaming DAQ system was performed to confirm the long-term stability. The main differences from the beam test were the lower data rate and the topology between the sub-time-frame and time-frame builders. Four HUL modules were connected to one time-frame builder and no round-robin operation was employed between the sub-time-frame and time-frame builders. The stability of data collection was confirmed by checking the heartbeat frame number during the 30-h data collection.

**Fig. 15.** Illustration of the definition of a time window. The vertical black lines represent boundaries between time windows. The third time window illustrates that two beam events are included because of accidental hits between two beam events.

**Fig. 16.** Time separation between neighboring hits. The open and shaded (red) histograms represent the data collected in the test experiment and simulated data, respectively.

#### 4.4 Data analysis

At the beginning of the data analysis in the filter process, data from different StrTDCs were merged into a time-ordered buffer with `std::inplace_merge`. This algorithm consumes a considerable amount of RAM, which is why a relatively small data chunk of 0.5 s is used. After the TDC data are merged, one needs to define the basic data analysis unit that corresponds to the trigger logic in the conventional DAQ system.
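The merging step can be illustrated in Python with `heapq.merge`, which combines already sorted inputs into one sorted sequence. This is an analogue for illustration only and not the C++ code used in the filter process; the hit layout (a tuple of time and module ID) and the assumption that each module's stream is already time-ordered are made up for the example.

```python
import heapq

def merge_streams(streams):
    """Merge per-module hit lists, each already sorted by time, into one list."""
    return list(heapq.merge(*streams, key=lambda hit: hit[0]))

# Hypothetical hits as (time_ns, module_id) from three modules.
module0 = [(1.0, 0), (5.0, 0)]
module1 = [(2.0, 1), (3.0, 1)]
module2 = [(4.0, 2)]

merged = merge_streams([module0, module1, module2])
print([t for t, _ in merged])  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Merging sorted runs is cheaper than sorting the concatenated data from scratch, which is the usual motivation for a merge-based approach.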

The preliminary method used for the present test beam is called the time-window method, and it is illustrated in Fig. 15. A time window is constructed from the time separation between neighboring hits on the time axis: if the separation between two neighboring hits is less than 10 ns, these hits are grouped into the same time window; the timing of the later hit is then used as the reference to determine whether the next hit belongs to the same time window. Therefore, the width of the time window is not fixed; it changes dynamically. Figure 16 illustrates the time separation between the neighboring hits. The open and shaded (red) histograms represent the distributions of data and the Monte Carlo (MC) simulation of random hits. The MC simulation was performed assuming a random hit rate of 0.5 MHz/channel. The part of the distribution with time separations larger than 10 ns results from the random hits, which is consistent with the experimental setup considering the TOF resolution and detector response time.

**Fig. 17.** Multiplicity within a time window of 10 ns. The open and shaded (red) histograms represent the data collected in the test experiment and simulated data, respectively.
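The time-window construction can be sketched as a single pass over time-ordered hits. This is a minimal illustration of the grouping rule only, with hit times given as plain numbers in nanoseconds; the function name and the example times are hypothetical.

```python
def build_time_windows(hit_times, max_gap_ns=10.0):
    """Group hits so that neighboring hits closer than max_gap_ns share a window."""
    windows = []
    for t in sorted(hit_times):
        # Compare with the latest hit of the current window, not with its first hit.
        if windows and t - windows[-1][-1] < max_gap_ns:
            windows[-1].append(t)
        else:
            windows.append([t])
    return windows

print(build_time_windows([0, 4, 12, 30, 35, 60]))
# [[0, 4, 12], [30, 35], [60]]
```

The first window in the example spans 12 ns although the gap criterion is 10 ns, which shows how the window width changes dynamically.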

Figure 17 shows the multiplicity in each time window. The open and shaded (red) histograms indicate the actual and simulated data, respectively. The simulation condition is the same as that of the simulation illustrated in Fig. 16. The low-multiplicity time windows result from random noise, and they need to be filtered out by the filter process. More than 50% data reduction can be achieved if we reject time windows with a multiplicity less than 4.
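A multiplicity cut on time windows can be sketched as below. The windows are represented as lists of hits, and the example data are invented, so the reduction printed here has no relation to the value measured in the test experiment.

```python
def reject_low_multiplicity(windows, min_multiplicity=4):
    """Keep only time windows containing at least min_multiplicity hits."""
    return [w for w in windows if len(w) >= min_multiplicity]

def data_reduction(windows, kept):
    """Fraction of hits removed by the cut."""
    total = sum(len(w) for w in windows)
    return 1.0 - sum(len(w) for w in kept) / total

# Hypothetical windows: two beam-like windows and two noise-like windows.
windows = [[1, 2, 3, 4, 5], [10], [20, 21], [30, 31, 32, 33]]
kept = reject_low_multiplicity(windows)
print(len(kept), data_reduction(windows, kept))  # 2 0.25
```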

After determining the time window, hit clustering is performed for a set of  $X$ ,  $U$ , and  $V$  layers using a built-in lookup table (LUT). The LUT is a type of counter map formatted as a matrix in which each fiber layer corresponds to one dimension and the fiber ID is the index along that dimension. A Boolean matrix  $X_{ijk}$  is prepared, where  $i$ ,  $j$ , and  $k$  are the fiber IDs in the  $X$ ,  $U$ , and  $V$  layers, respectively. Thus, by looking at a particular matrix element in this LUT, one can immediately learn whether the fibers under consideration overlap geometrically. For example,  $X_{113} = 0$  implies that  $X$ -layer-ID1,  $U$ -layer-ID1, and  $V$ -layer-ID3 have no geometrical overlap; this combination should be excluded from track finding. Provided that this LUT is not too big, we think that this is one of the most efficient online filters, if not the fastest.
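The lookup can be illustrated with a three-dimensional Boolean table indexed by the fiber IDs of the three layers. The table size and the list of overlapping combinations below are invented for the example; in practice they would follow from the detector geometry.

```python
def build_lut(n_x, n_u, n_v, overlapping):
    """Create a Boolean table that is True only for geometrically overlapping fibers."""
    lut = [[[False] * n_v for _ in range(n_u)] for _ in range(n_x)]
    for i, j, k in overlapping:
        lut[i][j][k] = True
    return lut

def candidate_clusters(lut, x_hits, u_hits, v_hits):
    """Return the (X, U, V) fiber-ID combinations allowed by the table."""
    return [(i, j, k)
            for i in x_hits for j in u_hits for k in v_hits
            if lut[i][j][k]]

# Hypothetical geometry: only two fiber combinations overlap.
lut = build_lut(4, 4, 4, overlapping=[(1, 1, 2), (2, 3, 3)])
print(lut[1][1][3])                                 # False, so (1, 1, 3) is rejected
print(candidate_clusters(lut, [1, 2], [1], [2, 3]))  # [(1, 1, 2)]
```

Each candidate costs one table access, so the work per time window scales with the number of hit combinations rather than with any geometric calculation.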

Finally, we estimated the CPU time for the data analysis method described above. Typically, 0.5 s of data collected using four HUL modules requires 2 s of CPU time on a single core, including sorting, constructing the time windows, and hit clustering. However, the performance strongly depends on the data segment size: the smaller the data segment to be processed, the better the performance. The CPU time can be further reduced by fine-tuning the size of the data segment, even though the performance with the current choice of a 0.5-s-long data chunk is already satisfactory. Therefore, we have successfully demonstrated the functionality of the current system and its potential application for triggerless data collection.
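As a rough reading of these numbers, and assuming that data segments can be processed independently on separate cores with no additional overhead (an idealization not stated above), the number of cores needed to keep pace with the incoming data is the ratio of the CPU time to the data-taking time; the symbols below are introduced only for this estimate:

$$N_{\text{core}} \geq \frac{t_{\text{CPU}}}{t_{\text{data}}} = \frac{2\ \text{s}}{0.5\ \text{s}} = 4.$$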

## 5. Discussion

The fiber detectors are the largest data source in the E50 experiment. The expected data rate in E50 is higher than that of the present test experiment. The developed DAQ system can be scaled up by simply increasing the number of PCs so that the load on each DAQ software component becomes moderate. From the results obtained in the test experiment, we estimate the computer resources required for the E50 experiment to receive data from the fiber detectors. New electronics will be adopted instead of the NIM-EASIROC for the readout of the fiber detectors. We are developing on-detector-type readout electronics with four ASICs that can read 128 channels in total. The prototype StrTDC developed in this work is implemented in the FPGA on the new circuit. Thus, the data structure transferred to the PC is the same as that used in the test experiment. As the data rate from an FPGA is four times larger than that of the prototype StrTDC, the 10 Gigabit Ethernet-based SiTCP is used to provide sufficient data transfer speed. Considering a  $\pi^-$  beam rate of 30 MHz and a 1.5 MHz reaction rate at the target, the expected total data rate from the fiber detectors is 45 Gbps. According to Fig. 13, the CPU occupancy is over 50%, and all 20 physical cores are already in use. The PC may be able to receive more data, but probably not twice as much. Therefore, if we use at least eight PCs with similar specifications, all data streams from the fiber detectors can be received.
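As a rough check of the eight-PC estimate, assuming that the 45 Gbps is shared evenly among the PCs (an idealization), the load per PC compares with the rate handled in the test experiment as follows:

$$\frac{45\ \text{Gbps}}{8} \approx 5.6\ \text{Gbps per PC}, \qquad \frac{5.6\ \text{Gbps}}{5.4\ \text{Gbps}} \approx 1.04.$$

Each PC would thus receive about 4% more than the 5.4 Gbps handled in the test experiment, which lies within the margin described above of more data but less than twice as much.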

In the E50 experiment, track finding is needed after the processes performed in the prototype DAQ software, but it is not necessary to try track finding for all incoming data. Around 90% of the beam particles pass through the target without reacting, i.e., beam through. The beam-through events can be eliminated from the time information of the timing counters and the position information of the fiber detectors. Therefore, track finding is done after reducing the amount of data by rejecting the beam-through events. Investigation of a track-finding algorithm and estimation of the required computer resources are left for future work.

## 6. Summary

To handle randomly arriving particles under the J-PARC slow extraction, we developed a prototype streaming DAQ system for the J-PARC E50 experiment. The hit rate of the detectors can reach 1 MHz/channel. Continuous timing measurement is planned with a system composed of front-end StrTDC modules and DAQ software based on the message-queue API FairMQ. A novel heartbeat method is applied to the StrTDC, and a large dynamic range of the order of  $10^{10}$  for the timing measurement was realized with only a 19-bit TDC data width and without any hardware trigger.

The developed system was tested with prototype detectors using electrons and positrons converted from the bremsstrahlung photon beam at ELPH, Tohoku University. In the test experiment, six samplers, six sub-time-frame builders, 12 time-frame builders, 12 filters, and 12 sinks were configured to cope with the data stream from six StrTDCs. The data transfer speed from the StrTDC using the TCP communication was found to be as high as 900 Mbps under the DAQ configuration in this experiment. The resource consumption for the on-line computing of the DAQ software was around 25 GB RAM and around 25 CPU threads for an incoming data rate of 5.4 Gbps. A timing resolution of 0.84 ns was achieved for the StrTDC after time-walk correction. No missing heartbeat frame was found during the test.

Through the test experiment, the feasibility of the streaming DAQ system was successfully demonstrated. In the process of building up the DAQ system for the J-PARC E50 experiment, it is important to develop a realistic filter logic optimized for the final detector configuration.

## Acknowledgements

We would like to thank the facility staff of ELPH for their help with the stable accelerator operation in the test experiment. We also acknowledge the support of the Ministry of Science and Technology of Taiwan. We would like to acknowledge the technical support from the members of the Open Source Consortium of Instrumentation (Open-It). This work was supported by Grants-in-Aid for Scientific Research (KAKENHI): Scientific Research on Innovative Areas (Grant No. JP18H05402).
