Low Latency Transformer Inference on FPGAs for Physics Applications with hls4ml
Sep 8, 2024
e-Print: 2409.05207 [cs.LG]
Abstract (arXiv):
This study presents an efficient implementation of transformer architectures on Field-Programmable Gate Arrays (FPGAs) using hls4ml. We demonstrate a strategy for implementing the multi-head attention, softmax, and normalization layers and evaluate three distinct models. Deployed on a VU13P FPGA chip, they achieved latencies below 2 μs, demonstrating the potential for real-time applications. The compatibility of hls4ml with any TensorFlow-built transformer model further enhances the scalability and applicability of this work.

Index Terms: FPGAs, machine learning, transformers, high energy physics, LIGO
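Since the abstract highlights converting TensorFlow-built transformer models with hls4ml, here is a minimal sketch of what that workflow could look like. It assumes a build of hls4ml that includes the multi-head attention, softmax, and normalization layer support described in the paper; the toy layer sizes, output directory, and VU13P part string are illustrative, not taken from the paper.

```python
# Sketch: convert a tiny Keras transformer block to an HLS project with hls4ml.
# Assumes hls4ml with the transformer layer support presented in the paper;
# dimensions and the FPGA part string below are illustrative assumptions.
import tensorflow as tf
import hls4ml

seq_len, d_model, n_heads = 8, 16, 2  # toy dimensions for illustration

# Tiny transformer encoder block built from standard Keras layers.
inputs = tf.keras.Input(shape=(seq_len, d_model))
attn = tf.keras.layers.MultiHeadAttention(
    num_heads=n_heads, key_dim=d_model // n_heads)(inputs, inputs)
x = tf.keras.layers.LayerNormalization()(inputs + attn)
ffn = tf.keras.layers.Dense(d_model, activation='relu')(x)
x = tf.keras.layers.LayerNormalization()(x + ffn)
outputs = tf.keras.layers.Dense(4, activation='softmax')(
    tf.keras.layers.Flatten()(x))
model = tf.keras.Model(inputs, outputs)

# Per-layer precision/parallelism configuration derived from the Keras model.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Convert to an HLS project targeting a VU13P device (part string assumed).
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='transformer_hls4ml_prj',
    part='xcvu13p-flga2577-2-e',
)
hls_model.compile()            # C simulation for bit-accurate checks
# hls_model.build(csim=False)  # run HLS synthesis to obtain latency/resources
```

In this flow, `hls_model.predict()` can be compared against `model.predict()` to verify numerical agreement before synthesis, which is where latency figures such as the sub-2 μs result would be measured.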