Low Latency Transformer Inference on FPGAs for Physics Applications with hls4ml - INSPIRE

Sep 8, 2024

e-Print: 2409.05207 [cs.LG]

Abstract: (arXiv)

This study presents an efficient implementation of transformer architectures on field-programmable gate arrays (FPGAs) using hls4ml. We describe our strategy for implementing the multi-head attention, softmax, and normalization layers and evaluate three distinct models. Deployed on a VU13P FPGA, the models achieve latencies below 2 μs, demonstrating their potential for real-time applications. The compatibility of hls4ml with any TensorFlow-built transformer model further enhances the scalability and applicability of this work.

Index Terms: FPGAs, machine learning, transformers, high energy physics, LIGO

References(24)

[1] LHC machine
L. Evans, P. Bryant
JINST 3 (2008) S08001
DOI: 10.1088/1748-0221/3/08/S08001

[2] Advanced LIGO
LIGO Scientific Collaboration: J. Aasi (Caltech) et al.
Class.Quant.Grav. 32 (2015) 074001
e-Print: 1411.4547
DOI: 10.1088/0264-9381/32/7/074001

[3] Advanced Virgo: a second-generation interferometric gravitational wave detector
VIRGO Collaboration: F. Acernese (Salerno U. and INFN, Naples) et al.
Class.Quant.Grav. 32 (2015) 2, 024001
e-Print: 1408.3978
DOI: 10.1088/0264-9381/32/2/024001

[4] Overview of KAGRA: Detector design and construction history
KAGRA Collaboration: T. Akutsu (Natl. Astron. Observ. of Japan) et al.
PTEP 2021 (2021) 5, 05A101
e-Print: 2005.05574
DOI: 10.1093/ptep/ptaa125

[5] Attention is all you need
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al.

[6] FTRANS: energy-efficient acceleration of transformers using FPGA
B. Li, S. Pandey, H. Fang, Y. Lyv, J. Li et al.
DOI: 10.1145/3370748.3406567

[7] Accelerating Transformer Neural Networks on FPGAs for High Energy Physics Experiments
DOI: 10.1109/ICFPT56656.2022.9974463

[8] Hardware acceleration of transformer networks using FPGAs
G. Tzanos, C. Kachris, D. Soudris
DOI: 10.1109/PACET56979.2022.9976354

[9] HPTA: A high performance transformer accelerator based on FPGA
Y. Han, Q. Liu
DOI: 10.1109/FPL60245.2023.00012
URL: https://doi.ieeecomputersociety.org/10.1109/

[10] Fast inference of deep neural networks in FPGAs for particle physics
et al.
JINST 13 (2018) 07, P07027
e-Print: 1804.06913
DOI: 10.1088/1748-0221/13/07/P07027

[11] Fast convolutional neural networks on FPGAs with hls4ml
Thea Aarrestad (CERN), Vladimir Loncar (CERN and Belgrade, Inst. Phys.), Nicolò Ghielmetti (CERN and Belgrade, Inst. Phys.), Maurizio Pierini (CERN), Sioni Summers (CERN) et al.
Mach.Learn.Sci.Tech. 2 (2021) 4, 045015
e-Print: 2101.05108
DOI: 10.1088/2632-2153/ac0ea1

[12] Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml
Elham E. Khoda (Washington U., Seattle), Dylan Rankin (MIT), Rafael Teixeira de Lima (SLAC), Philip Harris (MIT), Scott Hauck (Washington U., Seattle) et al.
Mach.Learn.Sci.Tech. 4 (2023) 2, 025004
e-Print: 2207.00559
DOI: 10.1088/2632-2153/acc0d7

[13] Accelerating Recurrent Neural Networks for Gravitational Wave Experiments
Zhiqiang Que (Imperial Coll., London), Erwei Wang (Imperial Coll., London), Umar Marikar (Imperial Coll., London), Eric Moreno (Caltech), Jennifer Ngadiuba (Caltech) et al.

[14] Graph Neural Networks for Charged Particle Tracking on FPGAs
Abdelrahman Elabd (Pennsylvania U.), Vesal Razavimaleki (UC, San Diego), Shi-Yu Huang (Taiwan, Natl. Chiao Tung U.), Javier Duarte (UC, San Diego), Markus Atkinson (Illinois U., Urbana) et al.
Front.Big Data 5 (2022) 828666
e-Print: 2112.02048
DOI: 10.3389/fdata.2022.828666

[15] Accelerating transformer-based deep learning models on FPGAs using column balanced block pruning
H. Peng, S. Huang, T. Geng, A. Li, W. Jiang et al.
DOI: 10.1109/ISQED51717.2021.9424344

[16] DFX: A low-latency multi-FPGA appliance for accelerating transformer-based text generation
S. Hong, S. Moon, J. Kim, S. Lee, M. Kim et al.

[17] The UCR time series classification archive
H.A. Dau

[18] MC: TTbar sample from the CMS HEP tutorial
URL: http://opendata.cern.ch/record/204

[19] Omicron: a tool to characterize transient noise in gravitational-wave detectors
Florent Robinet (IJCLab, Orsay), Nicolas Arnaud (IJCLab, Orsay), Nicolas Leroy (IJCLab, Orsay), Andrew Lundgren (Portsmouth U., ICG), Duncan Macleod (Cardiff U.) et al.
SoftwareX 12 (2020) 100620
e-Print: 2007.11374
DOI: 10.1016/j.softx.2020.100620

[20] GWAK: Gravitational-wave anomalous knowledge with recurrent autoencoders
R. Raikman

[21] Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors
A. Kuusela, S. Li, H. Zhuang, T. Aarrestad, V. Loncar et al.