AwkwardForth: accelerating Uproot with an internal DSL
Feb 24, 202111 pages
Published in:
- EPJ Web Conf. 251 (2021) 03002
- Published: 2021
e-Print:
- 2102.13516 [cs.PL]
View in:
Citations per year
Abstract: (EDP Sciences)
File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but its code depends on the type of the data structure, not known at compile-time. Just-in-time compilation can satisfy both constraints, but we propose a more portable solution: specialized virtual machines. AwkwardForth is a Forth-driven virtual machine for deserializing data into Awkward Arrays. As a language, it is not intended for humans to write, but it loosens the coupling between Uproot and Awkward Array. AwkwardForth programs for deserializing record-oriented formats (ROOT and Avro) are about as fast as C++ ROOT and 10–80× faster than fastavro. Columnar formats (simple TTrees, RNTuple, and Parquet) only require specialization to interpret metadata and are therefore faster with precompiled code.Note:
- 11 pages, 2 figures, submitted to the 25th International Conference on Computing in High Energy & Nuclear Physics
- structure
References(19)
Figures(5)