Scaling MadMiner with a deployment on REANA

Apr 12, 2023
6 pages

Abstract: (arXiv)
MadMiner is a Python package that implements a powerful family of multivariate inference techniques that leverage matrix element information and machine learning. This multivariate approach requires neither the reduction of high-dimensional data to summary statistics nor any simplification of the underlying physics or detector response. In this paper, we address some of the challenges of deploying MadMiner in a real-scale HEP analysis, with the goal of offering the HEP community a new, easily accessible tool. The proposed approach encapsulates a typical MadMiner pipeline in a parametrized yadage workflow described in YAML files. The general workflow is split into two yadage sub-workflows, one dealing with the physics simulations and the other with the ML inference. The workflow is then deployed using REANA, a reproducible research data analysis platform that provides flexibility, scalability, reusability, and reproducibility. To test the performance of our method, we ran scaling experiments for a MadMiner workflow on the National Energy Research Scientific Computing Center (NERSC) cluster with an HTCondor back-end. In every stage of the physics sub-workflow, resource usage or wall time depended linearly on the number of events generated. This linear scaling allowed us to run a typical MadMiner workflow, consisting of 11M events, in 5 hours, compared to days in the original study.
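The linear dependence of wall time on the number of generated events can be illustrated with a least-squares line fit. The following is a minimal sketch, not the authors' analysis code: the function names `fit_line` and `predict` are hypothetical, and the sample values are placeholders in arbitrary units, not measurements from the paper.

```python
# Minimal sketch: fit wall time as a linear function of event count.
# All numbers below are illustrative placeholders in arbitrary units;
# they are NOT results from the MadMiner/REANA scaling experiments.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new event count."""
    return slope * x + intercept


# Hypothetical event counts (in millions) and wall times (arbitrary units).
events_millions = [1.0, 2.0, 4.0, 8.0]
wall_time = [3.0, 5.0, 9.0, 17.0]

slope, intercept = fit_line(events_millions, wall_time)
print("slope per million events:", slope)
print("extrapolated wall time at 16M events:", predict(slope, intercept, 16.0))
```

If the dependence is linear, as the abstract reports for the physics sub-workflow, a fit of this kind lets the cost of a larger run be estimated from a few small runs before resources are committed.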
Note:
  • To be published in the proceedings of the 21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research
Keywords:
  • scaling
  • statistics
  • family
  • machine learning
  • cluster
  • computer
  • performance
  • neural network
  • statistical analysis
  • data analysis method