Penalized Splines for Smooth Representation of High-dimensional Monte Carlo Datasets
Jan, 2013
8 pages
Published in:
- Comput.Phys.Commun. 184 (2013) 2214-2220
e-Print:
- 1301.2184 [physics.data-an]
Abstract: (Elsevier)
Detector response to a high-energy physics process is often estimated by Monte Carlo simulation. For purposes of data analysis, the results of this simulation are typically stored in large multi-dimensional histograms, which can quickly become both too large to easily store and manipulate and numerically problematic due to unfilled bins or interpolation artifacts. We describe here an application of the penalized spline technique (Marx and Eilers, 1996) [1] to efficiently compute B-spline representations of such tables and discuss aspects of the resulting B-spline fits that simplify many common tasks in handling tabulated Monte Carlo data in high-energy physics analysis, in particular their use in maximum-likelihood fitting.
Program summary:
- Program title: Photospline
- Catalogue identifier: AEPK_v1_0
- Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEPK_v1_0.html
- Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
- Licensing provisions: 2-clause BSD
- No. of lines in distributed program, including test data, etc.: 9723
- No. of bytes in distributed program, including test data, etc.: 156138
- Distribution format: tar.gz
- Programming language: C, Python
- Computer: 32- and 64-bit x86, 32- and 64-bit PowerPC
- Operating system: Linux, Mac OS X, FreeBSD
- Has the code been vectorized or parallelized?: Both
- RAM: Approximately proportional to number of knots used in fitting, depends on problem condition
- Classification: 4.9
- External routines: SuiteSparse (http://www.cise.ufl.edu/research/sparse/SuiteSparse/), Python (http://www.python.org/), BLAS (http://www.netlib.org/blas/), Numpy (http://www.numpy.org/)
- Nature of problem: An algorithm to smoothly represent histograms, including mathematical operations and convolutions. Using histograms of Monte Carlo simulation for likelihood fitting can be unstable due to binning artifacts from statistical fluctuations and hard bin-to-bin transitions. This package provides a toolkit for using penalized spline fits on extremely large multi-dimensional datasets to reduce or eliminate such issues.
- Solution method: Uses sparse matrix operations, non-negative least-squares fitting, and generalized linear array models in conjunction with a number of other algorithms to allow fits to be made, manipulated, and saved with very low computational requirements. This enables even very large problems to be solved on commercially available machines.
- Restrictions: Required computation time and memory increase very rapidly with the number of dimensions. Fits without stacking involving more than 5 dimensions and 20 knots on each are usually not practical on 2012-era hardware.
- Running time: Roughly proportional to the cube of the number of knots used, depends strongly on conditioning of the problem.
Note:
- 8 pages, 6 figures, submitted to Computer Physics Communications, program source code included in photospline.tgz
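Illustrative sketch (not part of the record): the penalized-spline technique named in the abstract can be shown in one dimension as a weighted least-squares fit of B-spline coefficients with a second-difference roughness penalty, in the spirit of Marx and Eilers. Everything below is an assumption made for illustration: the helper name `bspline_basis`, the toy histogram, the knot layout, and the smoothing strength `lam` are hypothetical, and the dense `numpy.linalg.solve` call stands in for the sparse, non-negative, multi-dimensional machinery the program summary describes. It is not the Photospline API.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator function of each half-open knot interval.
    B = np.array([(x >= knots[i]) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float).T
    for d in range(1, degree + 1):
        nb = len(knots) - d - 1
        new = np.zeros((len(x), nb))
        for i in range(nb):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                new[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                new[:, i] += (knots[i + d + 1] - x) / right * B[:, i + 1]
        B = new
    return B

# Toy "histogram": 50 bin centers on (0, 1) with noisy contents.
rng = np.random.default_rng(0)
x = (np.arange(50) + 0.5) / 50
truth = np.exp(-0.5 * ((x - 0.5) / 0.15) ** 2)
y = truth + rng.normal(0.0, 0.05, size=x.size)

# Weights: zero marks unfilled bins, which the penalty bridges smoothly.
w = np.ones_like(x)
w[20:24] = 0.0

# Uniform knots extended past the data range (assumed layout).
degree, n_seg = 3, 20
h = 1.0 / n_seg
knots = np.linspace(-degree * h, 1.0 + degree * h, n_seg + 2 * degree + 1)
B = bspline_basis(x, knots, degree)

# Second-order difference penalty on adjacent coefficients.
n_coef = B.shape[1]
D = np.diff(np.eye(n_coef), n=2, axis=0)
lam = 1.0  # smoothing strength, chosen by hand for this toy case

# Penalized normal equations: (B^T W B + lam D^T D) a = B^T W y
BtW = B.T * w
coeffs = np.linalg.solve(BtW @ B + lam * (D.T @ D), BtW @ y)
fit = B @ coeffs
```

Because the penalty acts on the coefficients rather than on the data, the bins given zero weight still receive a smooth value from their neighbours, which is the property the abstract highlights for unfilled bins. In more dimensions the number of coefficients grows as the product of the per-dimension knot counts, consistent with the restriction quoted in the program summary.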
- Splines
- Monte Carlo
- Histograms
- Maximum likelihood
- numerical calculations: Monte Carlo
- numerical methods
- data analysis method
- statistics
References(17)