Feature importance for machine learning redshifts applied to SDSS galaxies

Oct 17, 2014
9 pages
Published in:
  • Mon.Not.Roy.Astron.Soc. 449 (2015) 2, 1275-1283
  • Published: May 11, 2015
e-Print:

Abstract: (Oxford University Press)
We present an analysis of feature importance selection applied to photometric redshift estimation using the machine learning architecture Decision Trees with the ensemble learning routine AdaBoost (hereafter RDF). We select a list of 85 easily measured (or derived) photometric quantities (or ‘features’) and spectroscopic redshifts for almost two million galaxies from the Sloan Digital Sky Survey Data Release 10. After identifying which features have the most predictive power, we use standard artificial Neural Networks (aNNs) to show that the addition of these features, in combination with the standard magnitudes and colours, improves the machine learning redshift estimate by 18 per cent and decreases the catastrophic outlier rate by 32 per cent. We further compare the redshift estimate using RDF with those from two different aNNs, and with photometric redshifts available from the Sloan Digital Sky Survey (SDSS). We find that the RDF requires orders of magnitude less computation time than the aNNs to obtain a machine learning redshift while reducing both the catastrophic outlier rate by up to 43 per cent, and the redshift error by up to 25 per cent. When compared to the SDSS photometric redshifts, the RDF machine learning redshifts both decrease the standard deviation of residuals scaled by 1/(1+z) by 36 per cent from 0.066 to 0.041, and decrease the fraction of catastrophic outliers by 57 per cent from 2.32 to 0.99 per cent.
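The two quality metrics quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the scaling uses the spectroscopic redshift, that the standard deviation is the population form, and that a catastrophic outlier is a galaxy with |z_phot - z_spec| above a threshold of 0.15. The abstract does not state any of these details, and the function names and toy redshifts below are hypothetical.

```python
import statistics


def scaled_residuals(z_phot, z_spec):
    """Residuals (z_phot - z_spec) scaled by 1/(1 + z_spec).

    Assumption: the scaling uses the spectroscopic redshift.
    """
    return [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]


def sigma_scaled(z_phot, z_spec):
    """Population standard deviation of the scaled residuals."""
    return statistics.pstdev(scaled_residuals(z_phot, z_spec))


def outlier_fraction(z_phot, z_spec, threshold=0.15):
    """Fraction of galaxies with |z_phot - z_spec| above the threshold.

    Assumption: the 0.15 default is illustrative, not taken from the abstract.
    """
    n_out = sum(1 for zp, zs in zip(z_phot, z_spec) if abs(zp - zs) > threshold)
    return n_out / len(z_spec)


# Toy redshifts, invented for illustration only (not SDSS data).
z_spec = [0.10, 0.20, 0.30, 0.40]
z_phot = [0.12, 0.19, 0.30, 0.60]

print("sigma of scaled residuals:", sigma_scaled(z_phot, z_spec))
print("outlier fraction:", outlier_fraction(z_phot, z_spec))
```

With these definitions, a lower value of either metric indicates a better redshift estimate, which is the sense in which the abstract compares the RDF, aNN and SDSS results.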
Note:
  • 10 pages, 4 figures, updated to match version accepted in MNRAS
Keywords:
  • catalogues
  • surveys
  • galaxies: distances and redshifts