Clustering large number of extragalactic spectra of galaxies and quasars through canopies

Sep 15, 2013
21 pages
Published in:
  • Communications in Statistics - Theory and Methods (2013) 000
Abstract: (arXiv)
Cluster analysis is the distribution of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset share some common trait according to some distance measure. Unlike classification, in clustering one has to first decide the optimum number of clusters and then assign the objects to the different clusters. Solving such problems for a large number of high-dimensional data points is quite complicated, and most existing algorithms will not perform properly. In the present work, a new clustering technique applicable to large data sets has been used to cluster the spectra of 702248 galaxies and quasars, each having 1540 points in the wavelength range imposed by the instrument. The proposed technique has successfully discovered five clusters from this 702248 × 1540 data matrix.
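The abstract does not spell out the technique, so the following is only a minimal sketch of generic canopy clustering as it is usually described: pick a point, gather everything within a loose threshold into a canopy, and retire everything within a tight threshold from further consideration. The function name `canopy_clusters`, the thresholds `t1` and `t2`, the Euclidean distance, and the small synthetic matrix standing in for spectra are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def canopy_clusters(data, t1, t2, seed=0):
    """Group the rows of `data` into (possibly overlapping) canopies.

    t1 is the loose threshold and t2 the tight one (t1 > t2 > 0).
    Both values are illustrative assumptions, not the paper's settings.
    """
    if not (t1 > t2 > 0):
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(data))  # rows still eligible as centres
    canopies = []
    while remaining.size > 0:
        center = rng.choice(remaining)
        # Euclidean distance from the chosen centre to every row;
        # a cheaper approximate distance could be substituted here.
        dist = np.linalg.norm(data - data[center], axis=1)
        # Every row within the loose threshold joins this canopy.
        members = np.flatnonzero(dist < t1)
        canopies.append((int(center), members))
        # Rows within the tight threshold can no longer start a canopy.
        remaining = remaining[dist[remaining] >= t2]
    return canopies


# Synthetic stand-in for a spectra matrix: two well-separated groups
# of 50 rows, each row with 8 "wavelength" points.
demo_rng = np.random.default_rng(1)
spectra = np.vstack([
    demo_rng.normal(0.0, 0.1, (50, 8)),
    demo_rng.normal(5.0, 0.1, (50, 8)),
])
canopies = canopy_clusters(spectra, t1=3.0, t2=1.5)
for center, members in canopies:
    print(f"centre row {center}: {members.size} members")
```

In practice the canopies would typically serve as a cheap first pass, with a more expensive clustering step applied only among rows that share a canopy; whether the paper proceeds this way is not stated in the abstract.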