Maximum Entropy competes with Maximum Likelihood

Dec 17, 2020
11 pages

Abstract: (arXiv)
The maximum entropy (MAXENT) method has a large number of applications in theoretical and applied machine learning, since it provides a convenient non-parametric tool for estimating unknown probabilities. The method is a major contribution of statistical physics to probabilistic inference. However, a systematic approach to its limits of validity is currently missing. Here we study MAXENT in a Bayesian decision theory set-up, i.e., assuming that there exists a well-defined prior Dirichlet density for the unknown probabilities, and that the average Kullback-Leibler (KL) distance can be employed for deciding on the quality and applicability of various estimators. These assumptions allow us to evaluate the relevance of various MAXENT constraints, check the general applicability of MAXENT, and compare it with estimators that depend on the prior to various degrees, viz. the regularized maximum likelihood (ML) and the Bayesian estimators. We show that MAXENT applies in sparse data regimes, but needs specific types of prior information. In particular, MAXENT can outperform the optimally regularized ML provided that there are prior rank correlations between the estimated random quantity and its probabilities.
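
The set-up described in the abstract can be illustrated with a small Monte Carlo experiment. The sketch below is not the authors' code. It assumes a symmetric Dirichlet prior, takes the KL distance from the true probabilities to the estimate, uses additive smoothing as the regularized ML, and constrains MAXENT only by the empirical mean of the random quantity. All function names, the bisection bracket for the Lagrange multiplier, and the parameter values are hypothetical choices made for illustration. For this direction of the KL distance, the posterior mean is the estimator that minimizes the prior-averaged KL distance, and for a symmetric Dirichlet prior it coincides with additive smoothing whose regularizer equals the prior parameter.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet prior with parameters alpha."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_counts(p, n_samples, rng):
    """Draw n_samples i.i.d. outcomes from p and return the count per outcome."""
    counts = [0] * len(p)
    for k in rng.choices(range(len(p)), weights=p, k=n_samples):
        counts[k] += 1
    return counts


def regularized_ml(counts, b):
    """Additively regularized ML estimate: (n_k + b) / (N + K * b)."""
    total = sum(counts) + b * len(counts)
    return [(n + b) / total for n in counts]


def maxent_mean(x, target_mean, iters=60):
    """Maximum-entropy distribution over the values x with a prescribed mean.

    The solution has the Gibbs form q_k ~ exp(-beta * x_k); beta is found by
    bisection on an assumed bracket [-50, 50].
    """
    def gibbs(beta):
        logits = [-beta * xk for xk in x]
        top = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(l - top) for l in logits]
        z = sum(weights)
        return [w / z for w in weights]

    lo, hi = -50.0, 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(qk * xk for qk, xk in zip(gibbs(mid), x))
        if mean > target_mean:
            lo = mid  # the mean decreases with beta, so beta must grow
        else:
            hi = mid
    return gibbs(0.5 * (lo + hi))


def kl(p, q):
    """KL distance from the true probabilities p to the estimate q."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)


def average_kl(estimators, alpha, x, n_samples, trials, seed=0):
    """Average KL(p, estimate) over prior draws of p and data drawn from p."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in estimators}
    for _ in range(trials):
        p = sample_dirichlet(alpha, rng)
        counts = sample_counts(p, n_samples, rng)
        for name, estimator in estimators.items():
            totals[name] += kl(p, estimator(counts, x))
    return {name: total / trials for name, total in totals.items()}


if __name__ == "__main__":
    K, N = 5, 5                      # sparse regime: as many samples as outcomes
    x = list(range(K))               # values of the random quantity (assumed)
    alpha = [1.0] * K                # symmetric Dirichlet prior (assumed)
    estimators = {
        "regularized ML (b=0.1)": lambda c, x: regularized_ml(c, 0.1),
        "posterior mean (b=1)": lambda c, x: regularized_ml(c, 1.0),
        "MAXENT (mean constraint)": lambda c, x: maxent_mean(
            x, sum(n * xk for n, xk in zip(c, x)) / sum(c)),
    }
    for name, value in average_kl(estimators, alpha, x, N, trials=2000).items():
        print(f"{name}: {value:.3f}")
```

With a flat symmetric prior as above, the random quantity and its probabilities have no prior rank correlation, so this configuration is not one where the abstract claims an advantage for MAXENT. Probing that claim would require a prior that favors probabilities ordered along x, which this sketch does not attempt.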