A Tutorial on Bayesian Optimization

Jul 8, 2018
e-Print: arXiv:1807.02811 [stat.ML]
Citations per year: [chart omitted; horizontal axis 2018 through 2025, vertical axis 0 to 20]
Abstract:
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best suited for optimization over continuous domains of fewer than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
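The loop the abstract describes (fit a Gaussian process surrogate to the data collected so far, maximize an acquisition function to choose the next evaluation point, evaluate, repeat) is compact enough to sketch directly. The snippet below is a minimal illustration, not the paper's code: the toy objective f, the RBF kernel, the evaluation budget, and the grid-based maximization of the acquisition function are all assumptions chosen for brevity, and it uses the standard noise-free expected-improvement formula rather than the paper's noisy generalization.

```python
# Minimal sketch of Bayesian optimization with a GP surrogate and
# expected improvement (EI). All settings here are illustrative.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    # Toy objective standing in for an expensive (minutes-long) evaluation.
    return -np.sin(3 * x) - x**2 + 0.7 * x

rng = np.random.default_rng(0)
bounds = (-2.0, 2.0)

# A few initial noise-free evaluations to seed the surrogate.
X = rng.uniform(*bounds, size=(3, 1))
y = f(X).ravel()

def expected_improvement(x_grid, gp, y_best):
    # Closed-form EI for maximization:
    #   EI(x) = (mu - y*) Phi(z) + sigma phi(z),  z = (mu - y*) / sigma,
    # where mu, sigma are the GP posterior mean and standard deviation
    # and y* is the best value observed so far.
    mu, sigma = gp.predict(x_grid, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

grid = np.linspace(*bounds, 500).reshape(-1, 1)
for _ in range(15):
    # Refit the surrogate; small alpha adds jitter for numerical stability.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                  alpha=1e-6, normalize_y=True).fit(X, y)
    # Sample next where EI is largest (grid search keeps the sketch simple;
    # in practice one would use a continuous optimizer).
    x_next = grid[np.argmax(expected_improvement(grid, gp, y.max()))]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print(f"best x = {X[y.argmax()].item():.3f}, best f = {y.max():.3f}")
```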