Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach

Jan 27, 2025

0 Citations
Abstract: (arXiv)
We address the problem of quantum reinforcement learning (QRL) under model-free settings with quantum oracle access to the Markov Decision Process (MDP). This paper introduces a Quantum Natural Policy Gradient (QNPG) algorithm, which replaces the random sampling used in classical Natural Policy Gradient (NPG) estimators with a deterministic gradient estimation approach, enabling seamless integration into quantum systems. While this modification introduces a bounded bias in the estimator, the bias decays exponentially with increasing truncation levels. This paper demonstrates that the proposed QNPG algorithm achieves a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-1.5})$ for queries to the quantum oracle, significantly improving on the classical lower bound of $\tilde{\mathcal{O}}(\epsilon^{-2})$ for queries to the MDP.
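As a rough illustration of why deterministic truncation gives a bias that decays exponentially in the truncation level, the sketch below compares two ways of estimating a discounted return $\sum_{t \ge 0} \gamma^t r_t$. It is not the paper's QNPG estimator (which estimates policy gradients and relies on quantum mean estimation). It assumes one common classical construction, a random horizon $T$ with $P(T \ge t) = \gamma^t$, which is unbiased, versus a fixed truncation at horizon $H$, whose bias is at most $r_{\max}\gamma^H/(1-\gamma)$ when $|r_t| \le r_{\max}$. The names `truncated_return`, `random_horizon_return`, `truncation_bias_bound`, and `query_ratio` are hypothetical.

```python
import random
from typing import Callable

Reward = Callable[[int], float]  # hypothetical: maps time step t to reward r_t


def truncated_return(reward: Reward, gamma: float, horizon: int) -> float:
    """Deterministic estimate: sum_{t=0}^{horizon-1} gamma^t * r_t (no sampling)."""
    return sum(gamma**t * reward(t) for t in range(horizon))


def random_horizon_return(reward: Reward, gamma: float, rng: random.Random) -> float:
    """One unbiased sample: draw T with P(T >= t) = gamma^t, return sum_{t=0}^{T} r_t."""
    T = 0
    while rng.random() < gamma:  # continue with probability gamma at each step
        T += 1
    return sum(reward(t) for t in range(T + 1))


def truncation_bias_bound(gamma: float, horizon: int, r_max: float) -> float:
    """Tail bound |sum_{t>=horizon} gamma^t r_t| <= r_max * gamma^horizon / (1 - gamma)."""
    return r_max * gamma**horizon / (1.0 - gamma)


def query_ratio(eps: float) -> float:
    """Ratio eps^-2 / eps^-1.5 = eps^-0.5, ignoring constants and log factors."""
    return eps**-2 / eps**-1.5


if __name__ == "__main__":
    gamma = 0.9
    one = lambda t: 1.0  # constant reward, so the exact return is 1 / (1 - gamma)
    exact = 1.0 / (1.0 - gamma)
    for H in (10, 20, 40, 80):
        gap = exact - truncated_return(one, gamma, H)
        print(f"H={H:3d}  bias={gap:.3e}  bound={truncation_bias_bound(gamma, H, 1.0):.3e}")

    rng = random.Random(0)  # seeded so the Monte Carlo run is reproducible
    samples = [random_horizon_return(one, gamma, rng) for _ in range(20000)]
    print("random-horizon mean:", sum(samples) / len(samples), "exact:", exact)
    print("eps^-2 / eps^-1.5 at eps=1e-2:", query_ratio(1e-2))
```

Doubling $H$ squares the factor $\gamma^H$, so a target bias $\delta$ needs only $H = O(\log(1/\delta)/(1-\gamma))$ steps; this is the sense in which a bounded, exponentially decaying bias can be traded for a deterministic estimator.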