Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach

Xu, Yang; Aggarwal, Vaneet

Jan 27, 2025

e-Print: 2501.16243 [quant-ph]

Abstract: (arXiv)

We address the problem of quantum reinforcement learning (QRL) under model-free settings with quantum oracle access to the Markov Decision Process (MDP). This paper introduces a Quantum Natural Policy Gradient (QNPG) algorithm, which replaces the random sampling used in classical Natural Policy Gradient (NPG) estimators with a deterministic gradient estimation approach, enabling seamless integration into quantum systems. While this modification introduces a bounded bias in the estimator, the bias decays exponentially with increasing truncation levels. This paper demonstrates that the proposed QNPG algorithm achieves a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-1.5})$ for queries to the quantum oracle, significantly improving the classical lower bound of $\tilde{\mathcal{O}}(\epsilon^{-2})$ for queries to the MDP.
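The two quantitative claims in the abstract can be illustrated with a small numerical sketch. It is not the QNPG algorithm itself. It assumes, for illustration only, that the truncation acts on a discounted reward sum with discount factor `gamma` and rewards bounded by `r_max`, and it ignores the constants and logarithmic factors hidden by the tilde notation. All function names are hypothetical.

```python
# Illustrative sketch only. Assumptions (not taken from the abstract):
#   - truncation is applied to a discounted sum of bounded rewards
#   - constants and log factors hidden by the tilde-O notation are dropped


def query_scaling(epsilon: float, exponent: float) -> float:
    """Leading-order query count epsilon**(-exponent)."""
    return epsilon ** (-exponent)


def truncated_discounted_sum(rewards, gamma: float) -> float:
    """Discounted sum over a finite (truncated) reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def truncation_bias_bound(gamma: float, horizon: int, r_max: float = 1.0) -> float:
    """Bound on the dropped tail: sum over t >= horizon of gamma**t * r_max."""
    return r_max * gamma ** horizon / (1.0 - gamma)


if __name__ == "__main__":
    eps = 0.01
    print("classical scaling eps^-2  :", query_scaling(eps, 2.0))
    print("quantum scaling   eps^-1.5:", query_scaling(eps, 1.5))

    # The bias bound shrinks by a factor of gamma per extra truncation level.
    for horizon in (10, 20, 40):
        print(horizon, truncation_bias_bound(0.9, horizon))
```

Under these assumptions, each additional truncation level multiplies the bias bound by `gamma`, which is the sense in which a deterministic truncation can trade a small, exponentially decaying bias for the removal of random sampling.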

