Space Trajectory Planning with a General Reinforcement-Learning Algorithm

Bibliographic Details
Main Authors: Andrea Forestieri, Lorenzo Casalino
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: Aerospace
Online Access: https://www.mdpi.com/2226-4310/12/4/352
Description
Summary: Space trajectory planning is a complex combinatorial problem that requires selecting discrete sequences of celestial bodies while simultaneously optimizing continuous transfer parameters. Traditional optimization methods struggle with the increasing computational complexity as the number of possible targets grows. This paper presents a novel reinforcement-learning algorithm, inspired by AlphaZero, designed to handle hybrid discrete–continuous action spaces without relying on discretization. The proposed framework integrates Monte Carlo Tree Search with a neural network to efficiently explore and optimize space trajectories. While developed for space trajectory planning, the algorithm is broadly applicable to any problem involving hybrid action spaces. Applied to the Global Trajectory Optimization Competition XI problem, the method achieves competitive performance, surpassing state-of-the-art results despite limited computational resources. These results highlight the potential of reinforcement learning for autonomous space mission planning, offering a scalable and cost-effective alternative to traditional trajectory optimization techniques. Notably, all experiments were conducted on a single workstation, demonstrating the feasibility of reinforcement learning for practical mission planning. Moreover, the self-play approach used in training suggests that even stronger solutions could be achieved with increased computational resources.
ISSN: 2226-4310
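
The summary above describes the algorithmic core only at a high level: an AlphaZero-inspired loop in which Monte Carlo Tree Search, guided by a neural network, searches over hybrid actions that pair a discrete target body with continuous transfer parameters. As a rough illustration of how a tree search can handle such actions without discretizing the continuous part, the Python sketch below uses progressive widening, a standard device for continuous action spaces in MCTS. It is not the authors' implementation (which additionally trains a policy/value network through self-play); the Node class, the sample_action ranges, and the random rollout_return placeholder are illustrative assumptions.

import math
import random

class Node:
    """Search-tree node; each edge carries a hybrid action
    (discrete target body index, continuous time of flight)."""
    def __init__(self, parent=None, action=None):
        self.parent = parent
        self.action = action          # None at the root
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    @property
    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def sample_action(n_bodies):
    """Hypothetical hybrid action: a discrete body choice plus a
    continuous time of flight in days (ranges are illustrative)."""
    return random.randrange(n_bodies), random.uniform(50.0, 500.0)

def ucb(child, parent_visits, c=1.4):
    """Standard UCB1 score for choosing among already-expanded children."""
    return child.value + c * math.sqrt(math.log(parent_visits) / child.visits)

def select(node, k=1.0, alpha=0.5):
    """Descend the tree; progressive widening caps the child count at
    k * visits**alpha, so fresh continuous actions are added gradually
    instead of enumerating a discretized grid."""
    while True:
        if len(node.children) < k * max(node.visits, 1) ** alpha:
            return node  # room to expand with a newly sampled action
        node = max(node.children, key=lambda ch: ucb(ch, node.visits))

def rollout_return(node):
    """Placeholder reward, e.g. negative delta-v of the partial trajectory.
    A real planner would propagate orbital dynamics here (or query a
    learned value network, as in the AlphaZero-style framework)."""
    return random.random()

def mcts(n_iters=1000, n_bodies=10):
    root = Node()
    for _ in range(n_iters):
        leaf = select(root)
        child = Node(parent=leaf, action=sample_action(n_bodies))
        leaf.children.append(child)
        reward = rollout_return(child)
        # Backpropagate the reward up to the root.
        node = child
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return best.action

if __name__ == "__main__":
    body, tof = mcts()
    print(f"best first leg: body {body}, time of flight {tof:.1f} days")

Progressive widening lets visit counts, rather than a fixed grid, govern how finely the continuous parameter is sampled at each node, which matches the summary's claim of handling hybrid action spaces without discretization; the paper's actual mechanism may differ in detail.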