Meta-reinforcement learning for adaptive spacecraft guidance during finite-thrust rendezvous missions / Federici, Lorenzo; Scorsoglio, Andrea; Zavoli, Alessandro; Furfaro, Roberto. - In: ACTA ASTRONAUTICA. - ISSN 0094-5765. - 201:(2022), pp. 129-141. [10.1016/j.actaastro.2022.08.047]
Meta-reinforcement learning for adaptive spacecraft guidance during finite-thrust rendezvous missions
Federici, Lorenzo; Zavoli, Alessandro
2022
Abstract
In this paper, a meta-reinforcement learning approach is investigated to design an adaptive guidance algorithm capable of carrying out multiple space rendezvous missions. Specifically, both a standard fully-connected network and a recurrent neural network are trained by proximal policy optimization on a wide distribution of finite-thrust rendezvous transfers between circular coplanar orbits. The recurrent network is also fed the control and reward of the previous simulation step, allowing it to build, through its history-dependent hidden state, an internal representation of the considered task distribution. The ultimate goal is to obtain a model that can adapt to unseen tasks and produce a near-optimal guidance law along any transfer leg of a multi-target mission. As a first step toward the solution of a complete multi-target problem, a sensitivity analysis on a single rendezvous leg is carried out, by varying either the initial or the final orbit radius, the transfer time, and the initial phasing between the chaser and the target. Numerical results show that the recurrent-network-based meta-reinforcement learning approach reconstructs the optimal control more closely in almost all the analyzed scenarios and, at the same time, meets the terminal rendezvous condition with greater accuracy, even for problem instances that fall outside the original training domain.
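To make the recurrent-policy idea concrete, the following is a minimal sketch of the kind of history-dependent network the abstract describes: a recurrent policy whose input at each step concatenates the current observation with the previous control and reward, so the hidden state can accumulate an implicit task representation. This is written in PyTorch for illustration only; the class name, the GRU choice, the Gaussian output head, and all dimensions are assumptions, not the authors' exact architecture or training code.

import torch
import torch.nn as nn

class RecurrentMetaPolicy(nn.Module):
    """Sketch of a recurrent meta-RL policy: the recurrent cell receives
    (observation, previous control, previous reward), so its hidden state
    can encode an internal representation of the current task."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Input = observation + previous control + previous scalar reward.
        self.gru = nn.GRU(obs_dim + act_dim + 1, hidden_dim, batch_first=True)
        self.mean_head = nn.Linear(hidden_dim, act_dim)    # mean of Gaussian policy
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent std

    def forward(self, obs, prev_action, prev_reward, hidden=None):
        # obs: (batch, T, obs_dim); prev_action: (batch, T, act_dim);
        # prev_reward: (batch, T, 1); hidden: optional GRU state.
        x = torch.cat([obs, prev_action, prev_reward], dim=-1)
        out, hidden = self.gru(x, hidden)
        return self.mean_head(out), self.log_std.exp(), hidden

# Example single step (all sizes hypothetical):
policy = RecurrentMetaPolicy(obs_dim=7, act_dim=3)
mean, std, h = policy(torch.zeros(1, 1, 7), torch.zeros(1, 1, 3), torch.zeros(1, 1, 1))

Such a policy would typically be optimized with a recurrent-compatible PPO implementation, carrying the hidden state h across steps of an episode; the fully-connected baseline mentioned in the abstract would simply omit the previous-action and previous-reward inputs and the recurrence.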