
Safe Reinforcement Learning for Autonomous Rendezvous and Proximity Operations: Balancing Safety and Performance in CubeSat Docking Missions

Becchetti V.; Carletta S.; Giuseppi A.
2025

Abstract

Autonomous Rendezvous and Proximity Operations (ARPOD) for CubeSats and small satellites are becoming increasingly important for on-orbit servicing, formation flying, and debris mitigation. These operations present unique challenges: the spacecraft must navigate complex dynamics, operate under strict safety constraints, and adapt to uncertain environmental conditions. Traditional control techniques, such as Linear Quadratic Regulators (LQR) and Model Predictive Control (MPC), offer stability and theoretical guarantees but struggle when exposed to unmodeled dynamics, external disturbances, or actuator faults. By contrast, Reinforcement Learning (RL) offers flexibility and adaptability to nonlinear and partially observable environments. Still, its exploration-driven learning often leads to unsafe behaviours, especially in the critical phases of approach and docking, where collisions, misalignments or loss of control could jeopardise the mission. In this work, we propose the use of Safe Reinforcement Learning (Safe RL) to address the safety-performance trade-off in autonomous ARPOD operations. We extend the Soft Actor-Critic (SAC) framework with an auxiliary Safety Critic, which estimates the probability that a given state-action pair will lead to a violation of safety constraints, such as collision risk, excessive relative velocity, or orientation misalignment that could lead to a failure of the mission. By incorporating this Safety Critic, the agent can actively avoid unsafe actions during both training and execution, ensuring safe proximity operations even when adapting to new conditions or system dynamics. The Safe RL approach is applied to a simulated SE(2) planar ARPOD scenario, where a 12U CubeSat performs autonomous approach and docking manoeuvres with a target. We compare the Safe RL agent's performance to both classical controllers and standard deep reinforcement learning agents.
This work highlights the potential of Safe Reinforcement Learning as a critical enabler for next-generation CubeSat autonomy, combining the flexibility of learning-based approaches with the robustness and operational safety demanded by space missions. Future work will explore the extension of the Safety Critic concept to incorporate real-time onboard learning for adaptive anomaly handling.
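The Safety Critic mechanism described above can be illustrated with a minimal, self-contained sketch. All names, the toy critic, and the candidate-filtering scheme below are illustrative assumptions, not the authors' implementation: the paper's critic is a learned network trained alongside SAC, whereas here a hand-written stand-in scores candidate actions and the agent discards those whose estimated violation probability exceeds a threshold.

```python
# Minimal sketch of a Safety-Critic action shield (illustrative only;
# names, the toy critic, and the thresholding scheme are assumptions).

def toy_safety_critic(state, action):
    """Stand-in for a learned critic: estimated probability in [0, 1]
    that (state, action) leads to a safety violation (e.g. collision,
    excessive relative velocity, docking misalignment)."""
    # Toy model: risk grows with commanded thrust magnitude and
    # with proximity to the target.
    return min(1.0, abs(action) * state["proximity_factor"])

def shielded_action(state, policy_samples, critic, risk_threshold=0.1):
    """Among candidate actions sampled from the policy, discard those
    whose estimated violation probability exceeds the threshold, then
    return the lowest-risk admissible candidate."""
    scored = [(critic(state, a), a) for a in policy_samples]
    admissible = [pair for pair in scored if pair[0] <= risk_threshold]
    if admissible:
        return min(admissible)[1]
    # Fallback: no candidate is admissible, so take the least risky one.
    return min(scored)[1]

state = {"proximity_factor": 0.5}
candidates = [0.8, 0.3, 0.15, 0.05]   # actions sampled from the policy
a = shielded_action(state, candidates, toy_safety_critic)
# a == 0.05: the lowest-risk admissible candidate
```

In a full SAC implementation the candidates would be samples from the stochastic policy and the shield would typically prefer the highest-value admissible action rather than the lowest-risk one; the fallback branch mirrors the idea that, when every action looks unsafe, the agent should still degrade gracefully instead of acting arbitrarily.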
2025
2025 IAF Astrodynamics Symposium at the 76th International Astronautical Congress, IAC 2025
Autonomous rendezvous; Cubesat; Formation flying; On-orbit-servicing; Reinforcement Learning
04 Conference proceedings publication::04b Conference paper in volume
Safe Reinforcement Learning for Autonomous Rendezvous and Proximity Operations: Balancing Safety and Performance in CubeSat Docking Missions / Rotondi, S.; Becchetti, V.; Carletta, S.; Giuseppi, A. - 2:(2025), pp. 1315-1327. (2025 IAF Astrodynamics Symposium at the 76th International Astronautical Congress, IAC 2025) [10.52202/083087-0116].
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1767032