Rotondi, S.; Becchetti, V.; Carletta, S.; Giuseppi, A. (2025). Safe Reinforcement Learning for Autonomous Rendezvous and Proximity Operations: Balancing Safety and Performance in CubeSat Docking Missions. In: IAF Astrodynamics Symposium at the 76th International Astronautical Congress (IAC 2025), Australia, pp. 1315-1327. DOI: 10.52202/083087-0116.
Safe Reinforcement Learning for Autonomous Rendezvous and Proximity Operations: Balancing Safety and Performance in CubeSat Docking Missions
Rotondi S.; Becchetti V.; Carletta S.; Giuseppi A.
2025
Abstract
Autonomous Rendezvous and Proximity Operations (ARPOD) for CubeSats and small satellites are becoming increasingly important for on-orbit servicing, formation flying, and debris mitigation. These operations present unique challenges: the spacecraft must navigate complex dynamics, operate under strict safety constraints, and adapt to uncertain environmental conditions. Traditional control techniques, such as Linear Quadratic Regulators (LQR) and Model Predictive Control (MPC), offer stability and theoretical guarantees but struggle when exposed to unmodelled dynamics, external disturbances, or actuator faults. By contrast, Reinforcement Learning (RL) offers flexibility and adaptability in nonlinear and partially observable environments. Still, its exploration-driven learning often leads to unsafe behaviours, especially in the critical phases of approach and docking, where collisions, misalignments, or loss of control could jeopardise the mission. In this work, we propose the use of Safe Reinforcement Learning (Safe RL) to address the safety-performance trade-off in ARPOD. We extend the Soft Actor-Critic (SAC) framework with an auxiliary Safety Critic, which estimates the probability that a given state-action pair will lead to a violation of safety constraints, such as collision risk, excessive relative velocity, or orientation misalignment. By incorporating this Safety Critic, the agent can actively avoid unsafe actions during both training and execution, ensuring safe proximity operations even when adapting to new conditions or system dynamics. The Safe RL approach is applied to a simulated planar SE(2) ARPOD scenario, in which a 12U CubeSat performs autonomous approach and docking manoeuvres with a target. We compare the Safe RL agent's performance with that of both classical controllers and standard deep reinforcement learning agents. This work highlights the potential of Safe Reinforcement Learning as a critical enabler for next-generation CubeSat autonomy, combining the flexibility of learning-based approaches with the robustness and operational safety demanded by space missions. Future work will extend the Safety Critic concept to real-time onboard learning for adaptive anomaly handling.
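The abstract describes the Safety Critic only at a high level. As a reading aid, one common way to realise such a critic (an assumption on our part; the paper may differ) is to regress it on a binary violation indicator with a Bellman-style recursion, so that its output approximates the probability of a future constraint violation:

$$Q_{\text{safe}}(s, a) \approx \mathbb{E}\left[\, c(s') + \big(1 - c(s')\big)\,\gamma\, Q_{\text{safe}}\big(s', \pi(s')\big) \,\right], \qquad c(s') \in \{0, 1\},$$

where $c(s') = 1$ marks a violated constraint (collision, excessive relative velocity, misalignment) and $\gamma$ is a discount factor. The minimal Python sketch below illustrates how such a critic could gate a SAC-style policy at action-selection time. Every name, the toy heuristic standing in for the learned critic, and the threshold `eps` are illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch of safety-critic action gating for a planar (SE(2))
# proximity-operations agent. NOT the paper's implementation: the policy,
# critic, and threshold below are stand-ins chosen for demonstration.
import numpy as np

rng = np.random.default_rng(0)

def policy_sample(state, n=16):
    """Stand-in for the SAC policy head: sample n candidate planar actions
    (thrust_x, thrust_y, torque), each clipped to [-1, 1]. A real agent
    would sample from its learned squashed-Gaussian policy instead."""
    return np.clip(rng.normal(0.0, 0.5, size=(n, 3)), -1.0, 1.0)

def safety_critic(state, actions):
    """Stand-in for the learned Safety Critic: estimated probability in
    [0, 1] that each (state, action) pair leads to a constraint violation.
    Toy heuristic: risk grows when thrusting toward the target while the
    relative speed is already high."""
    rel_pos, rel_vel = state[:2], state[2:4]
    toward_target = -rel_pos / (np.linalg.norm(rel_pos) + 1e-9)
    closing = np.maximum(actions[:, :2] @ toward_target, 0.0)
    return 1.0 - np.exp(-np.linalg.norm(rel_vel) * closing)

def safe_action(state, eps=0.1):
    """Return the first sampled candidate whose estimated violation
    probability is below eps; if none qualifies, fall back to the least
    risky candidate (a conservative recovery behaviour)."""
    candidates = policy_sample(state)
    p_viol = safety_critic(state, candidates)
    admissible = np.flatnonzero(p_viol < eps)
    idx = admissible[0] if admissible.size else int(np.argmin(p_viol))
    return candidates[idx], float(p_viol[idx])

# Relative planar state [x, y, vx, vy] of the chaser w.r.t. the target.
state = np.array([5.0, -2.0, -0.3, 0.1])
action, risk = safe_action(state)
print(f"action={action}, estimated violation probability={risk:.3f}")
```

In the setting the abstract describes, the same gating would apply during both training and execution, which is what allows the agent to explore without committing to actions the critic flags as high-risk.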


