Federici, Lorenzo; Furfaro, Roberto; Zavoli, Alessandro; De Matteis, Guido. "Robust Waypoint Guidance of a Hexacopter on Mars using Meta-Reinforcement Learning." Paper presented at the AIAA SciTech Forum and Exposition, 2023, held in National Harbor, MD (USA). doi:10.2514/6.2023-2663.
Robust Waypoint Guidance of a Hexacopter on Mars using Meta-Reinforcement Learning
Federici, Lorenzo; Zavoli, Alessandro; De Matteis, Guido
2023
Abstract
This paper presents a meta-reinforcement learning approach to the robust and autonomous waypoint guidance of a six-rotor unmanned aerial vehicle in the Martian atmosphere. Meta-learning is implemented by using a recurrent neural network as the control policy, which maps the hexacopter state data provided by onboard sensors to the six rotor angular speeds. The network is trained with proximal policy optimization, a state-of-the-art policy gradient reinforcement learning algorithm. During training, the network is also fed the previous control output and reward, to improve the policy's adaptability to different environment instances. Several mission scenarios, involving uncertainties in the properties of the Martian atmosphere, random wind gusts, and Gaussian noise on the sensor measurements, are investigated to assess the robustness of the proposed approach under realistic operating conditions. The flexibility and performance of meta-reinforcement learning are also compared against standard reinforcement learning with a fully connected neural network, to better highlight the potential of the proposed methodology in real-world autonomous guidance applications.
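The meta-learning setup described in the abstract augments the recurrent policy's input with the previous control output and reward at each step. A minimal sketch of that input augmentation and one recurrent step is shown below; the dimensions, the plain-NumPy GRU cell, and all names are illustrative assumptions, not the paper's actual architecture or trained parameters:

```python
import numpy as np

OBS_DIM = 12   # assumed hexacopter state size (position, velocity, attitude, rates)
ACT_DIM = 6    # six rotor angular speeds
HID_DIM = 16   # assumed recurrent hidden-state size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def augment(obs, prev_action, prev_reward):
    """Concatenate sensor observation, previous action, and previous
    scalar reward into the recurrent policy's input vector."""
    return np.concatenate([obs, prev_action, [prev_reward]])

class GRUCell:
    """Plain-NumPy GRU step, standing in for the recurrent policy core."""
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(in_dim + hid_dim)
        # stacked weights for update gate, reset gate, and candidate state
        self.W = rng.uniform(-scale, scale, (3, hid_dim, in_dim + hid_dim))
        self.b = np.zeros((3, hid_dim))

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.W[0] @ xh + self.b[0])                       # update gate
        r = sigmoid(self.W[1] @ xh + self.b[1])                       # reset gate
        n = np.tanh(self.W[2] @ np.concatenate([x, r * h]) + self.b[2])
        return (1.0 - z) * n + z * h                                  # new hidden state

# one guidance step: sensors + previous action/reward -> updated hidden state
cell = GRUCell(OBS_DIM + ACT_DIM + 1, HID_DIM)
h = np.zeros(HID_DIM)
x = augment(np.zeros(OBS_DIM), np.zeros(ACT_DIM), 0.0)
h = cell(x, h)
```

Because the hidden state carries information across steps, the policy can implicitly identify the current environment instance (e.g. atmospheric density or wind conditions) from the recent action-reward history, which is what distinguishes this meta-RL setup from a memoryless fully connected policy.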