UAV Formation Control Based on Deep Reinforcement Learning and Dynamic Artificial Potential Field

Shaoxuan Dong; Zetian Sun; Jiarui Li; Bo Li

doi:10.37965/jait.2026.1299

Authors

Shaoxuan Dong Xi’an Hummingbird Pilot Testing Technology Co., Ltd, Xi’an, China
Zetian Sun School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China https://orcid.org/0009-0006-1713-8008
Jiarui Li School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China
Bo Li School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China https://orcid.org/0000-0002-1415-4444

DOI:

https://doi.org/10.37965/jait.2026.1299

Keywords:

artificial potential field, deep reinforcement learning, formation control, UAV

Abstract

The navigation of unmanned aerial vehicle (UAV) swarms in complex environments faces significant challenges, notably the local-minimum deadlock and parameter rigidity inherent in traditional artificial potential field (APF) algorithms. To address these issues, this paper proposes a hierarchical formation control framework that integrates the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with a second-order consensus protocol. Based on a leader–follower augmented topology, the followers adopt a distributed consensus control law to maintain the geometric rigidity and structural safety of the formation. For the leader, this paper introduces an AI-driven meta-control architecture: a TD3 agent continuously interacts with the physical environment, dynamically adjusting the attractive and repulsive force gains and also outputting a continuous deflection angle for the repulsive force. Furthermore, by combining algebraic graph theory and Young’s inequality, this paper proves that the resulting nonautonomous closed-loop system with time-varying parameters is uniformly ultimately bounded. Comparative simulations in a complex dense-forest environment show that the proposed TD3-APF method significantly improves the obstacle avoidance success rate.
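
As a rough illustration of the leader-side idea described in the abstract, the following Python sketch computes a planar APF force whose attractive gain, repulsive gain, and repulsive deflection angle are passed in as parameters. It is not the paper's implementation: the function name `apf_force`, the Khatib-style repulsive term, the influence radius `rho0`, and the 2D setting are all assumptions made for illustration. In the proposed framework a trained TD3 policy would presumably supply the three parameters at each step; here they are fixed by hand.

```python
import math


def apf_force(pos, goal, obstacles, k_att, k_rep, deflect, rho0=2.0):
    """Planar APF force with tunable gains and a rotated repulsive term.

    k_att, k_rep and deflect stand in for the quantities the abstract says
    the TD3 agent outputs. rho0 (obstacle influence radius) is an assumed
    constant, and the force laws below are a common textbook choice, not
    necessarily the ones used in the paper.
    """
    # Attractive term: linear pull toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])

    c, s = math.cos(deflect), math.sin(deflect)
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        rho = math.hypot(dx, dy)
        if rho == 0.0 or rho >= rho0:
            continue  # degenerate case, or outside the influence radius
        # Repulsive magnitude, pointing away from the obstacle.
        mag = k_rep * (1.0 / rho - 1.0 / rho0) / rho**2
        rx, ry = mag * dx / rho, mag * dy / rho
        # Rotate the repulsive vector by the deflection angle.
        fx += c * rx - s * ry
        fy += s * rx + c * ry
    return fx, fy


# Goal, obstacle and vehicle are collinear: with no deflection the forces
# cancel exactly, which is the local-minimum deadlock.
stuck = apf_force((0.0, 0.0), (10.0, 0.0), [(1.0, 0.0)], 1.0, 20.0, 0.0)
# A 90-degree deflection turns the repulsion sideways and frees the vehicle.
freed = apf_force((0.0, 0.0), (10.0, 0.0), [(1.0, 0.0)], 1.0, 20.0, math.pi / 2)
print(stuck, freed)
```

In this hand-picked configuration the undeflected force is zero, so a fixed-gain APF controller would stall, while a nonzero deflection angle restores a net force with a component toward the goal. This is the kind of degree of freedom that a learned policy can exploit.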