I. INTRODUCTION

Game theory provides a mathematical foundation for strategic interaction, while artificial intelligence (AI) offers tools for learning, inference, and decision-making. Their integration has become increasingly important in multi-agent systems, reinforcement learning (RL), and adversarial training, creating new opportunities for complex autonomous applications.

In recent years, the integration of AI and game theory has given rise to the emerging paradigm of “Intelligent Game” [1]. In this paper, Intelligent Game refers to the use of AI-driven learning, reasoning, and search to model and solve strategic interaction problems.

Intelligent Game problems are typically modeled as multi-agent systems and can be divided into cooperative games, competitive zero-sum games, and mixed general-sum games [2]. These categories respectively emphasize team utility maximization, adversarial equilibrium seeking, and the coexistence of cooperation and competition.

Historically, this domain was often referred to as computer games or machine games, which primarily encompassed traditional perfect-information games (such as checkers, chess, and Go) and imperfect-information games (such as Texas Hold’em and bridge). However, for clarity and to reflect the modern integration of AI and game theory for complex autonomous missions, this paper consistently uses the preferred term “Intelligent Game” throughout the text.

Currently, Intelligent Game scenarios are gradually shifting from board, card, and video games to simulation-based wargaming, and gaming methodologies are evolving from single and distributed learning methods toward large-scale, general-purpose learning approaches. Since 2016, the AlphaX series of agents (AlphaGo, AlphaZero, AlphaHoldem, and AlphaStar) [3] has set new benchmarks for solving various types of games. Research on Intelligent Game technology has expanded from computer games to task planning and decision-making in many fields.

Intelligent Game is widely applied in the unmanned aerial vehicle (UAV) domain, where it integrates AI and game theory to address real-time decision-making challenges that are highly dynamic, involve incomplete information, are strongly coupled, and span large-scale swarms. These applications fall into four main areas: UAV countermeasure decision-making, collaborative control, resource scheduling, and offense-defense countermeasures. Core technologies include deep reinforcement learning (DRL), multi-agent reinforcement learning (MARL), large models, and self-play evolution, supporting deployment across the full range of scenarios, from military countermeasures to civilian collaboration.

The rest of the paper is organized as follows. In Section II, we review related works, discussing prior research achievements and the fundamental principles of key technologies like DRL and large-scale models. Section III details the methodology, describing the research background, objectives, and the analytical methods used to explore the symbiotic models of AI and gaming in the drone domain. Section IV presents the findings, analyzing theoretical and applied research hotspots such as generative adversarial networks (GANs), large language models (LLMs), differential games, evolutionary game theory (EGT), and mean field games (MFGs). Finally, Section V provides our conclusions and outlines future work by identifying practical bottlenecks and promising research directions.

II. RELATED WORKS

This paper analyzes promising application hotspots of Intelligent Game in the UAV domain. Key technologies include DRL, MARL, large-scale models, and self-play evolution. Their theoretical underpinnings span AI applications in perfect- and imperfect-information games, the integration of game theory with reinforcement and deep learning, the application of LLMs, and optimization in Intelligent Game. The following subsections review prior research in these areas, focusing primarily on relatively well-established results.

A. AI APPLICATIONS IN PERFECT- AND IMPERFECT-INFORMATION GAMES

AI milestones in perfect-information games, from Deep Blue’s victory in chess to AlphaGo’s success in Go, highlight the power of combining game-search algorithms with deep neural networks. In 2016, AlphaGo used Monte Carlo Tree Search (MCTS) to explore game states. To predict opponent moves, it employed a policy network trained on records of expert human games, together with a fast rollout policy based on local features. It also used a value network, trained on self-play positions generated by its RL-refined policy network, to evaluate board positions by estimating winning probabilities. AlphaGo ultimately defeated Lee Sedol, a multiple-time world Go champion, 4–1, a significant milestone for AI in perfect-information games. In imperfect-information games, DeepStack [4] (2017) used Counterfactual Regret Minimization (CFR) to re-solve the game tree to a limited depth, then fed the resulting state into a fully connected network that estimates counterfactual values, deriving strategies on demand during play. Since 2019, AI programs such as AlphaStar (StarCraft II) and Pluribus (Texas Hold’em) [4,5] have defeated professional human players in human–machine competitions. These advances represent major breakthroughs in imperfect-information games, and challenges in domains such as StarCraft and Texas Hold’em continue to drive cutting-edge AI research.
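How AlphaGo’s search combines the policy network’s prior with accumulated value estimates can be illustrated by the PUCT selection rule, which picks the child action maximizing Q(s, a) + U(s, a), where U grows with the prior and shrinks as the edge is visited. The sketch below covers only this selection step (no expansion, rollouts, or backup); the `EdgeStats` structure, the action labels, and the constant `c_puct = 1.5` are hypothetical choices for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Statistics stored on one edge (s, a) of the search tree (hypothetical structure)."""
    prior: float            # P(s, a): prior probability from the policy network
    visits: int = 0         # N(s, a): number of simulations through this edge
    value_sum: float = 0.0  # W(s, a): sum of backed-up values

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); unvisited edges default to 0 (an assumption).
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges: dict[str, EdgeStats], c_puct: float = 1.5) -> str:
    """Return the action maximizing Q(s, a) + U(s, a)."""
    total_visits = sum(e.visits for e in edges.values())

    def score(e: EdgeStats) -> float:
        # Exploration bonus: proportional to the prior, shrinking as the edge is visited.
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda a: score(edges[a]))


edges = {
    "a": EdgeStats(prior=0.6, visits=10, value_sum=5.0),
    "b": EdgeStats(prior=0.3, visits=1, value_sum=0.8),
    "c": EdgeStats(prior=0.1),
}
print(puct_select(edges))  # prints b
```

Here the rarely visited action "b" wins because its good mean value and remaining exploration bonus outweigh the heavily visited action "a"; as visit counts grow, the choice is driven increasingly by Q.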

B. INTEGRATION OF GAME THEORY, RL, AND DEEP LEARNING

The remarkable success of Intelligent Game technology relies primarily on combining game theory with the RL paradigm. RL is a method for solving agent strategies in which agents interact with the environment, receive rewards or penalties, and optimize their policies to maximize long-term cumulative reward. CFR and fictitious self-play (FSP) [2] are representative methods that serve as algorithmic bridges between the game-theoretic and RL paradigms, and game theory in turn provides the theoretical foundation for MARL [2]. Built on large-scale distributed computing frameworks, fictitious self-play has become a general RL framework for multi-agent problems and has been applied successfully to complex, large-scale adversarial scenarios such as Quake III Arena, StarCraft, Honor of Kings, and Texas Hold’em [3,8]. Deep CFR applies deep neural networks to Monte Carlo CFR, learning how states and actions affect strategies without hand-crafted abstractions; in two-player zero-sum games its average strategy approaches a Nash equilibrium, which aids in designing agent strategies for adversarial environments. Heinrich et al. proposed end-to-end fictitious self-play methods in 2015 and 2016, namely XFP (Full-width Extensive-form Fictitious Play) [21] and NFSP (Neural Fictitious Self-Play) [5,6]; NFSP combines DRL with supervised learning to approximate Nash equilibrium strategies. In recent years, with the rapid development of AI technologies, game-oriented DRL has increasingly merged with emerging fields, gaining prominence in areas such as autonomous driving, embodied intelligence, and generative AI.
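CFR applies regret matching at every information set. In a one-shot matrix game each player has a single information set, so CFR reduces to regret matching in self-play, and the players’ average strategies approach a Nash equilibrium of the two-player zero-sum game. The sketch below assumes a small hypothetical 2×2 payoff matrix `A` (row player’s payoffs; the column player receives the negative), whose unique equilibrium has each player choose its first action with probability 0.4. It is a minimal illustration, not a full CFR implementation for extensive-form games.

```python
# Row player's payoffs in a hypothetical 2x2 zero-sum game; the column player gets -A.
A = [[2.0, -1.0],
     [-1.0, 1.0]]


def regret_matching(cum_regret):
    """Play actions in proportion to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def solve(iterations=20000):
    n_rows, n_cols = len(A), len(A[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = regret_matching(row_regret)
        y = regret_matching(col_regret)
        # Expected payoff of each pure action against the opponent's current mixed strategy.
        row_util = [sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        col_util = [-sum(A[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        row_value = sum(xi * u for xi, u in zip(x, row_util))
        col_value = sum(yj * u for yj, u in zip(y, col_util))
        # Regret: how much better each pure action would have done than the current mix.
        row_regret = [r + u - row_value for r, u in zip(row_regret, row_util)]
        col_regret = [r + u - col_value for r, u in zip(col_regret, col_util)]
        # It is the *average* strategy that converges to equilibrium.
        row_sum = [s + xi for s, xi in zip(row_sum, x)]
        col_sum = [s + yj for s, yj in zip(col_sum, y)]
    return [s / iterations for s in row_sum], [s / iterations for s in col_sum]


def nash_gap(x, y):
    """Sum of both players' best-response gains against the profile (x, y)."""
    best_row = max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))
    worst_for_row = min(sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(y)))
    return best_row - worst_for_row


x_avg, y_avg = solve()
print(x_avg, y_avg, nash_gap(x_avg, y_avg))
```

Both average strategies end up close to (0.4, 0.6), and `nash_gap` shrinks roughly as O(1/√T) in the number of iterations T.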

C. APPLICATION OF LLMs

Game-theoretic tools can also enhance the accuracy and internal consistency of LLMs. Incorporating game mechanisms such as signaling games [4,17] and multi-agent debate frameworks into the interactive modeling of LLMs can effectively improve task performance. To make language model outputs more consistent and the models themselves more reliable, Jacob and colleagues designed a game in which two modes of the same model (a generator and a discriminator) are driven to find answers they can agree upon. This simple procedure, termed the “consensus game,” lets the LLM play against itself, using game-theoretic tools to improve its accuracy and internal consistency [5,18].

D. OPTIMIZATION IN INTELLIGENT GAME

From the perspective of optimization, Intelligent Game draws on intelligent optimization algorithms. These algorithms primarily simulate the local or global dynamic behavior of agents, beginning with the genetic algorithms introduced in the 1970s, which opened new pathways for solving complex optimization problems that lack precise mathematical models and accelerated the field’s development in speed, depth, and breadth. As research has evolved, six major categories of intelligent optimization algorithms have emerged: human-inspired algorithms, evolutionary algorithms, swarm intelligence algorithms, plant growth-inspired algorithms, natural phenomena-inspired algorithms, and emergent computation [7,8]. These algorithms have shown significant advantages in finding good solutions to NP-hard problems, and researchers have widely applied them to complex real-world problems such as UAV path planning, power systems, and engineering design [17], providing powerful tools for addressing complex challenges in the UAV domain.
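To make the genetic-algorithm idea concrete, the sketch below evolves bit strings on the toy OneMax objective (the count of 1 bits), using tournament selection, one-point crossover, bit-flip mutation, and elitism. The objective, population size, mutation rate, generation count, and seed are illustrative assumptions; in a UAV path-planning setting, the bit string would be replaced by an encoded path and OneMax by a path-cost fitness.

```python
import random


def onemax(bits):
    """Toy fitness (an assumption): the number of 1 bits."""
    return sum(bits)


def tournament(population, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=onemax)


def genetic_algorithm(n_bits=30, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    p_mut = 1.0 / n_bits                       # per-bit mutation probability
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(onemax, population))]
    for _ in range(generations):
        elite = max(population, key=onemax)
        children = [elite[:]]                  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            a, b = tournament(population, rng), tournament(population, rng)
            cut = rng.randrange(1, n_bits)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        population = children
        history.append(max(map(onemax, population)))
    return max(population, key=onemax), history


best, history = genetic_algorithm()
print(onemax(best), history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases.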

III. METHODOLOGY

A. RESEARCH BACKGROUND AND BASIS

Game theory has important applications in the field of drones. Traditional game theory research focused primarily on two-player zero-sum and non-cooperative games. The emergence of AI has brought new challenges and opportunities to the study of game theory, and the integration of the two forms Intelligent Game. This research area has gained widespread attention, has vast development potential, and is of significant practical importance for in-depth exploration in the drone domain.

B. RESEARCH OBJECTIVES

This paper explores the relationship between AI and gaming, seeking efficient and coordinated symbiotic models for the two, with particular attention to practical application scenarios in the drone field. Drone applications, both military and civilian, are becoming increasingly widespread, while Intelligent Game is developing rapidly. The paper discusses the hotspots, potential, new opportunities, and challenges that the development of Intelligent Game brings to the drone field.

C. RESEARCH METHODS

This study employs a systematic literature review and comparative analysis to synthesize recent methodological trends. By critically evaluating theoretical and applied research hotspots, we extract key academic insights into the symbiotic models of AI and gaming within the drone domain and ultimately identify key open problems.

D. RESEARCH CONTENT

This paper analyzes research hotspots with potential for Intelligent Game in the drone field from the dual perspectives of theory and application, centering its research directions on applicable drone scenarios. In Section IV, Subsections A–D analyze theoretical research hotspots, while Subsections E–I analyze applied research hotspots. Because the field of AI develops rapidly, note that the content reviewed in Section II predates 2021 and represents relatively mature achievements. This paper primarily extends that research in the following areas.

1) CONTINUOUS OPTIMIZATION OF INTELLIGENT ALGORITHMS

With the rapid advancement of AI, various theoretical approaches to solving Intelligent Game problems are continuously being proposed:

  • (1) Key Algorithms and Simulation Methods. To cope with the complexity of multi-agent interactions, intelligent optimization and algorithmic tools such as RL, genetic algorithms, and multi-agent simulation platforms are applied to dynamic game learning, strategy stability verification, and numerical simulation.
  • (2) Application of Cooperative Theory. AI can also leverage cooperative game theory to promote collaboration and mutual benefit among participants. For example, game theory can be used to design effective auction mechanisms that achieve optimal resource allocation and improve market efficiency, to promote cooperation and coordination among users in social networks, and to prevent attacks and breaches in the security domain.
  • (3) Formation of Three Typical Paradigms. As a novel research paradigm emerging from the fusion of the two fields, Intelligent Game has formed three typical paradigms: cooperative, non-cooperative adversarial, and hybrid games. Current research hotspots focus on data-driven strategy learning, in which agents no longer rely on preset rules but instead extract gaming patterns from data through methods such as deep learning.
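As an illustration of the auction mechanisms mentioned in item (2), the sketch below implements a sealed-bid second-price (Vickrey) auction: the highest bidder wins and pays the second-highest bid, a rule under which bidding one’s true value is a weakly dominant strategy. The UAV bidder names and valuations are hypothetical, and the Vickrey rule is one standard mechanism chosen for illustration rather than the specific mechanism of any cited work.

```python
from __future__ import annotations


def vickrey_auction(bids: dict[str, float]) -> tuple[str, float]:
    """Sealed-bid second-price auction: the highest bid wins and pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)  # stable on ties
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


def utility(bidder: str, true_value: float, bids: dict[str, float]) -> float:
    """Quasi-linear utility: value minus price if the bidder wins, otherwise 0."""
    winner, price = vickrey_auction(bids)
    return true_value - price if winner == bidder else 0.0


# Hypothetical example: three UAVs bid their true value for one surveillance task.
bids = {"uav_1": 7.0, "uav_2": 9.0, "uav_3": 4.0}
print(vickrey_auction(bids))  # prints ('uav_2', 7.0)
```

Because the winner’s payment does not depend on its own bid, no UAV can gain by misreporting its value, which is what makes this rule attractive for allocating tasks among self-interested agents.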

2) OPTIMIZATION OF INTELLIGENT GAME

The combination of AI and gaming is a complex, multifaceted issue, influenced by factors such as the type and characteristics of the environment, the scarcity and sharing of resources, and AI’s capabilities in perception, control, optimization, and equilibrium seeking. AI employs algorithms and models to improve efficiency and effectiveness in the environment, and the gains for both parties are measured by the resulting performance and value of the environment. Here, the environment is the drone domain, where methods such as machine learning, deep learning, and RL are used to improve the efficiency and effectiveness of drones in practical applications. Each agent’s strategy is a best response to the state of the environment and its changes, and mutual trust rests on the correctness and interpretability of the optimization. The equilibrium of this game is an optimization equilibrium, in which each agent’s strategy represents the best achievable improvement to the environment’s state given the complexity and feasibility of the optimization.

IV. FINDINGS

From the dual perspectives of theoretical development and application of Intelligent Game in the drone domain, this section analyzes research hotspots. The paper explores the development trends, potential, and emerging opportunities and challenges of Intelligent Game in the drone field. To clearly illustrate the relationships between different theoretical models and their practical applications, Table I summarizes the key game-theoretic paradigms, core AI algorithms, and their representative UAV scenarios discussed in this section.

Table I. Overview of Intelligent Game models in UAV systems

Paradigm | Key AI techniques/algorithms | UAV application scenarios | Main advantages
Discrete games | DRL, CFR, MCTS, SoG | Pursuit-evasion, air combat | Handles hidden information and discrete logic
Differential games | DDPG, SAC, Neural SDE | Guidance, formation control | Optimal for continuous dynamics
Evolutionary games | Replicator dynamics, ADP | Task allocation, swarm defense | Robust under bounded rationality
Mean field games | DRL-FP, Neural MFGs | Massive swarm coordination | Scalable to infinite agents
GenAI/LLM | GANs, CoT, role learning | Intent prediction, situation awareness | High-level reasoning and zero-shot capability

A. INTEGRATION OF GAME THEORY AND DRL

The interactions among agents can constitute a game, where each agent’s strategy depends on the strategies of other agents, and each agent’s payoff depends on the strategies of all agents. Agents can exchange information and coordinate through language, signals, actions, and other means or achieve individual or shared goals through competition or cooperation. Game theory can be used to analyze and predict the equilibrium of games among agents.
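The equilibrium analysis described above can be made concrete for the simplest case, a two-player game in normal (matrix) form. The sketch below enumerates every action profile and keeps those in which neither player gains by deviating unilaterally, i.e., the pure-strategy Nash equilibria. The example payoffs are a hypothetical prisoner’s-dilemma-style game (e.g., two UAVs deciding whether to share a scarce resource); games without a pure equilibrium require mixed strategies, which this sketch does not compute.

```python
from itertools import product


def pure_nash_equilibria(payoff_row, payoff_col):
    """Return all (i, j) where neither player gains by a unilateral deviation."""
    n_rows, n_cols = len(payoff_row), len(payoff_row[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        row_is_best = all(payoff_row[i][j] >= payoff_row[k][j] for k in range(n_rows))
        col_is_best = all(payoff_col[i][j] >= payoff_col[i][k] for k in range(n_cols))
        if row_is_best and col_is_best:
            equilibria.append((i, j))
    return equilibria


# Hypothetical payoffs: action 0 = share, action 1 = withhold.
ROW = [[3, 0],
       [5, 1]]
COL = [[3, 5],
       [0, 1]]
print(pure_nash_equilibria(ROW, COL))  # prints [(1, 1)]
```

The only equilibrium is mutual withholding, even though mutual sharing pays both players more; this tension between individual and collective interest is what cooperative mechanisms try to resolve.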

MARL studies how multiple agents in the same environment learn to achieve common goals or maximize their respective interests. Unlike agents in traditional RL, agents in MARL must interact not only with the environment but also with other agents. Multi-agent DRL methods integrate technologies such as deep learning, RL, and multi-agent system theory, endowing agents with enhanced perception, reasoning, decision-making, and learning capabilities and demonstrating significant potential in numerous application scenarios.

The integration of game theory and DRL exhibits four key trends: unified algorithms, high-dimensional equilibrium, multi-agent collaboration, and cross-domain engineering implementation [2,4,9]. The core revolves around a fusion paradigm that combines DRL with game-theoretic equilibrium solutions, addressing the efficiency and generalization bottlenecks of traditional methods in complex, dynamic, and large-scale scenarios. This approach forms a complete technical closed-loop encompassing “game modeling–deep representation–reinforcement learning–equilibrium convergence.”

This fusion applies to scenarios such as UAV/USV pursuit-evasion, formation-based attack and defense, and guidance interception. For example, an incomplete-information pursuit-evasion game solved with DRL and Policy-Space Response Oracles (PSRO) improves generalization in complex environments by designing rewards for the rate of change of relative distance and for obstacle avoidance, and it outputs stochastic optimal policies. In another example, a neural approximation of the Hamilton–Jacobi–Bellman (HJB) equation is used to optimize missile guidance laws, improving hit accuracy against highly maneuverable targets under nonlinear dynamics. Cooperative pursuit by multiple UAVs, modeled as a zero-sum game with perfect or imperfect information, uses DRL to make dynamic decisions on a timescale of seconds, enabling real-time adaptation to changing battlefield situations [2,11,31].
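The reward design in the pursuit-evasion example can be sketched as a shaped per-step reward with one term for the rate at which the pursuer-evader distance shrinks and a penalty for entering an obstacle’s safety margin. The function name, weights, safety margin, capture radius, terminal bonus, and circular-obstacle model are all hypothetical choices for illustration, not the reward used in the cited work.

```python
import math


def pursuit_reward(p_prev, p_curr, e_prev, e_curr, obstacles, dt,
                   w_dist=1.0, w_obs=5.0, safe_margin=2.0,
                   capture_radius=1.0, capture_bonus=10.0):
    """Shaped per-step reward for a pursuer (all weights and radii are hypothetical)."""
    d_prev = math.dist(p_prev, e_prev)
    d_curr = math.dist(p_curr, e_curr)
    # Rate at which the pursuer-evader distance shrinks; positive when closing in.
    closing_rate = (d_prev - d_curr) / dt
    reward = w_dist * closing_rate
    # Obstacles are circles (center, radius); penalize intrusion into the safety margin.
    for center, radius in obstacles:
        clearance = math.dist(p_curr, center) - radius
        if clearance < safe_margin:
            reward -= w_obs * (safe_margin - clearance) / safe_margin
    if d_curr <= capture_radius:
        reward += capture_bonus
    return reward


print(pursuit_reward((0, 0), (1, 0), (10, 0), (10, 0), obstacles=[], dt=1.0))  # prints 1.0
```

Rewarding the closing rate rather than the raw distance gives a dense learning signal at every step, while the margin penalty grows linearly as the pursuer cuts closer to an obstacle.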

  • (1) Unified Game Learning Framework. Frameworks suitable for both perfect- and imperfect-information games include ReBeL, Student of Games (SoG), and Athénan [9,10]. They maintain consistency through subgame resolving, reduce reliance on environmental priors through model-free self-play, and improve algorithmic generalizability. ReBeL integrates policy iteration with equilibrium solving, achieving low exploitability in imperfect-information games such as Texas Hold’em. SoG combines guided search, self-play learning, and game-theoretic reasoning to provide a unified solution for both perfect- and imperfect-information games; given sufficient computation, it guarantees convergence to an approximate Nash equilibrium, reducing dependence on domain knowledge. Within such frameworks, DRL provides trainable agents with convergent learning procedures that reach stable, rational equilibria in sequential decision-making [14,15]. Athénan improves on AlphaZero-style search, outperforming traditional variants across 22 two-player zero-sum perfect-information games with shorter search times.
  • (2) Efficient Fusion of DRL and Game Equilibrium. Several typical approaches have been proposed [9,10]. PSRO with generative models: generative neural networks expand the policy space, and joint optimization of policy-value networks is combined with Nash or correlated equilibrium solving, improving adaptability to sudden changes in opponent strategy and the efficiency of equilibrium discovery under imperfect information. Branching DRL with prioritized experience replay: for high-dimensional state spaces, this approach prioritizes learning critical decision paths, mitigating the “curse of dimensionality.” Entropy-guided min-max decomposition: in two-team zero-sum games, entropy regularization balances exploration and exploitation, improving robustness.
  • (3) Strengthening the Game-Theoretic Foundation of MARL. Research in this area focuses on several directions [4,10]. Centralized training with decentralized execution: algorithms such as Multi-Agent Proximal Policy Optimization (MAPPO) and Heterogeneous-Agent Proximal Policy Optimization (HAPPO) are widely used, and HAPPO in particular is derived from a trust-region scheme with monotonic improvement guarantees in cooperative settings. Extending MCTS to multi-agent games: this couples global value function learning with local policy execution, addressing credit assignment and coordination consistency among multiple agents. Replacing traditional coupled PDEs with neural SDEs: this reduces solution complexity from O(N^2) to O(N), making the approach suitable for large-scale swarm control.
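The basic PSRO loop behind the approaches above can be sketched on a small normal-form game. Each player keeps a population of strategies; a meta-solver computes an equilibrium of the restricted game among those populations, and a best-response oracle adds the strategy that best exploits the opponent’s meta-strategy. In this minimal version, the populations hold pure strategies of rock-paper-scissors, the oracle is an exact best response (so the loop coincides with the double oracle method), and the meta-solver is regret-matching self-play. These are simplifying assumptions; the generative policy models, learned policy-value networks, and neural best responses described above are omitted.

```python
# Row player's payoffs for rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors); zero-sum.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def regret_matching(regret):
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / len(regret)] * len(regret)


def meta_nash(A, rows, cols, iters=2000):
    """Approximate equilibrium of the restricted game A[rows][cols] via regret-matching self-play."""
    row_regret, col_regret = [0.0] * len(rows), [0.0] * len(cols)
    row_sum, col_sum = [0.0] * len(rows), [0.0] * len(cols)
    for _ in range(iters):
        x, y = regret_matching(row_regret), regret_matching(col_regret)
        row_util = [sum(A[r][c] * q for c, q in zip(cols, y)) for r in rows]
        col_util = [-sum(A[r][c] * p for r, p in zip(rows, x)) for c in cols]
        row_value = sum(p * u for p, u in zip(x, row_util))
        col_value = sum(q * u for q, u in zip(y, col_util))
        row_regret = [g + u - row_value for g, u in zip(row_regret, row_util)]
        col_regret = [g + u - col_value for g, u in zip(col_regret, col_util)]
        row_sum = [s + p for s, p in zip(row_sum, x)]
        col_sum = [s + q for s, q in zip(col_sum, y)]
    return [s / iters for s in row_sum], [s / iters for s in col_sum]


def psro(A, max_epochs=10):
    rows, cols = [0], [0]          # initial populations: one arbitrary pure strategy each
    x, y = [1.0], [1.0]
    for _ in range(max_epochs):
        x, y = meta_nash(A, rows, cols)
        # Exact best-response oracles against the opponent's meta-strategy (full action sets).
        br_row = max(range(len(A)), key=lambda i: sum(A[i][c] * q for c, q in zip(cols, y)))
        br_col = min(range(len(A[0])), key=lambda j: sum(A[r][j] * p for r, p in zip(rows, x)))
        if br_row in rows and br_col in cols:
            break                   # no new strategy improves on the meta-game: stop
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
    return rows, cols, x, y


rows, cols, x, y = psro(RPS)
print(rows, cols, [round(p, 3) for p in x])
```

Starting from rock for both players, the loop adds paper and then scissors; the best response to the resulting uniform meta-strategy is already in the population, so the loop stops with the full equilibrium support.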

B. APPLICATIONS OF AI IN PERFECT-INFORMATION GAMES

1) AI IN SOLVING PERFECT-INFORMATION GAMES

Current Intelligent Game technologies, built on algorithmic game theory, DRL, and online convex optimization, have enabled superhuman play in two-player zero-sum perfect-information games such as Go, although games of that size have not been solved in the formal sense.

  • (1) Unified Game Algorithms (Applicable to Both Perfect- and Imperfect-Information Games). Typical algorithms include SoG and ReBeL [11,18]. SoG employs a unified framework combining guided search, self-play learning, and game-theoretic reasoning, achieving strong performance in perfect-information games such as chess and Go, with theoretical guarantees of convergence to optimal strategies given sufficient computation and approximation capacity. ReBeL bridges the algorithmic gap between perfect- and imperfect-information games, using subgame resolving to maintain consistency and obtain low-exploitability approximate Nash equilibria while reducing dependence on domain knowledge.
  • (2) Efficient Integration of DRL and Tree Search. This approach combines joint optimization of policy-value networks with the child-node selection mechanism of MCTS to address the “curse of dimensionality” in high-dimensional state spaces and to enhance robustness under imperfect information. Key improvements include AlphaZero variants and the integration of branching DRL and prioritized experience replay with MCTS. Athénan outperforms AlphaZero with shorter search times, making it suitable for perfect-information games with strict real-time requirements [11].
  • (3) Coordination Mechanisms in Multi-Agent Perfect-Information Games (MAPIG). Representative frameworks include centralized training with decentralized execution and Multi-Agent MCTS (MA-MCTS). By integrating global value function learning with local policy execution, these methods address credit assignment and coordination consistency among multiple agents, improving gaming efficiency in large-scale groups. They apply to perfect-information games involving multi-agent collaboration or confrontation, such as swarm attack/defense and formation coordination.
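Solving a perfect-information game in the formal sense means computing the minimax value of every position by backward induction. The sketch below does this exactly for a hypothetical toy subtraction game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins) using memoized negamax. Search methods such as MCTS guided by policy-value networks approximate this computation when, as in Go, the game tree is far too large to enumerate.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # hypothetical rule: remove 1, 2, or 3 stones per turn


@lru_cache(maxsize=None)
def value(stones: int) -> int:
    """Negamax value for the player to move: +1 = forced win, -1 = forced loss."""
    if stones == 0:
        return -1  # the opponent just took the last stone, so the player to move has lost
    return max(-value(stones - m) for m in MOVES if m <= stones)


def best_move(stones: int) -> int:
    """Move that leaves the opponent in the worst position (first such move on ties)."""
    return max((m for m in MOVES if m <= stones), key=lambda m: -value(stones - m))


print([n for n in range(1, 21) if value(n) == -1])  # prints [4, 8, 12, 16, 20]
```

The losing positions are exactly the multiples of 4, and the optimal strategy is to move the opponent onto one of them.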

2) MAIN APPLICATIONS OF PERFECT-INFORMATION GAMES IN THE DRONE DOMAIN

  • (1) Pursuit-Evasion Confrontation. Both parties have complete knowledge of the state, and the solution concept is the Nash equilibrium. Methods such as RL and sliding mode control (SMC) are used to solve the Hamilton–Jacobi–Isaacs (HJI) equations. This approach simplifies policy updates, adapts to both single- and multi-drone confrontations, and is classified as a zero-sum perfect-information game [11,12].
  • (2) Multi-Drone Cooperative Missions. These include task allocation, path planning, and formation control, with the goal of reaching a Nash equilibrium or a Pareto-optimal solution. Adaptive dynamic programming (ADP) combined with neural networks is used to optimize trajectories and resource scheduling, enhancing robustness and real-time performance. This falls under non-zero-sum perfect-information games.
  • (3) Cybersecurity and Anti-Jamming. Anti-jamming in drone swarm communications is modeled as an N + 1 zero-sum game, and the resulting equilibrium strategies are used to optimize encryption and anti-jamming configurations, ensuring link security. This falls under zero-sum perfect-information games.
  • (4) Maneuvering Decision-Making. Air combat maneuvering combines extensive-form game trees with RL for rapid equilibrium strategy search, integrating graph attention and target prediction to enhance collaborative and adversarial effectiveness. This falls under perfect-information dynamic games.
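For the anti-jamming item, a deliberately reduced version of such a zero-sum game can be solved in closed form. The sketch below assumes one UAV link and one jammer, each choosing between two channels (far smaller than the N + 1 formulation above); the UAV’s payoff is the channel rate if its channel is not jammed and 0 otherwise, with hypothetical rates of 4 and 2. The solver first checks for a pure saddle point and otherwise applies the standard 2×2 mixed-equilibrium formulas.

```python
def solve_2x2_zero_sum(A):
    """Equilibrium of a 2x2 zero-sum game; A holds the row (maximizing) player's payoffs.

    Returns (row_strategy, col_strategy, value).
    """
    (a, b), (c, d) = A
    # Pure saddle point: an entry that is the minimum of its row and the maximum of its column.
    for i in range(2):
        for j in range(2):
            if A[i][j] == min(A[i]) and A[i][j] == max(A[0][j], A[1][j]):
                return (1.0 - i, float(i)), (1.0 - j, float(j)), A[i][j]
    # Otherwise each player mixes so as to make the opponent indifferent.
    denom = a - b - c + d
    p = (d - c) / denom          # probability the row player picks row 0
    q = (d - b) / denom          # probability the column player picks column 0
    value = (a * d - b * c) / denom
    return (p, 1.0 - p), (q, 1.0 - q), value


# Hypothetical anti-jamming game: rows = UAV channel, columns = jammed channel,
# payoff = UAV throughput (channel rate if not jammed, else 0).
RATE_1, RATE_2 = 4.0, 2.0
A = [[0.0, RATE_1],
     [RATE_2, 0.0]]
uav, jammer, value = solve_2x2_zero_sum(A)
print(uav, jammer, value)
```

With these rates, the UAV uses the faster channel only one third of the time because the jammer, anticipating this, jams that channel with probability 2/3; the guaranteed expected rate (the game value) is 4/3.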

C. APPLICATIONS OF AI IN IMPERFECT-INFORMATION GAMES

Imperfect-information games are commonly used to describe sequential strategic interactions among multiple parties in which participants cannot obtain complete or accurate information: the game state and the opponents’ next actions are often unknown, and the available information is usually incomplete and imperfect. In the drone domain, imperfect-information games are applied mainly by combining probabilistic reasoning with MARL to solve for a perfect Bayesian equilibrium. This is well suited to real-world conditions such as unknown target intent, limited communication, and interference or deception, supporting tasks such as air combat confrontation, swarm coordination, and search and rescue [13].
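The probabilistic reasoning referred to above rests on Bayesian belief updating: in a perfect Bayesian equilibrium, players’ beliefs about hidden information must be consistent with Bayes’ rule wherever it applies. The sketch below updates a belief about a target’s intent from observed maneuvers; the intent labels, maneuver labels, and likelihood values are hypothetical placeholders for what a learned or expert-specified observation model would provide.

```python
def bayes_update(prior, likelihood, observation):
    """Posterior P(h | o) proportional to P(o | h) * P(h) over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h].get(observation, 0.0) for h in prior}
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError(f"observation {observation!r} has zero probability under every hypothesis")
    return {h: w / z for h, w in unnormalized.items()}


# Hypothetical observation model: P(maneuver | intent).
LIKELIHOOD = {
    "attack": {"approach": 0.8, "loiter": 0.2},
    "recon": {"approach": 0.3, "loiter": 0.7},
}

belief = {"attack": 0.5, "recon": 0.5}
for maneuver in ["approach", "approach", "loiter"]:
    belief = bayes_update(belief, LIKELIHOOD, maneuver)
    print(maneuver, {h: round(p, 3) for h, p in belief.items()})
```

Each approach maneuver shifts belief toward an attack intent and a loitering maneuver shifts it back; a decision policy would then act on this belief rather than on the unobservable true intent.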

Due to the presence of hidden information, imperfect-information games may involve deception, bluffing, and other behaviors, making them more reflective of real-world problems and thus highly applicable. However, due to their large state space and hidden information, participants cannot access all information about the game state, making it impossible to determine an optimal strategy solely by predicting the opponent’s actions. Consequently, participants must not only analyze all possible outcomes of the opponent’s decisions but also consider the opponent’s hidden information. This causes the state space to grow exponentially, increasing the challenge. Therefore, this remains a challenging yet popular research area in AI.

Combining counterfactual regret minimization (CFR) for game reasoning with RL is a significant research direction for imperfect-information games [14]. In current imperfect-information settings, solution concepts such as Nash equilibrium are combined with DRL algorithms to solve multi-agent decision-making and optimization problems. These approaches fall broadly into two categories. The first comprises algorithms that drive the average policy quickly toward a Nash equilibrium strategy; their primary goal is to cope with the large state spaces of imperfect-information games. The second focuses on opponent modeling: it observes and collects opponent characteristics during play to infer behavioral traits and hidden information, and by estimating the opponent’s strategy, the agent can exploit strategic weaknesses to maximize its own payoff.
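The second category, opponent modeling, can be sketched in its simplest form: estimate the opponent’s mixed strategy from observed actions and play a best response to that estimate. The sketch assumes a stationary opponent in rock-paper-scissors (a simultaneous-move game, hence one with imperfect information), Laplace smoothing via a hypothetical `prior_count` parameter, and a made-up observation history. Unlike the equilibrium approach of the first category, such an exploitative response gives up worst-case guarantees if the opponent adapts.

```python
from collections import Counter

ACTIONS = ("R", "P", "S")
BEATS = {("R", "S"), ("P", "R"), ("S", "P")}  # (winner, loser) pairs


def payoff(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if (mine, theirs) in BEATS else -1


def estimate_strategy(history, prior_count=1.0):
    """Frequency estimate of an (assumed stationary) opponent, with Laplace smoothing."""
    counts = Counter(history)
    total = len(history) + prior_count * len(ACTIONS)
    return {a: (counts[a] + prior_count) / total for a in ACTIONS}


def best_response(opponent_model):
    """Action with the highest expected payoff against the estimated opponent strategy."""
    return max(ACTIONS, key=lambda a: sum(p * payoff(a, b) for b, p in opponent_model.items()))


observed = ["R"] * 6 + ["P"] * 2 + ["S"] * 2   # hypothetical rock-heavy opponent
model = estimate_strategy(observed)
print(model, best_response(model))
```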

D. ACHIEVING MORE INTELLIGENT AND EFFICIENT DECISION-MAKING PROCESSES

As one of the theoretical foundations of AI, game theory can be used to analyze and solve various decision-making problems. The integration of game theory and AI enables more intelligent and efficient decision-making processes. In AI, game theory models can describe and address a range of decision-making problems. Core concepts in game theory, such as optimal strategies and equilibrium points, can help AI systems analyze and predict the outcomes of different decisions, guiding the system toward the best choice.

Game theory provides a decision-making model for resolving cooperation and competition problems among multiple participants. In real life, many decision-making problems involve conflicts of interest and common interests among multiple parties. In multi-party games, AI can analyze the strategic choices of each participant through model building, thereby predicting possible outcomes. Through game simulation, AI can quickly identify optimal strategies, helping participants make informed decisions. Common applications include resource allocation and cooperative negotiations. At the same time, the decisions of AI systems often involve gaming or collaboration with other participants, so the effectiveness and payoff of these decisions need evaluation and optimization. Game theory offers a quantitative method to assess the quality of decisions by analyzing the strategies and payoffs of participants, helping evaluate the effectiveness of different decision strategies, optimize system behavior, and thereby guide the improvement and optimization of AI.

Technologies related to AI-generated action (AIGA), such as behavior generation and model generation, offer feasible solutions to decision-making problems. The Transformer, a sequence model built on attention mechanisms, is leveraged to construct decision-making policies for Intelligent Game problems. Transformer-based representation learning, sequence modeling, and multimodal fusion learning methods have attracted sustained attention in Intelligent Game decision-making [15]. Current methods centered on Decision Transformers fall into three main types: 1) direct utilization of LLMs and the knowledge they encode (e.g., encyclopedic, video, and Internet knowledge), 2) representation and model learning based on framework transformations (e.g., representation learning and environment learning), and 3) conditional generation based on decision problem reconstruction (e.g., sequence modeling, behavior generation, and world model generation).
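The sequence-modeling reformulation in type 3) can be illustrated by how Decision Transformer-style methods prepare their input: each trajectory is rewritten as interleaved (return-to-go, state, action) tokens, so that at inference time the model can be conditioned on a desired return. The sketch covers only this data preparation (no Transformer); the toy states, actions, and rewards are hypothetical, and undiscounted returns-to-go are the default, as in the original Decision Transformer formulation.

```python
def returns_to_go(rewards, gamma=1.0):
    """R_hat_t = sum over k >= t of gamma**(k - t) * r_k, computed backwards in one pass."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def to_token_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples, one per time step."""
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions, and rewards must have equal length")
    sequence = []
    for rtg, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("rtg", rtg), ("state", s), ("action", a)])
    return sequence


# Hypothetical three-step trajectory.
print(to_token_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0]))
```

At decision time, the first return-to-go token is set to the target return and decreased by each received reward, which is how the decision problem is recast as conditional sequence generation.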

This field still faces challenges such as the complexity of building and solving game models, computational efficiency, and information asymmetry [16]. The release of ChatGPT in late 2022 triggered an “arms race” across AI research tracks. However, general language capabilities do not fully match the reasoning abilities that decision-making requires, and how to construct “decision foundation models” has become a frontier issue in AI and intelligent decision-making.

E. GENERATIVE ADVERSARIAL NETWORKS

A GAN consists of two opposing neural networks: a generator (the generative model) and a discriminator (the discriminative model). The generator maps random noise to synthetic data samples, such as images, while the discriminator tries to distinguish these generated samples from real data. This adversarial training approach has broad applications in AI, providing a powerful tool for designing more intelligent and creative AI agents.

GAN training can be viewed as a two-player minimax game: the generator and discriminator continuously improve through confrontation, ideally reaching a Nash equilibrium at which the generator reproduces the data distribution and the discriminator can do no better than chance. This dynamic drives improvements in generative model quality, although in practice gradient-based training is not guaranteed to converge to that equilibrium. GANs thus exemplify the application of game theory in AI, with the generator and discriminator optimizing against each other toward an equilibrium in which neither can improve unilaterally.
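The equilibrium claim can be checked numerically for discrete distributions. For the value function V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 − D(x))], the best discriminator for a fixed generator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), and substituting it gives V = −log 4 + 2·JSD(p_data ‖ p_g), which is minimized (with D* = 1/2 everywhere) exactly when p_g = p_data. The sketch below verifies this identity on small hypothetical distributions; it does not train neural networks.

```python
import math


def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) on the union of the supports."""
    support = {x for x in set(p_data) | set(p_g) if p_data.get(x, 0.0) + p_g.get(x, 0.0) > 0}
    return {x: p_data.get(x, 0.0) / (p_data.get(x, 0.0) + p_g.get(x, 0.0)) for x in support}


def gan_value(p_data, p_g, D):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))] for discrete distributions."""
    real_term = sum(p * math.log(D[x]) for x, p in p_data.items() if p > 0)
    fake_term = sum(p * math.log(1.0 - D[x]) for x, p in p_g.items() if p > 0)
    return real_term + fake_term


def jsd(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl_to_m(dist):
        return sum(v * math.log(v / m[x]) for x, v in dist.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


p_data = {"a": 0.5, "b": 0.5}
for p_g in ({"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1}):
    D = optimal_discriminator(p_data, p_g)
    print(p_g, gan_value(p_data, p_g, D), -math.log(4) + 2 * jsd(p_data, p_g))
```

Any mismatch between the generator and the data raises the value above −log 4, so the generator’s best reply to an optimal discriminator is to match the data distribution.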

F. APPLICATION OF LLMs

LLMs are increasingly applied in agent systems, demonstrating significant advantages in handling open-ended tasks, modeling human intent, and reasoning. Their intersection with game theory makes it possible to establish game benchmarks for evaluating LLM behavior, to use game-theoretic algorithms to enhance model performance, to describe the impact of models through game modeling, and to analyze how models influence the equilibria of traditional game models.

To enhance agents’ capabilities in autonomous environmental perception, effective collaboration, and reasoning-based decision-making, LLMs are being introduced into multi-agent systems. By combining them with RL and role-based learning methods, a collaborative agent architecture is constructed that possesses the ability to understand complex games and perform high-level strategic reasoning. For partially observable problems, a distributed LLM-based multi-agent collaborative decision-making architecture is built, where each agent is equipped with a cognitive building block module to enhance autonomous perception and environmental modeling capabilities. The introduction of a role-learning module improves agents’ collaborative adaptability in complex games.

With the rise of LLMs like the GPT series, BERT, and Tongyi Qianwen [17], integrating them into the decision-making processes of agents and leveraging their reasoning and generative capabilities for strategy optimization provided new directions for multi-agent collaborative decision-making. To address decision-making difficulties arising from highly dynamic environments, multi-level reasoning modules were designed to guide agents in identifying key game nodes and generating forward-looking strategies, thereby improving decision quality.

Large Models Empowering Gaming: Large models have been used for situation understanding, intent prediction, and strategy generation, enhancing gaming capabilities in the cognitive domain [18,19]. Representative methods include entropy-guided in-context learning (RA-ICL), adaptive chain-of-thought (CoT) reasoning, and, in particular, reasoning modules that combine LLMs with game theory. Through retrieval augmentation and dynamic adjustment of reasoning depth, these approaches improve LLMs’ zero-shot reasoning in structured gaming tasks and reduce dependence on task-specific fine-tuning. They apply to game-scenario comprehension, strategy explanation, and opponent modeling, for example in negotiation and strategy games. In AI-driven evolutionary game solving, LLMs and deep learning are also used to accelerate the numerical solution and parameter calibration of replicator dynamics equations, which shortens model iteration cycles and improves responsiveness to real-time battlefield data.
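One plausible reading of "dynamic reasoning depth adjustment" is sketched below: sample several answers from the model, measure their disagreement with Shannon entropy, and allocate more chain-of-thought steps when the samples disagree. The function names, the linear depth schedule, and the example answers are assumptions for illustration, not the cited methods.

```python
import math
from collections import Counter
from typing import List

def answer_entropy(samples: List[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution of sampled answers."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def reasoning_depth(samples: List[str], min_depth: int = 1, max_depth: int = 8) -> int:
    """Hypothetical schedule: scale CoT depth with entropy normalized by its maximum, log2(n)."""
    if len(samples) <= 1:
        return min_depth
    normalized = answer_entropy(samples) / math.log2(len(samples))
    return min_depth + math.ceil(normalized * (max_depth - min_depth))

print(reasoning_depth(["cooperate"] * 4),
      reasoning_depth(["cooperate", "defect", "cooperate", "defect"]),
      reasoning_depth(["a", "b", "c", "d"]))
# 1 5 8
```

Unanimous samples keep reasoning shallow, while maximal disagreement triggers the deepest reasoning budget; in practice the samples would come from repeated queries to the model on the same game state.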

G.DIFFERENTIAL GAMES

Differential game theory is a mathematical framework for studying how multiple participants make optimal decisions in dynamic environments. It is particularly suited to adversarial, continuous-time decision-making processes and gives them a rigorous mathematical foundation. A differential game is specified by the agents’ equations of motion, terminal conditions, and performance indices. The system state evolves according to differential equations driven by the participants’ control inputs, and each participant’s payoff is a functional of the resulting state trajectory accumulated over time. Each participant seeks to maximize its individual payoff by choosing its control variables optimally. For a multi-agent system, the equations of motion form a set of ordinary differential equations.
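As a minimal numerical illustration, consider Isaacs' simple-motion pursuit-evasion game in the plane: the pursuer and evader obey ẋ_P = v_P u_P and ẋ_E = v_E u_E with unit-length controls, the payoff is the capture time, the pursuer minimizes it, and the evader maximizes it. For v_P > v_E the classical solution has both players move along the line of sight (the evader straight away from the pursuer), so capture occurs after (d_0 - ℓ)/(v_P - v_E), where d_0 is the initial distance and ℓ the capture radius. The sketch integrates those strategies with explicit Euler steps; positions, speeds, and step size are arbitrary choices.

```python
import math

def simulate_pursuit(p, e, vp, ve, dt=1e-3, capture_radius=1e-2, t_max=100.0):
    """Simple-motion pursuit-evasion in the plane, integrated with explicit Euler steps.

    Pursuer heads straight at the evader; evader runs straight away along the line of sight.
    Returns the capture time, or None if no capture happens before t_max.
    """
    px, py = p
    ex, ey = e
    t = 0.0
    while t < t_max:
        dx, dy = ex - px, ey - py
        dist = math.hypot(dx, dy)
        if dist <= capture_radius:
            return t
        ux, uy = dx / dist, dy / dist  # unit line-of-sight vector from pursuer to evader
        px, py = px + vp * ux * dt, py + vp * uy * dt
        ex, ey = ex + ve * ux * dt, ey + ve * uy * dt
        t += dt
    return None

t_cap = simulate_pursuit((0.0, 0.0), (3.0, 4.0), vp=2.0, ve=1.0)
print(t_cap, (5.0 - 1e-2) / (2.0 - 1.0))
```

With d_0 = 5, ℓ = 0.01, v_P = 2, and v_E = 1, the simulated capture time should be close to the analytical value 4.99; if the evader is faster, the function returns None.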

1).INTEGRATION OF DIFFERENTIAL GAMES AND AI

This subsection reviews how AI technologies are used to tackle three traditional challenges of differential games: problems in complex environments, problems with incomplete information, and large-scale problems.

  • (1) DRL for Solving Continuous Adversarial Equilibrium. Researchers combine the extremum principle of differential games with the joint optimization of DRL policy and value functions. This approach mitigates the “curse of dimensionality” and the convergence difficulties that traditional numerical methods encounter in high-dimensional, nonlinear, or incomplete-information scenarios. Fuzzy inference combined with DRL handles the discretization of continuous states, while Centralized Training with Decentralized Execution (CTDE) improves multi-agent coordination efficiency. Representative algorithms include Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC) and Multi-Agent SAC, branching DRL, and prioritized experience replay [20], which apply to continuous-action games such as pursuit-evasion, navigation, and multi-agent cooperation. Research on MARL algorithms for differential games still requires learning algorithms that adapt to the environment. In complete-information differential games, learning equilibrium strategies based on the extremum principle remains difficult because finding extrema in continuous action spaces is inherently hard. In incomplete-information differential games, model-free or data-driven learning methods are a promising research direction. Benchmark platforms for differential games would help evaluate how well RL algorithms solve such problems and would support transferring knowledge from differential games to RL algorithms. In addition, simulation platforms that support real-world operating conditions are needed to verify the generalization and decision-making capabilities of algorithms in complex scenarios.
  • (2) Neural Network Approximation of Differential Game Equations. Typical methods include Structure-Aware Learning (SAL), Neural Stochastic Differential Equations (Neural SDEs), and the integration of online Bayesian estimation with DRL [21,22]. Neural SDEs have been used to handle stochastic differential games, improving robustness to environmental noise. These methods have mainly been applied to differential games with continuous state-action spaces, such as pursuit-evasion, navigation, and guidance interception. By identifying parameters online and updating the game model in real time, they adapt to dynamic environments that require immediate decisions, for instance, UAV coordination.
  • (3) Neural Network Approximation with Online Adaptive Learning. Typical methods include SAL, Neural SDEs, and online Bayesian estimation [22,23]. Neural networks approximate the value and control (policy) functions, and the game model is updated in real time through online parameter identification. This line of work applies to perfect-information differential games with continuous state-action spaces, such as pursuit-evasion and navigation, and to real-time decision-making in dynamic environments such as UAV coordination.
  • (4) Bayesian Learning and Online Parameter Identification. Representative methods include online Bayesian estimation, adaptive feature mapping, and probabilistic behavior prediction, which infer opponent goals and parameters under incomplete information. By updating the posterior distribution of game parameters in real time from trajectory data, these methods can converge quickly toward the true values while quantifying the remaining uncertainty, supporting probabilistic decision-making in dynamic adversarial settings.
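The posterior update in item (4) can be illustrated with a discrete version: the opponent's goal is one of a few known candidates, and each observed step is scored by how well its heading matches the bearing to each goal. This is a minimal sketch under stated assumptions: the goal set, the von Mises-style heading likelihood with concentration `kappa`, and the trajectory are all hypothetical, whereas the cited methods estimate continuous game parameters.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def update_goal_posterior(prior: Dict[str, float], goals: Dict[str, Point],
                          pos: Point, step: Point, kappa: float = 4.0) -> Dict[str, float]:
    """One Bayes step: P(goal | step) is proportional to P(step | goal) * P(goal).

    Assumed likelihood: P(step | goal) ~ exp(kappa * cos(heading - bearing to goal)),
    i.e. the opponent tends to move toward its goal, with noise controlled by kappa.
    """
    heading = math.atan2(step[1], step[0])
    unnorm = {}
    for name, (gx, gy) in goals.items():
        bearing = math.atan2(gy - pos[1], gx - pos[0])
        unnorm[name] = prior[name] * math.exp(kappa * math.cos(heading - bearing))
    z = sum(unnorm.values())
    return {name: w / z for name, w in unnorm.items()}

goals = {"depot": (10.0, 0.0), "radar": (0.0, 10.0)}
posterior = {"depot": 0.5, "radar": 0.5}
trajectory: List[Point] = [(0.0, 0.0), (0.9, 0.2), (1.8, 0.3), (2.8, 0.5)]
for a, b in zip(trajectory, trajectory[1:]):
    step = (b[0] - a[0], b[1] - a[1])
    posterior = update_goal_posterior(posterior, goals, a, step)
    print({k: round(v, 3) for k, v in posterior.items()})
```

Because the observed opponent keeps heading roughly east, the posterior concentrates on "depot" within a few steps, and the residual probability on "radar" is the quantified uncertainty a downstream decision rule can use.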

2).MAIN APPLICATIONS OF DIFFERENTIAL GAMES IN THE DRONE DOMAIN

Applications of differential games in the drone field focus on adversarial decision-making, formation control, cooperative pursuit, and race optimization [24]. The core approach combines the Hamilton-Jacobi-Isaacs (HJI) equation with RL to solve for optimal continuous-time dynamic strategies, relying heavily on AI to cope with uncertainty and real-time requirements.

The growth of drone swarms, onboard intelligence, and dynamic adversarial capabilities has made traditional control and planning methods less suitable for scenarios characterized by continuous dynamics, real-time gaming, and incomplete information. Differential games, whose strength lies in solving for optimal strategies in continuous time-state dynamic systems, align well with the dynamic decision-making needs of the drone field. Integrating AI with differential games addresses their core limitations, such as the difficulty of solving HJI equations and poor adaptability to high-dimensional scenarios.
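For reference, the HJI equation can be written in its standard finite-horizon, zero-sum form. The notation is introduced here for illustration and is not taken from the surveyed works: state x, pursuer control u ∈ U (minimizing), evader control d ∈ D (maximizing), dynamics f, running cost ℓ, terminal cost g, and value function V.

```latex
\begin{aligned}
&\dot{x}(t) = f\big(x(t), u(t), d(t)\big), \qquad
J(u, d) = g\big(x(T)\big) + \int_{t}^{T} \ell\big(x(s), u(s), d(s)\big)\, ds, \\
&\frac{\partial V}{\partial t}(x, t)
  + \min_{u \in U} \max_{d \in D}
  \Big[ \nabla_x V(x, t)^{\top} f(x, u, d) + \ell(x, u, d) \Big] = 0,
  \qquad V(x, T) = g(x).
\end{aligned}
```

When Isaacs' condition holds, the min-max and max-min orders give the same Hamiltonian. Grid-based solvers of this PDE scale exponentially with the state dimension, which is why RL-based approaches that approximate V (or the optimal policies) with neural networks are attractive for swarm-scale problems.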

H.EVOLUTIONARY GAME THEORY

Evolutionary game theory (EGT) is a branch of game theory inspired by Darwinian evolution. It studies how decision-makers with bounded rationality adjust their strategies through learning and adaptation and eventually settle into stable behavioral patterns. Its core concepts are bounded rationality, population-level strategy evolution, and Evolutionarily Stable Strategies (ESS). In an evolutionary game, a population of agents interacts repeatedly, and strategies that earn higher payoffs spread through a selection process until a stable state is reached. The underlying idea is that many behaviors arise from interactions among agents within a group, so the success of an individual depends on how its chosen strategy fares against the strategies of others. Unlike classical game theory, EGT focuses on how strategies evolve over time and which strategies prove most successful in that evolutionary process.
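The ESS concept can be checked directly on the textbook Hawk-Dove game (an illustrative choice, not a model from the surveyed works). A strategy p* is evolutionarily stable if, for every mutant q ≠ p*, either E(q, p*) < E(p*, p*), or E(q, p*) = E(p*, p*) and E(p*, q) > E(q, q). With resource value V < cost C, the mixed strategy "play Hawk with probability V/C" should pass, while pure Hawk should fail. Exact fractions avoid floating-point ties; the mutants are a finite grid, so this is a spot check rather than a proof.

```python
from fractions import Fraction as F

def expected_payoff(p, q, V, C):
    """Payoff to a player who plays Hawk with prob. p against an opponent playing Hawk with prob. q.

    Hawk-Dove payoffs: H vs H = (V - C)/2, H vs D = V, D vs H = 0, D vs D = V/2.
    """
    return p * q * (V - C) / 2 + p * (1 - q) * V + (1 - p) * (1 - q) * V / 2

def is_ess(p_star, V, C, mutants):
    """Maynard Smith conditions, checked against a finite list of mutant strategies q != p*."""
    for q in mutants:
        if q == p_star:
            continue
        e_ss = expected_payoff(p_star, p_star, V, C)
        e_qs = expected_payoff(q, p_star, V, C)
        if e_qs > e_ss:
            return False  # mutant beats the resident outright: p* is not even a Nash equilibrium
        if e_qs == e_ss and expected_payoff(p_star, q, V, C) <= expected_payoff(q, q, V, C):
            return False  # second-order condition fails: the mutant is not driven out
    return True

V, C = F(2), F(4)
mutants = [F(i, 20) for i in range(21)]
print(is_ess(V / C, V, C, mutants), is_ess(F(1), V, C, mutants))
# True False
```

At p* = 1/2 every mutant earns the same against the resident, so stability rests on the second condition: the resident does strictly better against the mutant than the mutant does against itself.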

With the development of AI technology, EGT has become a powerful tool for analyzing decision-making in complex adversarial environments, with broad application prospects in military simulation and training systems. Compared with other game-theoretic methods, EGT places greater emphasis on dynamic learning and adaptation, which suits battlefield environments characterized by high uncertainty and incomplete information, as well as intelligent, unmanned, and asymmetric warfare scenarios. Methods such as replicator dynamics, multi-player public goods games, and complex-network evolution provide a quantitative basis for analyzing group decision iteration, strategy diffusion, and adversarial equilibrium. However, EGT models are relatively complex and require substantial data support.
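Replicator dynamics, the first method named above, update each strategy's population share in proportion to how much its fitness exceeds the population average: ẋ_i = x_i((Ax)_i - xᵀAx). The sketch below integrates this with Euler steps for an illustrative Hawk-Dove payoff matrix (V = 2, C = 4); in a drone application the strategy set and matrix A would instead come from the mission model.

```python
from typing import List

def replicator_step(x: List[float], A: List[List[float]], dt: float) -> List[float]:
    """One Euler step of x_i' = x_i * ((A x)_i - x^T A x) for a symmetric matrix game."""
    fitness = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    mean_fitness = sum(xi * fi for xi, fi in zip(x, fitness))
    return [xi + dt * xi * (fi - mean_fitness) for xi, fi in zip(x, fitness)]

# Hawk-Dove with V = 2, C = 4 (illustrative values); strategy order [Hawk, Dove].
A = [[(2 - 4) / 2, 2.0],
     [0.0, 2 / 2]]
x = [0.1, 0.9]
for _ in range(5000):
    x = replicator_step(x, A, dt=0.01)
print([round(v, 3) for v in x])
# [0.5, 0.5]
```

Starting from a mostly-Dove population, the Hawk share rises until it reaches V/C = 0.5, the evolutionarily stable mix; the Euler update also keeps the shares summing to one because the average-fitness term cancels exactly.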

Applications of evolutionary games in the drone domain center on bounded rationality, distributed learning, and dynamic equilibrium. These approaches mainly address swarm task allocation, resource scheduling, attack-defense confrontation, and cooperative control. They are closely integrated with methods such as network potential games, evolutionary learning, ADP, and DRL, and this integration helps address real-time computation and robustness problems in large-scale swarms. The applications fall mainly into the following areas [25–27]:

  • (1) Dynamic Resource Allocation (Cooperation-Competition). Public goods game models are designed to balance individual payoff against global resource optimization and can be adapted to the dynamic scheduling of resources such as communication bandwidth and battery power.
  • (2) Swarm Attack-Defense and Pursuit-Evasion. These scenarios are modeled as zero-sum evolutionary games in which strategies are updated using replicator dynamics or logit learning. Under partial observability, predicting opponent intent raises the success rate of encirclement and capture as well as penetration and evasion, supporting “swarm vs. swarm” confrontations.
  • (3) Fault-Tolerant Formation and Cooperative Obstacle Avoidance. These scenarios are modeled as non-zero-sum evolutionary games among leader and follower drones. ESS analysis is used to optimize formation maintenance and fault compensation, preserving formation stability during actuator failures.
  • (4) Civilian Expansion. Applications are mainly in agriculture and logistics. Multi-party evolutionary game models characterize the strategic interactions among enterprises, service providers, and regulators to improve operational coordination and distribution efficiency in smart agriculture and urban logistics scenarios.
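The tension behind the public goods models in item (1) can be shown with the standard linear N-player public goods game (an illustrative model, not the specific formulation of the cited works): each cooperator contributes c, the pot is multiplied by r and split equally among all N members. A player who switches from cooperating to defecting changes its own payoff by c(1 - r/N), so defection pays individually whenever r < N even though full cooperation maximizes the group total. The function names and parameter values are assumptions.

```python
def public_goods_payoffs(n_cooperators: int, group_size: int, r: float, c: float = 1.0):
    """Linear N-player public goods game.

    Each cooperator contributes c; the pot is multiplied by r and split equally among all
    group_size members. Returns (payoff to a cooperator, payoff to a defector).
    """
    share = r * c * n_cooperators / group_size
    return share - c, share

def defection_gain(group_size: int, r: float, c: float = 1.0) -> float:
    """Payoff change for one player switching from cooperate to defect, others fixed.

    The number of cooperators drops from k + 1 to k; the result c * (1 - r / N) does not depend on k.
    """
    k = group_size // 2
    coop_payoff, _ = public_goods_payoffs(k + 1, group_size, r, c)
    _, defect_payoff = public_goods_payoffs(k, group_size, r, c)
    return defect_payoff - coop_payoff

print(round(defection_gain(5, 3.0), 3), round(defection_gain(5, 6.0), 3))
# 0.4 -0.2
```

With five drones and r = 3, withholding a contribution (for example, bandwidth or battery) is individually profitable; only when r exceeds the group size does cooperation pay on its own, which is why practical allocation schemes add reward or penalty terms to the payoff.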

Game Model Construction: The payoff function accounts for factors such as mission payoff, energy consumption, fault-tolerance efficiency, and capture success rate. The state includes position, velocity, and heading. Constraints include airspace restrictions, dynamic (maneuvering) capabilities, communication topology, and time limits.
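The model elements listed above can be organized as in the following sketch. The weighted-sum payoff, the specific weights, the rectangular airspace, and the "at least one teammate within communication range" connectivity rule are all assumptions chosen for illustration; the surveyed works may combine these factors differently.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class UavState:
    x: float        # position, m
    y: float
    speed: float    # m/s
    heading: float  # rad

@dataclass
class Weights:  # hypothetical trade-off weights
    mission: float = 1.0
    energy: float = 0.2
    fault_tolerance: float = 0.3
    capture: float = 1.5

def payoff(mission_gain: float, energy_used: float, fault_margin: float,
           capture_prob: float, w: Weights) -> float:
    """Weighted-sum payoff: rewards mission gain, fault margin, and capture; penalizes energy."""
    return (w.mission * mission_gain - w.energy * energy_used
            + w.fault_tolerance * fault_margin + w.capture * capture_prob)

def feasible(s: UavState, teammates: List[UavState], box: Tuple[float, float, float, float],
             v_max: float, comm_range: float, time_left: float) -> bool:
    """Check airspace, dynamic (speed), communication-topology, and temporal constraints."""
    x_min, x_max, y_min, y_max = box
    in_airspace = x_min <= s.x <= x_max and y_min <= s.y <= y_max
    within_dynamics = 0.0 <= s.speed <= v_max
    connected = any(math.hypot(s.x - t.x, s.y - t.y) <= comm_range for t in teammates)
    return in_airspace and within_dynamics and connected and time_left > 0.0

me = UavState(100.0, 50.0, 12.0, 0.3)
mate = UavState(150.0, 50.0, 10.0, 0.0)
print(feasible(me, [mate], (0.0, 1000.0, 0.0, 1000.0), v_max=20.0, comm_range=100.0, time_left=30.0),
      payoff(1.0, 2.0, 0.5, 0.4, Weights()))
```

Keeping feasibility separate from the payoff lets a game solver discard infeasible joint actions before comparing payoffs, rather than folding hard constraints into soft penalty weights.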

I.MEAN FIELD GAMES

Mean field game (MFG) theory, which emerged in 2006, is a relatively new field within game theory. When the number of agents in a system is very large, the number of possible ways in which they can interact grows exponentially, making modeling exceptionally complex. The Mean Field Scenario (MFS) provides an effective solution to this class of problems.

As the number of agents tends to infinity, their interactions can be represented through MFG theory [27], which studies interaction and decision-making among populations in continuous time and continuous state spaces. In an MFG, each agent’s decision-making is described by a differential equation or dynamic system model, and solving these equations yields the optimal strategy. Because each agent is assumed to be influenced by the aggregate (average) behavior of the population rather than directly by other individual agents, MFG effectively handles a rapidly growing number of agents. The core idea is to use the mean field formed by a homogeneous population of agents to approximate the complex interactions among a large number of individuals, thereby simplifying large-scale game problems and reducing their dimensionality.
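A small numerical illustration of the mean-field idea, assuming a quadratic interaction cost (an illustrative choice): the average squared distance from agent i to all N agents satisfies (1/N)Σ_j (x_i - x_j)² = (x_i - m)² + σ², where m and σ² are the population mean and variance. Each agent's cost therefore depends on the others only through population statistics, so evaluating all agents drops from O(N²) pairwise work to O(N). For general interactions the reduction is approximate and uses the whole population distribution rather than two moments.

```python
import random

def pairwise_cost(i, xs):
    """Direct interaction cost: average squared distance from agent i to every agent (O(N))."""
    return sum((xs[i] - xj) ** 2 for xj in xs) / len(xs)

def mean_field_cost(x_i, mean, var):
    """Same cost written through population statistics only: (x_i - m)^2 + Var."""
    return (x_i - mean) ** 2 + var

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
m = sum(xs) / len(xs)
var = sum((x - m) ** 2 for x in xs) / len(xs)
# All N direct costs need O(N^2) work; via (m, var) they need O(N).
direct = [pairwise_cost(i, xs) for i in range(len(xs))]
via_mean_field = [mean_field_cost(x, m, var) for x in xs]
print(max(abs(a - b) for a, b in zip(direct, via_mean_field)))
```

The printed maximum discrepancy is at floating-point rounding level, confirming that, for this interaction, the mean field carries all the information an agent needs about the others.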

MFGs have been used to study differential games within large populations of rational players, addressing the impracticality of traditional DRL methods in environments with infinitely many agents. When a system contains a vast number of agents, each agent cares not only about its own state (e.g., wealth and resources) but also about its position within the overall distribution of the population. Typical progress in the integration of AI and MFG theory includes the following:

1).DRL FOR SOLVING MFG EQUILIBRIUM

From a DRL perspective, existing DRL methods are not practically viable in environments that approach infinitely many agents and operate under an imprecise probabilistic model. MFGs, however, are well suited to modeling exactly this type of environment. By assuming that all agents have similar reward functions, the MFS simplifies MARL models. For example, it can simulate collective behaviors such as the coordinated schooling of fish by coupling two equations: a Hamilton-Jacobi-Bellman (HJB) equation describing an individual agent’s optimal response to the population, and a Fokker-Planck (FPK) equation describing how the population distribution evolves as a whole.
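The two coupled equations can be written in the standard second-order MFG form. The notation is introduced here for illustration: u(t, x) is the value function of a representative agent, m(t, x) the population density, ν the noise intensity, H the Hamiltonian, F and G the running and terminal costs that couple each agent to the population, and m_0 the initial density.

```latex
\begin{aligned}
&-\partial_t u - \nu \Delta u + H\big(x, \nabla u\big) = F\big(x, m(t)\big),
  && u(T, x) = G\big(x, m(T)\big), \\
&\partial_t m - \nu \Delta m - \operatorname{div}\!\big(m \, \partial_p H(x, \nabla u)\big) = 0,
  && m(0, x) = m_0(x).
\end{aligned}
```

The HJB equation runs backward from the terminal time and the FPK equation runs forward from m_0, and each depends on the other's solution. The DRL-based methods in this subsection can be read as replacing the HJB solve with a learned best-response policy and the FPK solve with simulated or learned population distributions.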

  • (1) Policy Mixing and Fictitious Play (FP) Upgrades. In 2022, a Google team proposed DRL-FP, which uses historical policy distillation and a regularized online mixing method to address the difficulty of averaging neural network policies; it outperformed traditional methods on multiple classes of MFGs [28,30]. In 2025, the Density-Enhanced Deep-Average FP (DE-DAFP) method combined SAC or Proximal Policy Optimization (PPO) with Conditional Normalizing Flows (CNF) to model non-stationary population distributions efficiently, improving sampling efficiency by a factor of 10.
  • (2) Common Noise and Distribution-Dependent Policies. Population-aware Online Mirror Descent (M-OMD) uses DRL to handle MFGs with common noise, enabling the learning of population-dependent policies. It showed better convergence than mainstream algorithms on seven benchmark problems.
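The fictitious-play principle behind item (1) can be seen in the simplest possible setting: a static, single-stage crowd-aversion MFG over three locations, solved with tabular best responses instead of neural networks. This is a minimal sketch under stated assumptions (the reward a_s - μ(s), the base values, and the iteration count are illustrative), not the DRL-FP or DE-DAFP algorithms. Each iteration best-responds to the running average of past population distributions and then folds that response into the average with weight 1/(k+1).

```python
from typing import List

def best_response(mu: List[float], base: List[float]) -> List[float]:
    """Pure best response of one agent: the location with the highest base reward minus crowding."""
    rewards = [b - m for b, m in zip(base, mu)]
    s = max(range(len(rewards)), key=rewards.__getitem__)
    return [1.0 if i == s else 0.0 for i in range(len(rewards))]

def fictitious_play(base: List[float], iters: int) -> List[float]:
    """Fictitious play: best-respond to the running average of past population distributions."""
    n = len(base)
    mu_bar = [1.0 / n] * n
    for k in range(1, iters + 1):
        br = best_response(mu_bar, base)
        mu_bar = [m + (b - m) / (k + 1) for m, b in zip(mu_bar, br)]
    return mu_bar

base = [1.0, 0.8, 0.5]
mu = fictitious_play(base, 5000)
print([round(m, 3) for m in mu], [round(b - m, 3) for b, m in zip(base, mu)])
```

At the mean-field equilibrium every occupied location yields the same reward, which for these base values gives μ* = (17/30, 11/30, 2/30) with common reward 13/30. The averaged distribution approaches this point, whereas the individual best responses keep jumping between locations; averaging is exactly the step that becomes hard when policies are neural networks, which is the problem the DRL-based FP variants address.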

2).NEURAL DIFFERENTIAL EQUATIONS AND CONSTRAINT-PRESERVING NETWORKS

  • (1) Neural MFG (Neural SDE). Proposed in 2025 [29,30], this data-driven approach models MFGs with neural SDEs and solves them via automatic differentiation, reducing dependence on predefined dynamics while maintaining numerical accuracy and robustness. It has been applied to modeling the dynamics of epidemic spread.
  • (2) Constraint-Preserving Neural Networks. This approach reformulates the MFG equilibrium as a McKean-Vlasov Forward-Backward Stochastic Differential Equation (FBSDE), approximates the value function and its gradient using neural networks, and enforces mathematical consistency of the density-coupled evolution through volume-invariance constraints, thereby improving stability for high-dimensional solutions.
  • (3) Neural Mean Field Games (Neural MFGs). This is the neural extension of MFG. It replaces the traditional coupled partial differential equations with neural SDEs and achieves autonomous coordination through joint optimization of macroscopic density evolution and microscopic individual policies. It enables lightweight, data-driven modeling that retains numerical accuracy while reducing dependence on prior knowledge of the system, improving solution efficiency and generalization in large-scale scenarios. It is applicable to modeling large-scale multi-agent collective behaviors, such as UAV swarm control and traffic flow optimization.
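Neural MFG methods learn the drift (and thus the density evolution) of a McKean-Vlasov SDE with networks. The sketch below keeps a fixed, hand-chosen drift (an assumption; nothing is learned) to show the interacting-particle simulation underneath: each agent follows dX_i = κ(m_t - X_i)dt + σ dW_i, where m_t is the empirical population mean, integrated with the Euler-Maruyama scheme. For this drift the population mean is conserved and the variance settles near σ²/(2κ). The function name and all parameters are illustrative.

```python
import random

def simulate_mean_field_particles(n=1000, kappa=1.0, sigma=0.5, dt=0.01, steps=500, seed=0):
    """Euler-Maruyama for dX_i = kappa * (mean(X) - X_i) dt + sigma dW_i.

    The drift couples each particle to the population only through the empirical mean,
    which is the interacting-particle approximation of a McKean-Vlasov SDE.
    Returns the final empirical mean and variance.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(2.0, 1.0) for _ in range(n)]  # initial population around 2 with variance 1
    sqrt_dt = dt ** 0.5
    for _ in range(steps):
        m = sum(xs) / n
        xs = [x + kappa * (m - x) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0) for x in xs]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

mean, var = simulate_mean_field_particles()
print(round(mean, 3), round(var, 3), 0.5 ** 2 / (2 * 1.0))
```

In a learned version, the fixed term kappa * (m - x) would be replaced by a network that takes the agent state and a population summary as inputs, and its parameters would be fitted through this same simulation loop.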

3).FUTURE RESEARCH DIRECTIONS

  • (1)Theoretical Foundations for RL. In-depth analysis of the convergence rates, generalization bounds, and sample complexity for solving MFGs using DRL and neural operators.
  • (2)Multi-Population and Networked MFGs. Extension to scenarios involving multiple interacting populations and graph-structured MFGs, adapting to social networks, supply chains, and other complex systems.
  • (3)Robust and Adversarial MFGs. Research on MFGs incorporating model uncertainty and adversarial attacks to enhance the security and reliability of AI algorithms.
  • (4)Hardware Acceleration and Real-time Deployment. Integration with GPU/TPU and lightweight models to enable the application of MFG-AI algorithms in real-time systems such as UAV swarms and autonomous driving.

V.CONCLUSION AND FUTURE WORK

This paper surveyed gaming intelligence in the drone domain by linking game-theoretic modeling with modern AI techniques. It summarized the key foundations of Intelligent Game-based decision-making for drones, covering the integration of game theory with deep learning and RL as well as typical approaches for both perfect-information and imperfect-information settings. Building on this synthesis, the paper highlighted five representative research hotspots (GANs, LLMs, differential games, EGT, and MFGs) and discussed their relevance to adversarial learning, intent reasoning, continuous-time pursuit-evasion, adaptive swarm behaviors, and scalable multi-agent decision-making.

To advance the deployment of Intelligent Game methodologies in physical UAV systems, future research must address several critical bottlenecks. We distill these into four testable challenge statements to guide future work:

  • (1)Real-Time Equilibrium Computation in High-Dimensional Spaces. A primary challenge is the development of distributed solvers to compute game-theoretic equilibria in real time for large-scale, nonlinear coupled UAV swarms. Addressing this requires moving beyond current hierarchical and maneuver planning models to handle high-dimensional state spaces efficiently [31,32].
  • (2)Robustness Under Incomplete Information and Sim-to-Real Gaps. Formulating novel algorithms is necessary to ensure strategy stability and robust sim-to-real generalization. Research must focus on transitioning to physical environments characterized by partial observation, communication delays, and high uncertainty, particularly in aerial target tracking and dynamic confrontation [33].
  • (3)Multi-Objective Trade-Offs in Dynamic Environments. Future work must establish mathematical formulations for payoff functions that dynamically balance mission performance, energy efficiency limits, and strict safety constraints. These trade-offs need to be integrated within a unified game model that accounts for dynamic swarm topology and action coupling during autonomous maneuvers [34].
  • (4)Cross-Domain Cognitive Gaming Integration. A significant frontier is the effective integration of large models and digital twins with dynamic game solvers. This integration aims to elevate UAV interactions from physical trajectory confrontations to autonomous, cognitive-domain strategy generation and situation assessment across air, ground, and surface domains [35].