JAIT

I.INTRODUCTION

The rapid evolution of cyber-physical systems (CPS) and proliferation of internet of things (IoT) devices have transformed modern living, enabling intelligent applications in smart homes, healthcare, industrial automation, and critical infrastructure [1]. These systems, however, are inherently exposed to a wide spectrum of cyber threats due to their reliance on interconnected networks, heterogeneous devices, and resource-constrained environments [2,3]. Traditional signature-based intrusion detection systems (IDS) [4] are insufficient to handle such large-scale, dynamic, and sophisticated attacks, motivating the adoption of artificial intelligence (AI), machine learning (ML), and deep learning (DL) methods for securing IoT-enabled CPS.

Recent years have witnessed significant progress in AI-driven intrusion detection, in which ML algorithms such as random forests (RF), support vector machines (SVM), and gradient boosting (GB) have been employed to detect attacks in network traffic [5–7]. Similarly, DL techniques, including convolutional neural networks (CNNs), recurrent neural networks, long short-term memory networks (LSTMs), and auto-encoders (AEs), have demonstrated superior capabilities for automatically learning feature representations from raw traffic flows [8–10]. Despite these advances, challenges remain in terms of scalability, robustness, and adaptability. Most existing IDS approaches fail to cope with high-dimensional IoT traffic, suffer from concept drift due to evolving attack strategies, and are often biased towards the majority classes due to imbalanced data distributions [11]. This leads to poor generalization and degraded detection accuracy in real-world CPS deployments.

To address these limitations, this work emphasizes the need to develop a robust DL-based classification framework for accurate attack detection in CPS environments, focusing solely on analyzing network packet data in the IoT–edge–cloud environment. The goal is to design an IDS that can learn complex temporal and spatial dependencies in traffic, adapt to distributional changes, and effectively distinguish between benign and malicious flows in IoT–edge–cloud ecosystems. Hence, this work proposes SmartCPS-ADAPT, a hybrid DL-GB model for attack detection in CPS. The model integrates a CNN bidirectional LSTM (BiLSTM) based feature extractor with an adaptive XGBoost (XGB) classifier (XGB-ADAPT). The CNN layers capture local spatial dependencies, while BiLSTM learns long-range temporal patterns from flow sequences. The extracted representations are passed to XGB-ADAPT, which incorporates class-sensitive reweighting, time-decayed reservoir sampling, and incremental leaf recalibration to tackle data imbalance and concept drift. Furthermore, a drift detection mechanism continuously monitors system performance and distribution shifts, ensuring adaptability to evolving threats. By leveraging both deep representation learning and adaptive boosting-based classification, SmartCPS-ADAPT provides a scalable, robust, and efficient IDS for IoT-enabled CPS, bridging the gap between existing research works. The contributions of the presented work are as follows.

This work presents SmartCPS-ADAPT, a hybrid DL and adaptive boosting framework for cyber-attack detection in CPS using IoT traffic. A two-stage feature extraction module is designed using CNN and BiLSTM to capture both spatial correlations and long-term temporal dependencies in raw packet-based flows. XGB-ADAPT is introduced, integrating class-sensitive reweighting, reservoir sampling with exponential decay, and incremental leaf recalibration for handling imbalance and drift. A drift detection mechanism is implemented by monitoring loss, feature divergence, and confidence shift to adaptively update the model without costly retraining. The SmartCPS-ADAPT is evaluated on the CICIoT2023 dataset, a large-scale benchmark for IoT security, achieving robust performance across binary and multi-class classification tasks. The proposed solution demonstrates robustness and adaptability, making it suitable for deployment in IoT–edge–cloud CPS environments.

The manuscript is organized as follows. Section II presents a literature survey that discusses existing IDS approaches from recent years. Section III presents the SmartCPS-ADAPT methodology in detail. Section IV discusses the results achieved by SmartCPS-ADAPT and compares them with existing approaches discussed in the literature survey, and Section V presents the conclusion and future work of SmartCPS-ADAPT.

II.LITERATURE SURVEY

This section presents recent AI-based IDS approaches developed for detecting attacks in IoT environments. T.-T.-H. Le et al. [11] aimed to improve IoT security by enhancing intrusion detection through accurate classification and transparent explanations. The raw data, comprising normal and attack cases, were preprocessed and then validated using a blended approach with three ML models, namely GB, decision tree (DT), and RF, with a voting estimator used to evaluate the best performance. To better interpret the classification outcomes, local interpretable model-agnostic explanations and counterfactual analysis were utilized, and the mean-decrease impurity (MDI) approach was employed for feature selection. On the IoTID20 and CICIoT2023 datasets, blending achieved 100% and 99.51% accuracy, respectively. Sabeel et al. [12] aimed to enhance IDS for identifying polymorphic and atypical network attacks, which are often missed by existing approaches. Their methodology involved generating adversarial polymorphic attacks, analyzing their quality, and retraining the IDS incrementally to adapt to evolving threats. The Heterogeneous Feature Selection Ensemble approach was used to select feature subsets, and SHapley Additive exPlanations (SHAP) were used to interpret the results. Evaluated on CICIoT2023 and CICIDS2017, the approach achieved nearly 90% balanced accuracy on both datasets.

A. H. Farea et al. [13] presented a unified approach to security in IoT environments based on replacement encoding (RE), which conceals sensitive data during AI training while maintaining model utility and providing automated preprocessing. For feature selection, 100 message packet attributes were extracted from packet-capture files of the CICIoT2023 dataset using Wireshark, and a genetic algorithm (GA) was applied to refine correlated features. For classification, they used a deep neural network (DNN) and RF. The DNN and RF with RE and GA achieved accuracies of 92.16% and 94.81%, respectively. M. Abd Elaziz et al. [14] aimed to enhance IDS performance in IoT networks with a convolutional Kolmogorov–Arnold network (CKAN), which replaces the conventional multilayer perceptron in CNNs with KAN layers to reduce the number of parameters and thereby improve performance. CNNs were used to extract features, focusing on network characteristics from standard IDS and IoT-specific datasets. Evaluated on the TONIoT, CICIoT2023, and NSL-KDD datasets, CKAN achieved 93.3%, 98.84%, and 99.2% for multi-class classification and 99.93%, 99.22%, and 98.71% for binary classification, respectively.

B. Susilo et al. [15] aimed to improve cyber-attack detection in IoT using a multistage DL approach, in which the synthetic minority oversampling technique (SMOTE) addressed class imbalance, AEs extracted features, LSTMs captured temporal patterns, and a CNN performed the final classification. On the CICIoT2023 dataset, the approach achieved 99.15% accuracy. M. V. C. Aragão et al. [16] developed an efficient and scalable IoT threat-detection approach that handles large datasets with minimal computational overhead. Their sample-based, multistage ML approach integrated feature selection methods such as SHAP, recursive feature elimination (RFE), and Boruta. For handling class imbalance, hyperparameter optimization using a tree-structured Parzen estimator sampler was applied. For classification, three approaches were used: light gradient-boosting machine (LightGBM), XGB, and XGB with RF (XGB-RF). The results show better performance for various attack classifications on 1%, 5%, and 10% testing data.

Fares et al. [17] aimed to enhance IoT IDS by addressing the challenges of limited datasets and high computational demands in DL approaches; hence, they presented a hybrid transfer-learning approach that combined swin-transformers (ST) for hierarchical feature learning with an LSTM network for sequential pattern analysis. For feature selection, pretrained weights were generated and fine-tuned to improve adaptability, which helped capture temporal-structural attacks. Experiments were conducted on CICIoT2023, MQTT-IoT, BoTIoT, ToN-IoT, and NSL-KDD datasets, where better outcomes were achieved compared with AE, residual-network (ResNet), RNN, CNN, and LSTM. M.-R. Fida et al. [18] aimed to secure IoT systems against flow-based attacks by presenting an approach called IoTShield, which consisted of a dual-stage software-defined-network (SDN)-based defensive framework. The approach assigned programmable switches to detect specific attack classes and leveraged network controllers for refined classification and defence updates. For feature selection, an MDI approach was used, focusing on traffic attributes related to data exfiltration, scanning, spoofing, web-based attacks, and DDoS attacks. For the evaluation of IoTShield, the CICIoT2023 dataset was considered, where the approach reduced false alarms by 58% for DDoS attacks and achieved 80–99% accuracy for web attacks with DTs in the data plane and 99% accuracy using CNNs.

S. Alahmari et al. [19] aimed to improve IoT security against rising cyber threats by integrating data augmentation and distributed learning. They combined generative adversarial networks (GANs) and federated learning (FL) to balance datasets, using XGB as the backbone, and did not apply feature selection. On the CICIoT2023 dataset, 63.60% accuracy was achieved on the original data and 94.62% on GAN-generated synthetic data. Y. Zhao et al. [20] aimed to secure healthcare 5.0 IoT systems by addressing cyber threats and overcoming issues related to non-independent and identically distributed (Non-IID) data and device intermittency, presenting the transformer-driven FL security for healthcare-industry (TFedSec-HI) approach. Local binary patterns (LBP) and Sobel edge detection were employed to transform network traffic into grayscale and RGB images, enabling effective feature extraction using a lightweight vision transformer on edge devices. Furthermore, model aggregation used a FedProx approach to handle data heterogeneity. Experiments on the CICIoMT2024 and CICIoT2023 datasets achieved accuracies of 99.74% and 99.18%, respectively.

Although existing AI-based IDS approaches have shown promising results, several limitations persist. The blended ML ensemble [21] achieved high accuracy but relied heavily on handcrafted features and static voting, making it less adaptive to evolving IoT threats. The polymorphic-attack approach [22] required adversarial data generation and incremental retraining, both of which are computationally expensive. The RE with GA approach [23] improved privacy and feature reduction but introduced complexity in preprocessing and did not address data drift. The CKAN model [24] reduced the number of parameters but still relied on conventional CNN feature learning, thereby limiting robustness to unseen patterns. The SMOTE with AE–LSTM–CNN pipeline [25] addressed imbalance, but oversampling risked synthetic bias. The scalable ML frameworks of [26,27] process large-scale IoT traffic efficiently while maintaining high detection performance and adaptability. The SDN-based defence [28] enhanced protection but was infrastructure-dependent. The GAN–FL approach [29] improved accuracy, but its reliance on synthetic data risks overfitting. Although vision transformers demonstrate strong feature extraction capabilities, their requirement to transform data into image representations introduces additional computational overhead and latency [30]. In contrast, SmartCPS-ADAPT addresses these gaps by combining CNN–BiLSTM feature extraction with adaptive XGB, incorporating drift detection, class-sensitive weighting, and reservoir-based adaptation to provide scalability, temporal awareness, and robustness against evolving IoT cyber threats.

III.METHODOLOGY

This section presents the SmartCPS-ADAPT methodology. The discussion is structured as follows: first, the overall SmartCPS-ADAPT architecture is described, followed by details of the dataset employed in this study. Next, the preprocessing steps applied to the raw data are outlined. Subsequently, the feature extraction process is explained, highlighting the use of a CNN–BiLSTM model to learn spatial and temporal patterns. The classification strategy using the adaptive XGB model is then presented. Finally, the drift detection mechanism integrated into the framework is elaborated to demonstrate its ability to maintain robustness under evolving IoT attack scenarios.

A.ARCHITECTURE

The SmartCPS-ADAPT architecture is shown in Fig. 1 and comprises an end-to-end IDS framework for IoT environments. SmartCPS-ADAPT first takes the CICIoT2023 dataset, which comprises raw IoT traffic with eight classes (seven attack classes and one benign class), collected from 105 IoT devices. The complete details of the dataset are discussed in Section III.B. Next, the dataset is preprocessed, as discussed in detail in Section III.C, and partitioned into training and testing sets. The training data are then passed to the feature extraction module, where a hybrid CNN and Bi-LSTM model learns both spatial correlations and long-term temporal dependencies in IoT flows, as discussed in detail in Section III.D. The extracted representations are fed into a classification module called XGB-ADAPT, which is discussed in detail in Section III.F. For drift detection, SmartCPS-ADAPT presents a novel algorithm, as discussed in Section III.G. Finally, using the test data, the performance of SmartCPS-ADAPT is evaluated for binary and multi-class classification using standard performance metrics.

Fig. 1. SmartCPS-ADAPT architecture.

B.DATASET

In this study, for training/testing the SmartCPS-ADAPT, this work considered the CICIoT2023 dataset [21]. The CICIoT2023 dataset is a large-scale, benchmarked dataset designed to support research in IoT security, addressing the growing need for realistic IDS and threat analysis in modern IoT environments. The dataset was developed by the Canadian Institute for Cybersecurity (CIC), which provides a comprehensive collection of malicious and benign network traffic, simulating real-world IoT environments such as sensor-based smart homes and industrial control systems applications. The dataset was generated using 105 IoT devices and attack scenarios, which captured packet-based and flow-based features. The dataset comprises DoS, DDoS, brute-force, Mirai, spoofing, web-based, and reconnaissance attacks. The details of the complete dataset are presented in Table I.

Table I. CICIoT2023 dataset

Data description	Value
Samples	46,686,579
Features	46
Classes	Benign, DoS, DDoS, Mirai, Spoofing, Recon, Web, BruteForce
Benign samples	1,098,195
DoS samples	8,090,738
DDoS samples	33,984,560
Mirai	2,634,124
Spoofing	486,504
Recon	354,565
Web	24,829
BruteForce	13,064
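The degree of imbalance in Table I can be made concrete with a short calculation. The sketch below copies the class counts from Table I and derives each class's share of the data together with a smoothed inverse-frequency weight. The smoothing exponent of 0.5 and the normalisation to a mean weight of 1 are illustrative assumptions, not values reported in this work.

```python
# Class counts copied from Table I.
CLASS_COUNTS = {
    "Benign": 1_098_195,
    "DoS": 8_090_738,
    "DDoS": 33_984_560,
    "Mirai": 2_634_124,
    "Spoofing": 486_504,
    "Recon": 354_565,
    "Web": 24_829,
    "BruteForce": 13_064,
}


def class_shares(counts):
    """Return each class's fraction of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def inverse_frequency_weights(counts, smoothing=0.5):
    """Hypothetical smoothed inverse-frequency weights, normalised to mean 1.

    Each weight is proportional to (total / count) ** smoothing. The exponent
    is an illustrative choice, not a value taken from the paper.
    """
    total = sum(counts.values())
    raw = {name: (total / n) ** smoothing for name, n in counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {name: w / mean_raw for name, w in raw.items()}


if __name__ == "__main__":
    shares = class_shares(CLASS_COUNTS)
    weights = inverse_frequency_weights(CLASS_COUNTS)
    for name in CLASS_COUNTS:
        print(f"{name:<11} share={shares[name]:.4%} weight={weights[name]:.3f}")
```

With any positive smoothing exponent the rarest class (BruteForce) receives the largest weight and the most frequent class (DDoS) the smallest, which is the behaviour a class-sensitive loss relies on.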

C.PREPROCESSING

In this study, the CICIoT2023 dataset, provided in 169 separate CSV files, was first consolidated into a single file to simplify processing and model training. The next preprocessing step converted text-based labels into numerical representations to make them compatible with SmartCPS-ADAPT. For binary classification, benign traffic was labeled as 0 and malicious traffic as 1; for multi-class classification, the 33 fine-grained attack labels were grouped into seven attack categories which, together with the benign class, gave eight labels in total. To ensure consistent feature scaling and improve SmartCPS-ADAPT performance, feature normalization was applied using StandardScaler, which rescales each feature to a mean of 0 and a standard deviation of 1. As predefined training and testing sets were not available in the dataset, the holdout approach was used to partition it, i.e., 70% of the dataset was allocated for training and 30% for testing. The dataset was shuffled before splitting, and stratified sampling was applied to preserve class distributions across the training and testing sets. Additionally, to avoid potential data leakage, duplicate samples were removed during preprocessing so that identical flows did not appear in both the training and testing partitions. This strategy helps ensure a fair evaluation of the SmartCPS-ADAPT framework.
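The preprocessing steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions rather than the pipeline used in this work: the function names, the `"Benign"` label string, and the random seed are hypothetical, and the sketch fits the scaling statistics on the training partition only, which is one common way to keep test information out of training.

```python
import numpy as np


def stratified_holdout(y, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))  # shuffle within class
        n_test = int(round(test_fraction * idx.size))
        test_parts.append(idx[:n_test])
        train_parts.append(idx[n_test:])
    return np.concatenate(train_parts), np.concatenate(test_parts)


def standardize(train_x, test_x):
    """Scale features to zero mean and unit std using training statistics only."""
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant features
    return (train_x - mean) / std, (test_x - mean) / std


def preprocess(x, labels, benign_label="Benign", seed=0):
    """Encode labels, drop duplicates, split 70/30 with stratification, scale."""
    labels = np.asarray(labels)
    classes, y_multi = np.unique(labels, return_inverse=True)
    # Remove exact duplicates (same features and same label) before splitting.
    _, keep = np.unique(np.column_stack([x, y_multi]), axis=0, return_index=True)
    keep = np.sort(keep)
    x, y_multi = x[keep], y_multi[keep]
    y_binary = (classes[y_multi] != benign_label).astype(int)  # 0 benign, 1 attack
    train_idx, test_idx = stratified_holdout(y_multi, 0.3, seed)
    x_train, x_test = standardize(x[train_idx], x[test_idx])
    return {
        "x_train": x_train, "x_test": x_test,
        "y_train_multi": y_multi[train_idx], "y_test_multi": y_multi[test_idx],
        "y_train_binary": y_binary[train_idx], "y_test_binary": y_binary[test_idx],
        "classes": classes,
    }
```

Removing duplicates before the split means that an identical flow cannot land on both sides of the partition, which is the leakage concern raised above.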

D.FEATURE EXTRACTION

For feature extraction, this work presents a two-stage 1-dimensional (1D) convolutional-recurrent feature extractor that converts raw tabular flow samples into a compact, temporally aware representation suitable for both binary and multi-class classification. Consider the CICIoT2023 dataset, which has $N$ samples with $d$ features. This work first constructs short ordered sequences from flows per device/session using a sliding window of length $T$, so SmartCPS-ADAPT obtains sequences $X^{(i)} \in R^{T \times d}$, where $i$ indexes the sequence windows. The convolutional encoder applies 1D convolutions across the temporal axis to learn local temporal patterns, i.e., convolutional layer $l$ has a kernel $K^{(l)} \in R^{k_{l} \times d_{l} \times c_{l}}$, where $k_{l}$ denotes the kernel length, $d_{l}$ the number of input channels, and $c_{l}$ the number of output channels. Using this kernel, the pre-activation $Z^{(l)}$ and the activation $H^{(l)}$ are evaluated using Eq. (1) and Eq. (2), respectively.

Z^{(l)} = K^{(l)} * H^{(l - 1)} + b^{(l)}

(1)

H^{(l)} = φ (Z^{(l)})

(2)

In Eq. (1), $*$ denotes 1-D convolution and $b^{(l)}$ denotes a bias vector; in Eq. (2), $φ$ denotes an elementwise non-linearity, here the rectified linear unit (ReLU). In this work, $L$ convolutional blocks are stacked with batch normalization, denoted as $BN$, and residual skip connections to stabilize training. Each residual block is mathematically represented using Eq. (3).

H^{(l)} = φ (B N (K^{(l)} * H^{(l - 1)} + b^{(l)})) + H^{(l - 1)}

(3)
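As a concrete reading of the windowing step and Eqs. (1)–(3), the following NumPy sketch builds sliding windows and applies one residual convolutional block to a single window. It is illustrative only: the function names are hypothetical, "same" zero padding is assumed so that the skip connection lines up in time, the convolution is computed as cross-correlation (as is usual in deep learning libraries), and batch normalization is reduced to per-channel normalisation over one window without learned scale and shift.

```python
import numpy as np


def sliding_windows(flows, window):
    """Stack overlapping windows of `window` consecutive flows: (N-T+1, T, d)."""
    n = flows.shape[0] - window + 1
    return np.stack([flows[i:i + window] for i in range(n)])


def conv1d_same(h, kernel, bias):
    """Eq. (1): 1-D convolution over time with zero 'same' padding.

    h: (T, c_in), kernel: (k, c_in, c_out), bias: (c_out,) -> (T, c_out)
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(h, ((pad, k - 1 - pad), (0, 0)))
    out = np.empty((h.shape[0], kernel.shape[2]))
    for t in range(h.shape[0]):
        # Contract the (k, c_in) patch against the kernel for every output channel.
        out[t] = np.tensordot(padded[t:t + k], kernel, axes=([0, 1], [0, 1])) + bias
    return out


def batch_norm(z, eps=1e-5):
    """Simplified BN: normalise each channel over the time axis of one window."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)


def residual_block(h, kernel, bias):
    """Eq. (3): ReLU(BN(K * H + b)) + H; the skip requires c_in == c_out."""
    z = conv1d_same(h, kernel, bias)
    return np.maximum(batch_norm(z), 0.0) + h
```

Because the skip connection adds the block input back, a block whose kernel and bias are zero returns its input unchanged, which is a convenient sanity check.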

In this work, temporal max-pooling has been applied as it reduces the sequence length to $T^{'}$ and produces convolutional feature maps $C \in R^{T^{'} \times m}$ , where $m$ denotes final channel count. These convolutional features are then passed to an LSTM for capturing long-range sequential dependencies. In LSTM, for each time step $t$ , it computes gates and states using Eqs. (4)–(9).

i_{t} = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i})

(4)

f_{t} = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f})

(5)

o_{t} = σ (W_{o} x_{t} + U_{o} h_{t - 1} + b_{o})

(6)

{\tilde{c}}_{t} = \tanh (W_{c} x_{t} + U_{c} h_{t - 1} + b_{c})

(7)

c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ {\tilde{c}}_{t}

(8)

h_{t} = o_{t} ⊙ \tanh (c_{t})

(9)

In Eqs. (4)–(9), $x_{t} \in R^{m}$ denotes the convolutional feature at time step $t$, $h_{t}$ and $c_{t}$ denote the hidden and cell states, the matrices $W_{*}$, $U_{*}$ and biases $b_{*}$ are learnable parameters, $σ$ denotes the sigmoid function, and $⊙$ denotes the element-wise product. This work utilizes a Bi-LSTM, so the final sequence representation is $H_{seq} = [\overrightarrow{h}_{T}; \overleftarrow{h}_{1}] \in R^{2 r}$, where $r$ denotes the hidden size and the arrows mark the forward and backward directions. To obtain a compact feature vector $z \in R^{q}$, this work applies multi-head self-attention to the LSTM outputs, computing attention scores using Eq. (10) and a pooled vector using Eq. (11).

α_{j} = \mathrm{softmax}_{j} (w_{a}^{T} \tanh (W_{a} H_{all} + b_{a}))

(10)

z = \sum_{j} α_{j} h_{j}

(11)

In Eq. (10), $H_{all}$ stacks all LSTM hidden states $h_{j}$, and $w_{a}$, $W_{a}$, and $b_{a}$ are learnable. Finally, a projection layer $z^{'} = ReLU (W_{p} z + b_{p})$ yields the extracted feature vector used by the downstream classifier. All intermediate outputs are regularized by dropout and $L2$ penalties, and the loss for end-to-end pretraining is cross-entropy. Using this feature extraction approach, SmartCPS-ADAPT extracts spatial (feature-wise) and temporal patterns, reduces noise through pooling and attention, and produces a compact embedding from the 46 raw features across temporal windows for both binary and multi-class classification.
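The recurrences in Eqs. (4)–(9) and the pooling in Eqs. (10)–(11) can be traced with a small NumPy sketch. The parameter layout (the four gates stacked row-wise in `W`, `U`, and `b`) and all function names are assumptions made for illustration, and the pooling shown is the single-head additive form written in Eqs. (10)–(11), not a full multi-head implementation.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstm_pass(xs, params):
    """Run Eqs. (4)-(9) over xs of shape (T, m); return hidden states (T, r)."""
    W, U, b = params["W"], params["U"], params["b"]  # (4r, m), (4r, r), (4r,)
    r = U.shape[1]
    h, c = np.zeros(r), np.zeros(r)
    states = []
    for x_t in xs:
        a = W @ x_t + U @ h + b
        i = sigmoid(a[:r])              # input gate, Eq. (4)
        f = sigmoid(a[r:2 * r])         # forget gate, Eq. (5)
        o = sigmoid(a[2 * r:3 * r])     # output gate, Eq. (6)
        c_tilde = np.tanh(a[3 * r:])    # candidate cell state, Eq. (7)
        c = f * c + i * c_tilde         # Eq. (8)
        h = o * np.tanh(c)              # Eq. (9)
        states.append(h)
    return np.stack(states)


def bilstm(xs, fwd, bwd):
    """Concatenate forward states with time-realigned backward states: (T, 2r)."""
    forward = lstm_pass(xs, fwd)
    backward = lstm_pass(xs[::-1], bwd)[::-1]
    return np.concatenate([forward, backward], axis=1)


def attention_pool(h_all, W_a, b_a, w_a):
    """Eqs. (10)-(11): softmax-normalised scores, then a weighted sum of states."""
    scores = np.tanh(h_all @ W_a.T + b_a) @ w_a  # one score per time step
    scores = scores - scores.max()               # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha, alpha @ h_all
```

The attention weights sum to 1 over the time steps, so the pooled vector is a convex combination of the Bi-LSTM states and has the same dimension $2r$ as each state.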

E.IMPLEMENTATION DETAILS

The CNN-BiLSTM network consists of three convolutional layers with 64, 128, and 256 filters, respectively, each using a kernel size of 3. Batch normalization and dropout with a rate of 0.3 were applied after each convolutional layer to prevent overfitting. The BiLSTM layer contains 128 hidden units and processes sequential traffic windows extracted from the input data. The network was trained using the Adam optimizer with a learning rate of 0.001 and a batch size of 128. For the XGB-ADAPT classifier, the maximum tree depth was set to 8, the learning rate to 0.1, and the number of estimators to 200.
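A rough parameter count follows from these settings. The sketch below assumes that the first convolution sees the 46 raw features as input channels and that each LSTM gate carries a single bias vector; batch-normalization, attention, and projection parameters are left out. Both assumptions are illustrative and are not stated in this work.

```python
def conv1d_params(kernel_size, c_in, c_out):
    """Kernel weights plus one bias per output channel."""
    return kernel_size * c_in * c_out + c_out


def bilstm_params(input_size, hidden_size):
    """Two directions, four gates each: input weights, recurrent weights, one bias."""
    per_direction = 4 * (input_size * hidden_size
                         + hidden_size * hidden_size
                         + hidden_size)
    return 2 * per_direction


# 46 input channels is an assumption based on the feature count in Table I.
channels = [46, 64, 128, 256]
conv_counts = [conv1d_params(3, c_in, c_out)
               for c_in, c_out in zip(channels, channels[1:])]
lstm_count = bilstm_params(input_size=256, hidden_size=128)

print(conv_counts)  # [8896, 24704, 98560]
print(lstm_count)   # 394240
```

Under these assumptions the Bi-LSTM holds roughly three times as many weights as the three convolutional layers combined.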

F.CLASSIFICATION

In this work, a hybrid XGB variant, called XGB-ADAPT, is presented, which combines cost-sensitive gradient boosting, sample reweighting, and incremental leaf recalibration to handle drift and class imbalance. Consider a weighted training set $D = \{(x_{i}, y_{i}, w_{i})\}$. In the standard XGB [22], the objective for iteration $t$ is evaluated using Eq. (12), where $y_{i}$ denotes the actual label, ${\hat{y}}_{i}$ denotes the predicted label, $t$ denotes the boosting iteration, $f_{t}$ denotes the new tree, $x_{i}$ denotes the input, $l$ denotes the loss (this work uses a logistic/focal hybrid), and $Ω$ denotes the tree regularization. This work modifies the instance loss with class-sensitive scaling, as presented in Eq. (13).

ℒ^{(t)} = \sum_{i} l (y_{i}, {\hat{y}}_{i}^{(t - 1)} + f_{t} (x_{i})) + Ω (f_{t})

(12)

l_{cs} (y, \hat{y}) = α_{y} l (y, \hat{y}) + β FL_{γ} (y, \hat{y})

(13)

In Eq. (13), $α_{y}$ denotes a class weight inversely proportional to the smoothed class frequency, $FL_{γ}$ denotes the focal loss with focusing parameter $γ$, and $β$ balances the two components. From this, the gradients and Hessians in the XGB objective are updated as presented in Eqs. (14) and (15). To address drift, this work maintains a time-decayed reservoir $R$ of recent samples with exponential ageing, i.e., when a new sample $(x_{n}, y_{n})$ arrives, its weight is initialized to $w_{n} = 1$ and the existing reservoir weights are multiplied by a decay factor $λ \in (0, 1)$. By training on mini-batches sampled proportionally from the time-decayed reservoir, the model prioritises recent data distributions, thereby improving adaptability to concept drift. Further, during tree construction, the gain of splitting a node is computed from weighted sums, i.e., for a candidate split $s$, the gain is evaluated as presented in Eq. (16).

g_{i} = \partial_{\hat{y}} l_{cs} (y_{i}, {\hat{y}}_{i})

(14)

h_{i} = \partial_{\hat{y}}^{2} l_{cs} (y_{i}, {\hat{y}}_{i})

(15)
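To make Eqs. (13)–(15) tangible, the sketch below evaluates the class-sensitive loss for a binary label and obtains its gradient and Hessian by central finite differences. It assumes that the prediction is a raw score passed through a sigmoid and that $l$ is the logistic loss; the parameter values are placeholders, and a production objective would normally use closed-form derivatives instead.

```python
import math


def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))


def class_sensitive_loss(y, score, alpha, beta, gamma):
    """Eq. (13) for one sample: alpha_y * logistic loss + beta * focal loss."""
    p = sigmoid(score)
    p_true = p if y == 1 else 1.0 - p          # probability of the true class
    log_loss = -math.log(p_true)
    focal = -((1.0 - p_true) ** gamma) * math.log(p_true)
    return alpha[y] * log_loss + beta * focal


def grad_hess(y, score, alpha, beta, gamma, eps=1e-4):
    """Eqs. (14)-(15) by central differences with respect to the raw score."""
    def f(s):
        return class_sensitive_loss(y, s, alpha, beta, gamma)
    g = (f(score + eps) - f(score - eps)) / (2.0 * eps)
    h = (f(score + eps) - 2.0 * f(score) + f(score - eps)) / (eps * eps)
    return g, h


if __name__ == "__main__":
    # Placeholder weights: the minority class (label 1) is weighted 5x.
    print(grad_hess(1, 0.3, alpha={0: 1.0, 1: 5.0}, beta=0.5, gamma=2.0))
```

With $β = 0$ and unit class weights the sketch reduces to the familiar logistic-loss derivatives, $g = p - y$ and $h = p(1 - p)$, which gives a quick correctness check.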

Gain = \frac{1}{2} (\frac{G_{L}^{2}}{H_{L} + η} + \frac{G_{R}^{2}}{H_{R} + η} - \frac{G_{tot}^{2}}{H_{tot} + η}) - γ T

(16)
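The weighted split gain can be sketched in a few lines of plain Python. The version below follows the standard XGBoost form, in which the score of the unsplit parent node is subtracted from the summed scores of the two children; the function names and the flat `penalty` argument standing in for the complexity term are illustrative assumptions.

```python
def leaf_score(g_sum, h_sum, eta):
    """Structure score G^2 / (H + eta) of a single node."""
    return g_sum * g_sum / (h_sum + eta)


def split_gain(grads, hessians, weights, go_left, eta=1.0, penalty=0.0):
    """Weighted gain of a candidate split.

    grads, hessians, weights: per-sample values; go_left: per-sample booleans.
    """
    g_l = h_l = g_r = h_r = 0.0
    for g, h, w, left in zip(grads, hessians, weights, go_left):
        if left:
            g_l += w * g
            h_l += w * h
        else:
            g_r += w * g
            h_r += w * h
    children = leaf_score(g_l, h_l, eta) + leaf_score(g_r, h_r, eta)
    parent = leaf_score(g_l + g_r, h_l + h_r, eta)
    return 0.5 * (children - parent) - penalty


if __name__ == "__main__":
    grads = [-1.0, -1.0, 1.0, 1.0]
    ones = [1.0] * 4
    print(split_gain(grads, ones, ones, [True, True, False, False]))
```

A split that sends every sample to the same child leaves the node unchanged, so its gain before the penalty is zero; a split that separates samples with opposite gradient signs yields a positive gain.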

In Eq. (16), $G_{*}$ and $H_{*}$ are the gradients and Hessians aggregated with the sample weights $w_{i}$, $η$ denotes the leaf regularizer, and $γ T$ denotes the complexity penalty. After each epoch, XGB-ADAPT performs leaf re-calibration, i.e., given the leaf predictions $p_{j}$ and a recent validation window $V$, this work solves the regularized least-squares update presented in Eq. (17). Eq. (17) adapts the leaf outputs without rebuilding trees, which enables fast adaptation to drift. The combination of class-sensitive loss, decayed reservoir sampling, and leaf re-calibration provides a GB approach that is both robust to shifting distributions and attentive to minority classes while keeping training efficient.

\min_{Δ} \sum_{(x, y) \in V} w (x) {(y - (p_{leaf (x)} + Δ_{leaf (x)}))}^{2} + ρ {\| Δ \|}^{2}

(17)
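Because each sample falls into exactly one leaf, the objective in Eq. (17) separates by leaf, and setting its derivative with respect to $Δ_{j}$ to zero gives $Δ_{j} = \sum_{x \in j} w(x) (y - p_{j}) / (\sum_{x \in j} w(x) + ρ)$. The sketch below applies this closed form for a single tree; the data layout (one leaf index per validation sample and a dictionary of leaf values) is an assumption made for illustration.

```python
from collections import defaultdict


def recalibrate_leaves(leaf_ids, targets, weights, leaf_values, rho=1.0):
    """Closed-form minimiser of Eq. (17) for a single tree.

    leaf_ids:    leaf index reached by each validation sample
    targets:     regression target y for each sample
    weights:     sample weight w(x) for each sample
    leaf_values: dict mapping leaf index to its current prediction p_j
    rho:         ridge penalty; must be positive if some leaf has no samples
    """
    residual_sum = defaultdict(float)
    weight_sum = defaultdict(float)
    for leaf, y, w in zip(leaf_ids, targets, weights):
        residual_sum[leaf] += w * (y - leaf_values[leaf])
        weight_sum[leaf] += w
    # Leaves with no validation samples keep their value (delta = 0).
    return {leaf: value + residual_sum[leaf] / (weight_sum[leaf] + rho)
            for leaf, value in leaf_values.items()}


if __name__ == "__main__":
    updated = recalibrate_leaves([0, 0, 1], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0],
                                 {0: 0.0, 1: 1.0, 2: -0.5}, rho=2.0)
    print(updated)  # {0: 0.5, 1: 1.0, 2: -0.5}
```

The penalty $ρ$ shrinks each correction toward zero, so leaves that see only a few recent samples move less than leaves with strong evidence of drift.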

G.DRIFT DETECTION

This section presents the drift detection algorithm used in XGB-ADAPT, where the main goal is to detect distribution or performance drift and adapt the classifier with minimal retraining. The algorithm for the drift detection is given below. This algorithm balances sensitivity (detecting real shifts) and stability (avoiding false positives) by combining SmartCPS-ADAPT performance signals with data-distribution checks and adapting through a lightweight, reservoir-based update.

Algorithm 1. XGB-ADAPT Drift Detection.

Input: Stream of batches $B_{t}$, each of batch size $b$; current SmartCPS-ADAPT model $M$; reference windows $W_{s}$ (stable) and $W_{r}$ (recent); smoothing factor $ρ \in (0, 1)$; feature importance weights $\{w_{j}\}$; fusion weights $σ_{L}$, $σ_{KL}$, $σ_{C}$; warning threshold $τ_{warn}$; drift threshold $τ_{drift}$ ($τ_{warn} < τ_{drift}$); reservoir capacity $R$ (number of batches to collect); decay factor $λ$ for reservoir weighting; class weights $\{α_{y}\}$; and persistence requirement $k$ (consecutive batches above threshold to trigger).

Output: Updated SmartCPS-ADAPT model $M$ adapted to the new distribution, and reservoir $R$ containing prioritised recent samples.

Step 1 Start
Step 2 Set loss $μ_{L, 0}$, i.e., initial validation loss
Step 3 Set counters $warn_{count} = 0$ and $drift_{count} = 0$
Step 4 Initialise empty reservoir $R = \emptyset$
Step 5 For each incoming batch $B_{t}$
    Compute batch loss using $L_{t} = \frac{1}{b} \sum_{(x, y) \in B_{t}} ℓ (y, \hat{y})$
    For each feature $j$, compute marginal histogram $p_{t}^{j}$
    Compute prediction confidence distribution $C_{t}$
    Update loss using $μ_{L, t} = ρ μ_{L, t - 1} + (1 - ρ) L_{t}$
    Compute drift scores:
        Loss drift $S_{L} = μ_{L, t} - μ_{L, ref}$
        Feature Kullback-Leibler aggregate $S_{KL} = \sum_{j} w_{j} KL (p_{t}^{j} \| p_{ref}^{j})$
        Confidence shift $S_{C} = {\| CDF (C_{t}) - CDF (C_{ref}) \|}_{1}$
    Normalize each score and fuse using $S = σ_{L} S_{L} + σ_{KL} S_{KL} + σ_{C} S_{C}$
    If $S > τ_{warn}$ then
        $warn_{count} += 1$, increase reservoir sampling rate
    Else
        $warn_{count} = 0$
    End If
    If $S > τ_{drift}$ then
        $drift_{count} += 1$
    Else
        $drift_{count} = 0$
    End If
    Trigger adaptation only if $drift_{count} \geq k$:
        Freeze $M$
        Collect next $R$ batches into the reservoir $R$. When adding a new batch, apply exponential ageing to existing reservoir sample weights by multiplying the weight by $λ$
        Reweight samples in $R$ with class weights $α_{y}$ and time decay
        Incrementally retrain/fine-tune $M$ on $R$. Apply leaf re-calibration for fast correction.
        Update reference window $W_{s}$, and reset $drift_{count} = 0$ and $warn_{count} = 0$
    Periodically, compute reference statistics $μ_{L, ref}$, $p_{ref}^{j}$, and $C_{ref}$ from updated $W_{s}$
End for
Step 6 End

Algorithm 1 continuously monitors IoT data streams to detect and adapt to concept drift. The algorithm processes incoming mini-batches $B_{t}$ of size $b$, computes the average loss $L_{t}$, and updates the exponentially weighted moving average $μ_{L, t}$ for stability. For each batch, the feature distributions $p_{t}^{j}$ and confidence scores $C_{t}$ are analyzed, with drift quantified using three metrics: loss drift $S_{L}$, feature divergence $S_{KL}$, and confidence shift $S_{C}$. These are fused into a composite score $S$ using the weights $σ_{L}$, $σ_{KL}$, and $σ_{C}$. If $S$ exceeds the warning threshold $τ_{warn}$, sampling is intensified; if it exceeds the drift threshold $τ_{drift}$ for $k$ consecutive batches, adaptation is triggered. SmartCPS-ADAPT is then frozen, a reservoir $R$ of capacity $R$ is populated with recent data and reweighted using the decay $λ$ and class weights $α_{y}$, and SmartCPS-ADAPT is incrementally fine-tuned with leaf recalibration. The results achieved by SmartCPS-ADAPT are discussed in detail in the next section.
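The monitoring loop of Algorithm 1 can be sketched as a small NumPy class. This is an illustration under several assumptions rather than the implementation used in this work: the class and function names, the threshold values, and the 101-point grid for the confidence CDFs are hypothetical; the per-score normalisation step is omitted because its form is not specified here; and the L1 distance between CDFs is approximated by a mean absolute gap on the grid.

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 101)  # evaluation grid for confidence CDFs


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two histograms, smoothed to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def cdf_gap(conf_a, conf_b):
    """Grid approximation of the L1 distance between two empirical CDFs."""
    def cdf(c):
        return np.searchsorted(np.sort(c), GRID, side="right") / len(c)
    return float(np.mean(np.abs(cdf(conf_a) - cdf(conf_b))))


class DriftMonitor:
    """Tracks the fused drift score S and the warning/drift counters."""

    def __init__(self, ref_loss, ref_hists, ref_conf, feature_w,
                 fusion=(1.0, 1.0, 1.0), rho=0.9,
                 tau_warn=0.2, tau_drift=0.5, k=3):
        assert tau_warn < tau_drift
        self.ref_loss, self.ref_hists, self.ref_conf = ref_loss, ref_hists, ref_conf
        self.feature_w, self.fusion = feature_w, fusion
        self.rho, self.tau_warn, self.tau_drift, self.k = rho, tau_warn, tau_drift, k
        self.mu = ref_loss  # exponentially weighted moving average of batch loss
        self.warn_count = 0
        self.drift_count = 0

    def update(self, batch_loss, hists, conf):
        """Process one batch; return (fused score, adaptation flag)."""
        self.mu = self.rho * self.mu + (1.0 - self.rho) * batch_loss
        s_loss = self.mu - self.ref_loss
        s_kl = sum(w * kl_divergence(p, q)
                   for w, p, q in zip(self.feature_w, hists, self.ref_hists))
        s_conf = cdf_gap(conf, self.ref_conf)
        score = float(np.dot(self.fusion, (s_loss, s_kl, s_conf)))
        self.warn_count = self.warn_count + 1 if score > self.tau_warn else 0
        self.drift_count = self.drift_count + 1 if score > self.tau_drift else 0
        return score, self.drift_count >= self.k


def age_reservoir(weights, n_new, decay):
    """Multiply existing sample weights by `decay`, then append n_new weights of 1."""
    aged = np.asarray(weights, dtype=float) * decay
    return np.concatenate([aged, np.ones(n_new)])
```

Requiring `k` consecutive batches above the drift threshold is what trades sensitivity for stability: a single noisy batch raises the counter once and is forgotten as soon as the next batch falls back below the threshold.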

IV.RESULTS AND DISCUSSION

A.PERFORMANCE EVALUATION METRICS

To evaluate the performance of SmartCPS-ADAPT, standard metrics, namely accuracy, precision, recall, and F-score, were adopted, as defined in Eqs. (18)–(21), where $TP$ denotes true positives, $TN$ true negatives, $FP$ false positives, and $FN$ false negatives.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(18)

Precision = \frac{TP}{TP + FP}

(19)

Recall = \frac{TP}{TP + FN}

(20)

F\text{-}Score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}

(21)
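Eqs. (18)–(21) translate directly into code. The helper below is a minimal sketch with a hypothetical function name; the zero-denominator guards are a convention added here for robustness and are not part of the definitions above.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2.0 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(classification_metrics(tp=90, tn=95, fp=5, fn=10))
```

For multi-class evaluation the same function can be applied per class in a one-vs-rest fashion and the results averaged.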

B.CROSS-VALIDATION AND STATISTICAL ANALYSIS

To further validate the robustness and generalization capability of the proposed SmartCPS-ADAPT framework, a fivefold cross-validation experiment was conducted on the CICIoT2023 dataset. The dataset was randomly shuffled and partitioned into five equal folds, preserving the class distribution via stratified sampling. In each iteration, four folds were used for training, and the remaining fold was used for testing. The average results across the five folds demonstrate consistent performance, confirming the stability of the proposed model. The SmartCPS-ADAPT achieved an average accuracy of 99.82% with a standard deviation of 0.03, indicating that the model maintains reliable detection capability across different data partitions.

C.BINARY CLASSIFICATION

The experimental results, as presented in Fig. 3, demonstrate SmartCPS-ADAPT’s performance on the CICIoT2023 dataset for binary classification. In this task, SmartCPS-ADAPT achieves 100% accuracy, 99.9% precision, 99.95% recall, and an F-score of 99.92%. The performance is attributed to the integration of the CNN–BiLSTM for hierarchical and temporal feature extraction, which enables effective learning of both spatial correlations in network traffic and sequential dependencies in attack patterns. Additionally, the adaptive XGB classifier enhances robustness by efficiently handling large-scale data while mitigating drift and class imbalance, supporting consistent generalization to unseen samples.

Fig. 2. Workflow of drift detection and adaptive updating in SmartCPS-ADAPT.

D.MULTI-CLASS CLASSIFICATION

The experimental results, as shown in Fig. 4, demonstrate SmartCPS-ADAPT’s performance on the CICIoT2023 dataset for multi-class classification. In this more challenging setting, which involves differentiating between benign traffic and seven distinct attack categories, SmartCPS-ADAPT achieves 99.85% accuracy, 99.7% precision, 99.8% recall, and a 99.75% F-score. The results show that SmartCPS-ADAPT provides a robust and generalizable DL-based IDS framework that captures complex attack behaviors, addresses data drift, and maintains reliable performance across both binary and multi-class tasks, distinguishing among multiple attack categories with high precision and recall.

Fig. 3. SmartCPS-ADAPT performance for binary classification on the CICIoT2023 dataset.

Fig. 4. SmartCPS-ADAPT performance for multi-class classification on the CICIoT2023 dataset.

E.ABLATION STUDY

To evaluate the contribution of individual components of the SmartCPS-ADAPT framework, an ablation study was conducted. Different variants of the architecture were evaluated by progressively adding model components.

The results, summarized in Table III, confirm that each component contributes to improved detection performance: accuracy rises from 97.6% for the CNN alone to 99.85% for the full SmartCPS-ADAPT framework.

F.DRIFT ADAPTATION EVALUATION

To evaluate the effectiveness of the drift detection mechanism, a controlled distribution shift experiment was conducted. The model was initially trained on the original dataset distribution, and the testing data were modified to simulate evolving attack patterns. The performance of the baseline model without drift adaptation was compared with the SmartCPS-ADAPT model.

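The results, reported in Table IV, demonstrate that the proposed drift detection and adaptation mechanism effectively maintains detection performance under changing data distributions: after drift, SmartCPS-ADAPT retains 98.7% accuracy, compared with 92.4% for the baseline CNN-BiLSTM without drift adaptation.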
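
The structure of such a controlled shift experiment can be illustrated with a toy example. The sketch below is not the SmartCPS-ADAPT mechanism: it assumes synthetic one-dimensional data, a nearest-centroid classifier, labels that become available after each batch is scored, and a hypothetical error threshold of 0.1 that triggers re-estimation of the class centroids.

```python
import random
from statistics import mean


def make_batch(rng, n, shift=0.0):
    """Synthetic flows: benign (0) ~ N(shift, 1), attack (1) ~ N(4 + shift, 1)."""
    batch = []
    for _ in range(n):
        label = rng.randint(0, 1)
        batch.append((rng.gauss(4.0 * label + shift, 1.0), label))
    return batch


def fit_centroids(batch, previous):
    """Per-class feature means; keep the previous centroid if a class is absent."""
    centroids = {}
    for label in (0, 1):
        values = [x for x, y in batch if y == label]
        centroids[label] = mean(values) if values else previous[label]
    return centroids


def predict(x, centroids):
    """Assign the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def run_stream(centroids, batches, adapt, error_threshold=0.1):
    """Score the stream batch by batch; optionally refit when the error spikes."""
    centroids = dict(centroids)
    correct = total = 0
    for batch in batches:
        hits = sum(predict(x, centroids) == y for x, y in batch)
        correct += hits
        total += len(batch)
        # Drift signal: batch error rate above the threshold.
        if adapt and 1 - hits / len(batch) > error_threshold:
            centroids = fit_centroids(batch, centroids)
    return correct / total


rng = random.Random(0)
train = make_batch(rng, 1000)                       # original distribution
initial = fit_centroids(train, {0: 0.0, 1: 4.0})
drifted = [make_batch(rng, 100, shift=3.0) for _ in range(20)]  # shifted test stream

baseline_acc = run_stream(initial, drifted, adapt=False)
adaptive_acc = run_stream(initial, drifted, adapt=True)
print(f"baseline: {baseline_acc:.3f}  adaptive: {adaptive_acc:.3f}")
```

The static model keeps its pre-drift decision boundary and misclassifies most shifted benign samples, whereas the adaptive variant loses accuracy only on the first drifted batch before it recalibrates.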

G.COMPARATIVE STUDY

The comparative evaluation on the CICIoT2023 dataset highlights the effectiveness of SmartCPS-ADAPT over existing IDS in both binary-class and multi-class classification tasks. In the binary classification setting, as shown in Table V, SmartCPS-ADAPT achieves 100% accuracy, 99.9% precision, 99.95% recall, and an F-score of 99.92%, surpassing existing approaches such as CKAN [14] and XGB [16]. While the ensemble models LightGBM, XGB, and XGB-RF [16] also report 100% accuracy, their F-scores are lower (96.6%, 97.4%, and 95.2%, respectively), with XGB-RF particularly struggling on recall (92.3%). This demonstrates that SmartCPS-ADAPT not only matches the best reported accuracy but also maintains higher precision, recall, and F-score, which is essential to minimize both FPs and FNs.

Table II. Performance of SmartCPS-ADAPT using 5-fold cross-validation on the CICIoT2023 dataset

Fold	Accuracy	Precision	Recall	F1
1	99.81	99.72	99.79	99.75
2	99.84	99.70	99.83	99.76
3	99.83	99.69	99.81	99.74
4	99.80	99.71	99.78	99.74
5	99.82	99.70	99.80	99.75
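
The per-fold values in Table II can be summarized by their mean and standard deviation. The short sketch below uses only the figures from the table; the use of the sample standard deviation (`statistics.stdev`) is a choice made for this illustration.

```python
from statistics import mean, stdev

# Per-fold results copied from Table II (percentages).
folds = {
    "accuracy":  [99.81, 99.84, 99.83, 99.80, 99.82],
    "precision": [99.72, 99.70, 99.69, 99.71, 99.70],
    "recall":    [99.79, 99.83, 99.81, 99.78, 99.80],
    "f1":        [99.75, 99.76, 99.74, 99.74, 99.75],
}

# Mean and sample standard deviation per metric.
summary = {name: (mean(values), stdev(values)) for name, values in folds.items()}

for name, (mu, sigma) in summary.items():
    print(f"{name:<9} mean = {mu:.2f}  std = {sigma:.3f}")
```

The mean accuracy over the five folds is 99.82%, and the spread of every metric is a few hundredths of a percentage point.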

In the multi-class classification task, as presented in Table VI, SmartCPS-ADAPT again outperforms other advanced models. It achieves 99.85% accuracy, 99.7% precision, 99.8% recall, and an F-score of 99.75%, outperforming existing approaches such as Blending [11] and SMOTE + AE + LSTM + CNN [15]. While Blending achieved strong results (99.51% accuracy, 99.07% F-score), SmartCPS-ADAPT improves on it across all metrics, especially precision (99.7% versus 98.51%), reflecting its ability to correctly distinguish among multiple complex attack classes, including DoS, DDoS, Mirai, Spoofing, and Web-based attacks. Other methods, such as RE + GA + DNN [13] and XGB [19], fall significantly short, particularly in precision and recall, which limits their reliability in real-world CPS and IoT scenarios. The performance comparison under simulated concept drift conditions is presented in Table IV, highlighting the robustness of the proposed adaptive mechanism. Table V presents the comparative analysis of binary classification performance against state-of-the-art methods, and Table VI shows that the proposed framework attains the best multi-class result on every evaluation metric.

Table III. Ablation study evaluating the contribution of individual components in the SmartCPS-ADAPT framework

Model	Accuracy
CNN	97.6
CNN + LSTM	98.8
CNN + BiLSTM	99.1
CNN + BiLSTM + XGB	99.63
SmartCPS-ADAPT	99.85
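
The contribution of each added component can be read off Table III as the accuracy difference between consecutive variants. The snippet below performs only this arithmetic on the tabulated values; the variable names are illustrative.

```python
# Accuracy (%) of each architecture variant, copied from Table III.
ablation = [
    ("CNN", 97.6),
    ("CNN + LSTM", 98.8),
    ("CNN + BiLSTM", 99.1),
    ("CNN + BiLSTM + XGB", 99.63),
    ("SmartCPS-ADAPT", 99.85),
]

# Gain in percentage points of each variant over the previous one.
gains = [
    (current[0], round(current[1] - previous[1], 2))
    for previous, current in zip(ablation, ablation[1:])
]

for model, gain in gains:
    print(f"{model:<20} +{gain:.2f} pp")
```

The largest single step (1.2 percentage points) comes from adding the recurrent layer to the CNN, while the adaptive components of the full framework add a further 0.22 points over CNN + BiLSTM + XGB.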

Table IV. Performance comparison under simulated concept drift conditions

Model	Accuracy after drift
Baseline CNN-BiLSTM	92.4
SmartCPS-ADAPT	98.7

Table V. Comparative study on the CICIoT2023 dataset for binary-class (2-class) classification

Ref	Model	Accuracy	Precision	Recall	F-Score
[14]	CKAN	99.22	99.81	99.4	99.6
[16]	LightGBM	100	95.1	98.5	96.6
	XGB	100	98.7	99.8	97.4
	XGB-RF	100	98.7	92.3	95.2
[17]	ST + LSTM	98.78	98.54	98.77	98.56
	SmartCPS-ADAPT	100	99.9	99.95	99.92

Table VI. Comparative study on the CICIoT2023 dataset for multi-class (8-class) classification

Ref	Model	Accuracy	Precision	Recall	F-Score
[11]	ET	97.07	95.07	94.08	94.57
	RF	95.36	95.36	97.31	96.33
	GB	96.4	94.4	95.39	94.89
	Blending	99.51	98.51	99.63	99.07
[13]	RE + GA + DNN	92.16	58.82	66.67	62.5
[13]	RE + GA + RF	94.81	65.84	77.25	64.96
[15]	SMOTE + AE + LSTM + CNN	99.15	99.39	99	99.19
[16]	LightGBM	99.1	91	89.7	76.8
	XGB	99.7	98.7	88	90.1
	XGB-RF	99.6	95.6	84.5	88.6
[17]	ST + LSTM	97.44	97.67	97.37	97.76
[19]	XGB	63.6	63.08	63.23	62.91
	SmartCPS-ADAPT	99.85	99.7	99.8	99.75

Overall, SmartCPS-ADAPT establishes itself as a robust, scalable, and highly accurate IDS framework capable of outperforming traditional ML, ensemble, and hybrid DL approaches by effectively extracting discriminative features, learning temporal patterns, and adapting to evolving threats.

Because such high detection accuracy raises the question of overfitting, several techniques were employed to mitigate it: dropout regularization, batch normalization, and early stopping during training. Furthermore, the 5-fold cross-validation results in Table II confirm that the model maintains stable performance across data partitions, with accuracy varying only between 99.80% and 99.84%.
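
Of the overfitting safeguards mentioned above, early stopping is the simplest to illustrate in isolation. The following is a minimal, framework-independent sketch; the class name `EarlyStopping`, the patience of 3 epochs, and the validation-loss sequence are hypothetical and do not reflect the training configuration used in this work.

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember this epoch and reset the counter.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses: improvement stalls after epoch 3.
val_losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.42, 0.44]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best epoch {stopper.best_epoch}")
```

In practice the weights saved at the best epoch would be restored after stopping, so the deployed model corresponds to the lowest validation loss rather than the last epoch trained.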

V.CONCLUSION

This work addressed the growing security challenges in CPS and IoT–edge–cloud environments, where the increasing scale and complexity of networks expose systems to sophisticated cyberattacks. The study was motivated by the research gap in existing IDS, where many models either fail to generalize across evolving threats, suffer from imbalanced performance between accuracy and recall, or demand high computational costs. The problem addressed in this work is the need to develop a robust DL-based classification framework that accurately detects cyberattacks in CPS using packet-level analysis. The primary objective of this work was to design an adaptable IDS, integrate effective preprocessing, feature extraction, and classification techniques, and incorporate drift detection to maintain long-term robustness. Hence, the proposed SmartCPS-ADAPT methodology successfully fulfilled these objectives. It achieved 100% accuracy, 99.9% precision, 99.95% recall, and 99.92% F-score for binary-class classification and 99.85% accuracy, 99.7% precision, 99.8% recall, and 99.75% F-score for multi-class classification on the CICIoT2023 dataset. These results demonstrate significant improvements compared to state-of-the-art methods, ensuring highly reliable detection of both common and advanced attack types. For future work, SmartCPS-ADAPT can be extended to develop resilient CPS architectures using ML/DL models that ensure continuous operation and adaptability, even under adverse conditions such as large-scale distributed or adaptive adversarial attacks.

SmartCPS-ADAPT: An Intrusion Detection System for Cyber-Physical Systems