I. INTRODUCTION

Food security has long been recognized as one of the most critical global challenges, closely linked to the achievement of the United Nations Sustainable Development Goals (SDGs) [1]. Ensuring consistent access to affordable, safe, and nutritious food is fundamental to human development, yet many regions continue to experience volatility in both supply and distribution [2]. Increasing socioeconomic pressures, climate variability, and disruptions in global trade have further undermined food system stability, creating a demand for innovative approaches in prediction, monitoring, and management. Within this context, artificial intelligence (AI) and machine learning (ML) are increasingly applied to address such complexities through predictive analytics and data-driven decision support [3]. Indonesia, with its diverse agricultural base, illustrates both the opportunities and vulnerabilities of food systems management [4]. Aceh Province, in particular, represents a critical case due to its reliance on staple commodities such as rice, starchy foods, and fish [5].

Although local production remains central to food availability, recurring issues including price instability, inefficient distribution networks, and regional disparities in supply–demand balance continue to compromise system resilience [6]. Traditional statistical models often fall short in capturing these multidimensional and nonlinear interactions, thereby limiting their utility for long-term planning and effective policy design [7].

ML provides a promising alternative by integrating heterogeneous datasets and uncovering latent patterns in complex systems [8]. Predictive models such as regression, clustering, and classification enable the forecasting of commodity trends, identification of vulnerable regions, and classification of supply chain stability [9]. Recent studies have applied ML to agricultural forecasting, commodity price modeling, and supply chain analysis [10]. However, most approaches have remained methodologically fragmented, focusing on a single ML technique or lacking an integrated framework that can provide comprehensive and actionable insights for policy [11]. In the context of Aceh, fragmented applications of ML risk overlooking critical interdependencies between production, consumption, and population growth [12]. For example, regression models may predict future price trajectories but fail to identify stability levels within the supply chain, while clustering methods may group regions by vulnerability without providing forward-looking projections [13]. Addressing these limitations requires a more holistic framework that combines complementary ML methods to deliver both predictive accuracy and diagnostic clarity [14]. This study introduces an integrated ML framework that combines ridge regression for commodity price forecasting, K-means clustering for regional vulnerability detection, and random forest (RF) classification for supply chain stability analysis.

The contributions of this study are threefold. First, it proposes a multistage ML framework that unifies regression, clustering, and classification into a single system for smart food systems management. Second, it applies this framework to Aceh Province, a region characterized by unique food system dynamics but limited empirical research using advanced data-driven methods. Third, it provides empirical evidence of the policy relevance of ML-based decision support, demonstrating how predictive and diagnostic outputs can inform targeted interventions and strengthen food resilience.

Unlike prior studies that typically apply a single ML technique in isolation, this research advances a fully integrated framework that captures interdependencies across forecasting, clustering, and classification tasks. This holistic approach not only enhances methodological robustness but also ensures greater policy applicability, marking a clear novelty in the context of regional food systems optimization.

The remainder of this paper is structured as follows. Section II reviews related works on ML applications in food systems management and supply chain analysis. Section III outlines the methodology, including dataset description and the integrated ML framework. Section IV presents the results of regression forecasting, vulnerability clustering, and supply stability classification. Finally, Section V concludes the study and outlines directions for future research.

II. RELATED WORKS

In recent years, a growing body of research has examined the role of ML in addressing food security and optimizing supply chain management. Previous studies have primarily focused on global or national contexts, applying various predictive and classification models to assess production trends, price dynamics, and distribution vulnerabilities. These works demonstrate the potential of data-driven approaches to support decision-making in agriculture and food systems, particularly through techniques such as regression forecasting, clustering, and classification.

Despite these contributions, most existing studies remain concentrated on broader regions or specific commodities, leaving significant gaps at the subnational level. In particular, research addressing localized food system challenges in Aceh, Indonesia, is virtually absent. The unique socioeconomic characteristics, agricultural diversity, and regional vulnerabilities in Aceh necessitate a tailored approach that integrates both forecasting and classification within a unified framework.

Therefore, this study distinguishes itself by proposing an integrated ML framework specifically designed for the food system in Aceh. Unlike prior works that address food security at the global or national scale, this research combines regression forecasting, clustering optimization, and stability classification to generate actionable insights for regional policymakers. This contribution highlights the novelty of applying a comprehensive, localized analytical framework to support food system resilience in Aceh. Table I presents a comparative overview of previous works on ML applications in food security and supply chain management.

Table I. Comparative analysis of food security and supply chain research and differences with current study

| Reference | Methodology | Objectives | Techniques used | Key contributions |
| --- | --- | --- | --- | --- |
| [15] | Purposive and random sampling | Assess food security under COVID-19 and climate change | ANOVA | COVID-19 reduced yields and supply chain; climate change a major threat; recommended subsidies and adaptation |
| [16] | Literature review; case studies (Canada and USA) | Examine COVID-19 impact on food security and GFSC; propose resilience framework | Analysis of open data and prior studies | Identified GFSC disruptions (labor, transport, production, and demand); proposed framework for smarter, resilient post-COVID-19 food supply chains |
| [17] | Bibliometric analysis | Review evolution of agri-food supply chain research and identify trends | Topic mapping | Identified emerging topics (blockchain, IoT, resilience, and short food supply chains), hot topics (LCA, environmental impact, and food waste), and common SCM and SSCM practices |
| [18] | PESTEL analysis; ANP and MAIRCA methods | Identify factors of blockchain in agri-food supply chains | PESTEL, Analytic Network Process (ANP), MAIRCA | Determined 12 critical success factors; highlighted top factors: “prevent food waste,” “increase food security,” “product lifecycle tracking”; linked blockchain adoption with circular economy and sustainability |
| [19] | Comparative review | Explore urban farming’s impact on food supply in the USA and African cities | Literature review, policy, and case analysis | Highlighted role of urban farming in food security; identified success factors in USA |
| [20] | Review and synthesis analysis | Examine benefits and challenges in food supply chains | Literature review, synthesis analysis | Enhances efficiency in food supply chains |
| [21] | Review and conceptual framework | Examine blockchain and IoT integration in agri-food supply chains; propose architecture | Literature review, Agri-SCM-BIoT framework | Proposed blockchain + IoT architecture for transparency, traceability, security, privacy, and scalability |
| [22] | Systematic literature review + single use-case analysis | Explore blockchain’s role in achieving operational excellence | CIMO logic, semi-structured interviews | Showed blockchain features (immutability, transparency, traceability, and smart contracts) enhance responsiveness, flexibility, efficiency, and collaboration in PFSC under COVID-19 |
| [23] | Survey (n = 398, Thailand) | Identify drivers of (FDAs) | Partial least squares (PLS) | Practical implications for FDA retention strategies |
| [24] | Time-series analysis | Predict food production for policymaking and food security planning | Machine learning: Adaptive Network-based Fuzzy Inference System (ANFIS) | ANFIS with Gbell membership functions provided lowest prediction error |
| Current study | Multistage machine learning framework | Optimize smart food systems management in Aceh through forecasting, vulnerability assessment, and supply chain monitoring | Ridge regression, K-means clustering with SAW centroid initialization, SVM (RBF and sigmoid), random forest | Integrated prediction, clustering, and classification for actionable insights; high forecasting accuracy, optimized clustering (CH: 40.887; Silhouette: 0.288), and robust classification (SVM-RBF 94.59%, random forest 98.89%); supports evidence-based policy and resilience planning |

III. MATERIALS AND METHODS

This study applies a suite of ML approaches to advance smart food systems management in Aceh, Indonesia. The methodological framework encompasses three core components: commodity price forecasting, optimization of regional food vulnerability clustering, and classification of food supply chains.

A. PROPOSED METHOD

This study proposes a three-stage ML framework to enhance smart food systems management in Aceh, as illustrated in Fig. 1. The first stage focuses on commodity price forecasting using ridge regression, chosen for its ability to handle multicollinearity and provide stable predictions. Forecast accuracy is evaluated with mean squared error (MSE) and root mean squared error (RMSE) to ensure minimal prediction error.

Fig. 1. Proposed method of machine learning approaches for smart food systems management in Aceh.

The second stage addresses regional food vulnerability clustering. K-means groups districts based on vulnerability profiles, while the integration of simple additive weighting (SAW) optimizes centroid initialization, enhancing cluster stability and interpretability. Clustering performance is measured using the Calinski–Harabasz (CH) index and Silhouette Score (SS) to ensure cohesion and separation.

The final stage involves classifying the food supply chain into distinct categories using support vector machine (SVM) and RF, representing margin-based and ensemble learning approaches. Model robustness and generalization are validated through 10-fold cross-validation to ensure statistically reliable results resistant to overfitting.

Algorithm selection at each stage is systematically guided by the Design of Experiments (DOE) framework [25], ensuring that choices are grounded in methodological reasoning rather than arbitrariness. The process relies on three key criteria: interpretability and policy relevance, computational efficiency relative to the dataset’s size and structure, and robustness against overfitting, verified through comprehensive cross-validation.

Accordingly, ridge regression is chosen for price forecasting due to its ability to handle multicollinearity while maintaining transparent coefficient interpretation. K-means is employed for clustering because of its simplicity and efficiency and is further refined via the SAW method to stabilize centroid initialization. For classification, SVM and RF are utilized to capture two distinct learning paradigms, margin-based and ensemble-based, enabling a thorough comparative evaluation.

The DOE-guided framework ensures that algorithm selection is conducted in a structured, criteria-driven manner, promoting methodological rigor, transparency, and reproducibility over intuition or random choice.

B. FOOD COMMODITY PRICE FORECASTING USING A RIDGE REGRESSION MODEL

Price forecasting is conducted using ridge regression, a regularized linear regression technique designed to address multicollinearity and mitigate overfitting by introducing an L2 penalty term into the cost function, as formally expressed in Equation (1) [26]:

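$$\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad \min_{\beta_0,\,\beta} \; \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \alpha \sum_{j=1}^{p} \beta_j^2 \tag{1}$$
where $\hat{y}_i$ represents the predicted commodity price for observation $i$, $y_i$ is the observed price, $x_{ij}$ is the value of predictor $j$, $\beta_j$ denotes the regression coefficients, and $\alpha$ is the regularization parameter that scales the L2 penalty and is optimized through cross-validation.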
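As a brief worked form of the penalized estimate, the following sketch assumes centered predictors and an unpenalized intercept. These are common conventions that the text does not state, so the expressions are illustrative rather than a description of the exact fitting procedure used here.

```latex
% X is the n-by-p matrix of centered predictors, y the vector of observed prices.
\hat{\boldsymbol{\beta}} = \left(X^{\top}X + \alpha I_p\right)^{-1} X^{\top}\mathbf{y},
\qquad
\hat{\beta}_0 = \bar{y} - \sum_{j=1}^{p} \hat{\beta}_j\,\bar{x}_j

% Special case of a single predictor (p = 1):
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2 + \alpha}
```

Setting $\alpha = 0$ recovers ordinary least squares, while larger values of $\alpha$ shrink the slope toward zero, which is the mechanism behind the stability under multicollinearity noted above.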

The methodological procedure comprises the following steps:

1) DATA COLLECTION

The dataset employed in this study is obtained from the Aceh Food Agency (Dinas Pangan Aceh), which records the average annual retail prices of 12 strategic food commodities. The commodities include rice (premium and medium), dried soybeans, shallots, garlic, red chili peppers, beef, broiler chicken meat, chicken eggs, granulated sugar, packaged cooking oil, and wheat flour. The dataset spans from 2017 to 2024, as shown in Table II.

Table II. Annual average retail prices of strategic food commodities in Aceh (2017–2024)

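| No | Commodity | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Premium rice | 11.000 | 11.200 | 11.200 | 12.260 | 11.470 | 12.500 | 13.000 | 13.200 |
| 2 | Medium rice | 10.500 | 10.600 | 10.900 | 11.098 | 11.000 | 11.800 | 12.200 | 12.400 |
| 3 | Dried soybeans | 11.500 | 11.760 | 11.850 | 10.102 | 12.000 | 13.000 | 13.200 | 13.500 |
| 4 | Shallots | 35.000 | 33.400 | 26.300 | 38.062 | 30.715 | 34.000 | 36.500 | 37.000 |
| 5 | Garlic (bulb) | 24.300 | 25.850 | 30.000 | 31.600 | 26.206 | 28.000 | 29.000 | 30.200 |
| 6 | Curly red chili peppers | 35.375 | 28.500 | 38.000 | 32.371 | 34.739 | 36.000 | 38.000 | 40.000 |
| 7 | Pure beef | 125.000 | 132.900 | 140.000 | 140.879 | 147.500 | 150.000 | 153.000 | 155.000 |
| 8 | Broiler chicken meat | 26.000 | 27.000 | 27.500 | 29.735 | 27.406 | 30.000 | 31.200 | 32.000 |
| 9 | Broiler chicken eggs | 19.500 | 20.800 | 22.000 | 22.420 | 30.050 | 31.500 | 32.800 | 34.000 |
| 10 | Local granulated sugar | 12.500 | 13.000 | 13.000 | 14.832 | 14.000 | 15.000 | 15.500 | 16.000 |
| 11 | Packaged cooking oil (simple) | 10.000 | 10.000 | 10.500 | 11.484 | 16.178 | 17.500 | 18.000 | 18.500 |
| 12 | Bulk wheat flour | 7.500 | 7.650 | 7.750 | 8.406 | 9.000 | 9.500 | 10.000 | 10.200 |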

2) MODEL TRAINING

Ridge regression is applied, and cross-validation is performed to determine the optimal penalty parameter (α), minimizing predictive bias and variance.
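The selection of α by cross-validation can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes a single predictor (the calendar year), an unpenalized intercept, interleaved folds, and a hypothetical grid of α values. The prices are the premium rice row of Table II, read here as thousands of IDR.

```python
def ridge_fit(xs, ys, alpha):
    """Fit y = b0 + b1 * x with an L2 penalty on the slope only."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / (sxx + alpha)      # the penalty shrinks the slope toward zero
    b0 = y_bar - b1 * x_bar       # intercept is left unpenalized (assumption)
    return b0, b1


def kfold_mse(xs, ys, alpha, k):
    """Average held-out MSE over k interleaved folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        b0, b1 = ridge_fit([xs[i] for i in train_idx],
                           [ys[i] for i in train_idx], alpha)
        squared = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


# Premium rice, 2017-2024 (Table II).
years = list(range(2017, 2025))
prices = [11.000, 11.200, 11.200, 12.260, 11.470, 12.500, 13.000, 13.200]

alphas = [0.01, 0.1, 1.0, 10.0]   # hypothetical search grid
cv_scores = {a: kfold_mse(years, prices, a, k=4) for a in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)

# Refit on all years with the selected penalty and extrapolate one year ahead.
b0, b1 = ridge_fit(years, prices, best_alpha)
forecast_2025 = b0 + b1 * 2025
```

Because the predictor set used in the study is not specified, the resulting figure should not be expected to match the forecasts reported later; the sketch only shows how the penalty is chosen before the final refit.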

3) FORECASTING

The trained model is employed to project commodity prices for 2025–2028.

4) MODEL EVALUATION

Predictive performance is assessed using standard statistical indicators, including MSE and RMSE, as expressed in Equations (2) and (3), respectively:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \tag{2}$$
where $n$ is the total number of observations. A lower MSE indicates that the predicted values are closer to the actual observed values, reflecting better model performance. MSE is particularly sensitive to large errors because deviations are squared, thus giving more weight to outliers.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2} \tag{3}$$

In this formula, $Y_i$ denotes the observed value, $\hat{Y}_i$ is the predicted value, and $n$ is the number of observations. A lower RMSE indicates higher forecasting accuracy; because RMSE is expressed in the same unit as the prices, it is easier to interpret than MSE.
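The two metrics can be computed directly from their definitions. In the sketch below, the observed values are the 2021–2024 premium rice prices from Table II, while the predicted values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def mse(actual, predicted):
    """Mean squared error, Equation (2)."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, Equation (3): the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


actual = [11.470, 12.500, 13.000, 13.200]      # Table II, premium rice, 2021-2024
predicted = [11.900, 12.300, 12.800, 13.400]   # hypothetical model output

error_mse = mse(actual, predicted)
error_rmse = rmse(actual, predicted)
```

Since RMSE is the square root of MSE, it is on the same scale as the prices themselves, whereas MSE is in squared price units.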

C. REGIONAL FOOD VULNERABILITY CLUSTERING OPTIMIZATION

This stage assesses and optimizes regional food vulnerability in Aceh to identify districts requiring prioritized interventions. The analysis uses the Annual Food Supply and Demand Data Across Commodities in Aceh, with variables listed in Table III. Two clustering approaches are applied: standard K-means and SAW-K-means, which integrates SAW to improve centroid initialization, enhancing clustering stability and robustness.

Table III. Annual Food Supply and Demand Data Across Commodities in Aceh, Indonesia

| Variable | Description | Unit |
| --- | --- | --- |
| Region | Name of the observed region/area | Region name |
| Year | Year of food data observation | Year |
| Population | Total population in the region for a specific year | People |
| Rice supply | Total rice availability in a specific year | Tons |
| Rice surplus | Difference between supply and demand of rice (positive = surplus, negative = deficit) | Tons |
| Starchy food supply | Availability of starchy foods (cassava, maize, sweet potato, etc.) | Tons |
| Starchy food demand | Total consumption demand for starchy foods | Tons |
| Starchy food surplus | Difference between supply and demand of starchy foods | Tons |
| Sugar supply | Total sugar availability | Tons |
| Oilseed surplus | Difference between supply and demand of oilseeds | Tons |
| Fruit supply | Availability of fruits | Tons |
| Fruit demand | Total fruit consumption demand | Tons |
| Fruit surplus | Difference between supply and demand of fruits | Tons |
| Vegetable supply | Availability of vegetables | Tons |
| Vegetable demand | Total vegetable consumption demand | Tons |
| Milk surplus | Difference between supply and demand of milk | Tons |
| Oil supply | Availability of edible oil (vegetable/animal-based) | Tons |
| Oil demand | Total oil consumption demand | Tons |
| Oil surplus | Difference between supply and demand of oil | Tons |

1) K-MEANS CLUSTERING

The K-means algorithm is applied through the following steps [27]:

  • 1. Initialize centroids. Randomly select $k$ initial centroids from the dataset to serve as starting points.
  • 2. Assign districts. Each district $i$ is assigned to the nearest centroid by minimizing the Euclidean distance, defined in Equation (4):
    $$d(x_i, \mu_k) = \lVert x_i - \mu_k \rVert \tag{4}$$
    where $x_i$ denotes the vector of vulnerability indicators for district $i$ and $\mu_k$ is the centroid of cluster $k$.
  • 3. Update centroids. Recompute each centroid $\mu_k$ as the mean of all points in cluster $C_k$.
  • 4. Iterate until convergence. Repeat steps 2–3 until cluster assignments stabilize. The objective is to minimize the within-cluster sum of squares (WCSS), defined in Equation (5):
    $$\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2 \tag{5}$$
    where $K$ is the number of clusters, $C_k$ is the set of districts in cluster $k$, and $\mu_k$ is the cluster centroid.
  • 5. Evaluate cluster quality. The clustering performance is quantitatively evaluated using the CH index and SS, defined in Equations (6) and (7):
    $$\mathrm{CH} = \frac{\mathrm{Tr}(B_k)/(K-1)}{\mathrm{Tr}(W_k)/(n-K)} \tag{6}$$
    where $\mathrm{Tr}(B_k)$ is the trace of the between-cluster dispersion matrix, $\mathrm{Tr}(W_k)$ is the trace of the within-cluster dispersion matrix, $K$ is the number of clusters, and $n$ is the total number of observations. Higher CH values indicate more distinct and well-separated clusters:
    $$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \tag{7}$$
    where $a(i)$ is the average distance between observation $i$ and other points in the same cluster, while $b(i)$ is the minimum average distance to points in other clusters. The score ranges from −1 to 1, with higher values indicating more cohesive and well-separated clusters.
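The five steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: the six two-dimensional points are hypothetical, and the initial centroids are passed in by data index (in the manner of the evaluation tables) rather than drawn at random, so that the run is reproducible.

```python
import math


def dist(a, b):
    """Euclidean distance between two vectors, Equation (4)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean_point(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, init_centroids, max_iter=100):
    """Assign, update, and repeat until the labels stop changing."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:   # keep the previous centroid if a cluster empties
                centroids[k] = mean_point(members)
    return labels, centroids, iteration


def wcss(points, labels, centroids):
    """Within-cluster sum of squares, Equation (5)."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def calinski_harabasz(points, labels, centroids):
    """CH index, Equation (6); Tr(W) equals the WCSS."""
    n, K = len(points), len(centroids)
    overall = mean_point(points)
    between = sum(labels.count(k) * dist(centroids[k], overall) ** 2 for k in range(K))
    return (between / (K - 1)) / (wcss(points, labels, centroids) / (n - K))


def silhouette(points, labels):
    """Mean of s(i), Equation (7), over all observations."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:   # singleton cluster: s(i) is taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
                / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical indicator vectors forming three well-separated groups.
points = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9], [9.0, 1.0], [8.8, 1.2]]
init_idx = [0, 2, 4]   # initial centroids given by data index

labels, centroids, n_iter = kmeans(points, [points[i] for i in init_idx])
total_wcss = wcss(points, labels, centroids)
ch_index = calinski_harabasz(points, labels, centroids)
sil_score = silhouette(points, labels)
```

Passing different index triples to `kmeans` reproduces the kind of sensitivity study in which several initializations are compared by iteration count, CH index, and SS.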

2) SAW-K-MEANS CLUSTERING

The SAW-K-means approach was implemented to enhance clustering stability and interpretability. By integrating SAW with K-means, cluster center initialization becomes more structured, reducing randomness, improving computational efficiency, and ensuring reliable clustering outcomes.

  • 1. Compute Composite Vulnerability Scores. Each district $i$ receives a composite score $S_i$ using SAW, defined in Equation (8) [28]:
    $$S_i = \sum_{j=1}^{m} w_j\, r_{ij} \tag{8}$$
    where $w_j$ is the weight of indicator $j$, $r_{ij}$ is the normalized value of indicator $j$ for district $i$, and $m$ is the total number of indicators. In this study, the weights are assigned automatically using an equal-weight approach: the total weight is divided evenly among all identified criteria, yielding $w_j = 0.027778$ for each criterion. This ensures that each criterion contributes equally to the overall assessment, minimizing potential bias toward any specific attribute in the final ranking.
  • 2. Initialize Centroids. Districts with the highest $S_i$ scores are selected as initial centroids to ensure highly vulnerable regions are represented from the outset.
  • 3. Apply K-Means Algorithm. Follow the standard K-means procedure (assignment, centroid update, and iteration) using SAW-based centroids.
  • 4. Evaluate Cluster Quality. Cluster quality is evaluated using the CH index and SS.
  • 5. Identify Food Vulnerable Districts. Districts belonging to the cluster with the highest average $S_i$ are designated as Food Vulnerable areas.
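Steps 1 and 2 of this procedure can be sketched as below. Several details are assumptions made for illustration only: the normalization is taken to be min-max scaling with larger values meaning higher vulnerability (the text does not name the scheme), the indicator matrix is hypothetical, and the helper names are not from the source.

```python
def min_max_normalize(rows):
    """Scale each indicator (column) to [0, 1]. Assumed normalization scheme."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]   # guard constant columns
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def saw_scores(rows, weights=None):
    """Composite score S_i = sum_j w_j * r_ij, Equation (8)."""
    m = len(rows[0])
    weights = weights or [1.0 / m] * m   # equal weights, as described in the text
    return [sum(w * r for w, r in zip(weights, row))
            for row in min_max_normalize(rows)]


def top_k_indices(scores, k):
    """Indices of the k highest-scoring districts, used as initial centroids."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical data: 5 districts x 3 indicators.
indicators = [
    [10.0, 200.0, 3.0],
    [40.0, 150.0, 9.0],
    [25.0, 300.0, 6.0],
    [5.0, 100.0, 1.0],
    [30.0, 250.0, 8.0],
]

scores = saw_scores(indicators)
init_idx = top_k_indices(scores, k=3)
initial_centroids = [indicators[i] for i in init_idx]
```

The selected rows would then be handed to the standard K-means procedure as its starting centroids, which removes the random draw from step 1 of the conventional algorithm.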

D. FOOD SUPPLY CHAIN CLASSIFICATION

This research employs SVM and RF to classify food supply chain stability, as both algorithms are capable of handling high-dimensional datasets and modeling nonlinear dependencies. The models assign districts to distinct classes based on supply chain characteristics using the Annual Food Supply and Demand Data Across Commodities in Aceh, with variables detailed in Table IV, offering actionable insights for policymakers to identify both stable and vulnerable regions.

Table IV. Variables for food supply chain classification

| Variable | Description |
| --- | --- |
| Commodity | Type of food commodity (e.g., rice, sugar, fish, vegetables, etc.) |
| District | Administrative region in Aceh where the data were collected |
| Year | Year of observation |
| Supply (tons) | Total food supply available in the district |
| Population | Total number of inhabitants in the district |
| Consumption requirement (tons) | Estimated food demand based on population size and dietary needs |
| Surplus (tons) | Difference between supply and consumption requirement |
| Status | Classification label indicating supply chain condition (e.g., surplus, deficit, or balanced) |
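The relationship between the supply, requirement, surplus, and status fields can be illustrated with a small record type. This is a sketch only: the field names, the sample figures, and the `tolerance_tons` band used to separate "balanced" from the other two labels are hypothetical, since the source does not state how the labels are assigned.

```python
from dataclasses import dataclass


@dataclass
class SupplyRecord:
    """One row of the classification dataset (fields follow Table IV)."""
    commodity: str
    district: str
    year: int
    supply_tons: float
    population: int
    requirement_tons: float

    @property
    def surplus_tons(self) -> float:
        # Surplus is the difference between supply and consumption requirement.
        return self.supply_tons - self.requirement_tons

    def status(self, tolerance_tons: float = 0.0) -> str:
        # tolerance_tons is an assumed band around zero for the "balanced" label.
        if self.surplus_tons > tolerance_tons:
            return "surplus"
        if self.surplus_tons < -tolerance_tons:
            return "deficit"
        return "balanced"


# Hypothetical record; the figures are illustrative, not taken from the dataset.
record = SupplyRecord("rice", "Banda Aceh", 2024, 30_000.0, 260_000, 28_500.0)
label = record.status(tolerance_tons=500.0)
```

In a supervised setting, the status label plays the role of the target variable, while the remaining fields supply the input features.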

1) SUPPORT VECTOR MACHINE (SVM)

SVM is a supervised method designed to find the most effective hyperplane that distinguishes between classes [29]. For linearly separable data, the decision function is given in Equation (9):

$$f(x) = w \cdot x + b \tag{9}$$
where $w$ represents the weight vector, $x$ is the input feature vector, and $b$ is the bias. The objective is to maximize the margin between support vectors. Mathematically, this objective can be expressed in Equation (10):
$$\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad Y_i\,(w \cdot x_i + b) \ge 1 \tag{10}$$

For nonlinearly separable data, the kernel trick projects input features into a higher-dimensional space. Common kernels include the radial basis function (RBF), defined in Equation (11):

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right) \tag{11}$$

Here, γ controls the influence of individual samples. SVM was used to classify districts by mapping multidimensional food system features to an optimal decision boundary.
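The linear decision function and the RBF kernel can be written out in a few lines. The feature vectors, weights, and γ values below are hypothetical and chosen only to show how the kernel value responds to distance and to γ.

```python
import math


def linear_decision(w, x, b):
    """Linear decision function f(x) = w . x + b, Equation (9)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel exp(-gamma * ||x_i - x_j||^2), Equation (11)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


# Hypothetical, already-scaled feature vectors for two districts.
district_a = [0.0, 0.0]
district_b = [1.0, 1.0]

k_same = rbf_kernel(district_a, district_a, gamma=0.5)    # identical points
k_wide = rbf_kernel(district_a, district_b, gamma=0.5)    # smaller gamma
k_narrow = rbf_kernel(district_a, district_b, gamma=2.0)  # larger gamma

score = linear_decision([0.5, -0.25], district_b, b=0.1)
```

A larger γ makes the kernel decay faster with distance, so each training sample influences a smaller neighborhood of the feature space.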

2) RANDOM FOREST (RF)

As an ensemble learning approach, RF strengthens classification performance through the integration of predictions from many individual decision trees [30]. Each decision tree is constructed using a bootstrap sample, with node divisions chosen from a randomly selected group of features. In this model, the concluding prediction is determined through a majority-vote mechanism across all decision trees, as represented in Equation (12):

$$\hat{Y} = \operatorname{mode}\{h_1(x), h_2(x), \ldots, h_r(x)\} \tag{12}$$

In this equation, $\hat{Y}$ denotes the final predicted class label and $h_1(x), h_2(x), \ldots, h_r(x)$ are the individual predictions generated by the $r$ decision trees in the ensemble. The mode function returns the most frequently occurring class label among these predictions. The splitting criterion in each decision tree is typically based on Gini Impurity, defined in Equation (13):

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{C} p_i^2 \tag{13}$$
where $p_i$ is the proportion of samples in the sample set $D$ belonging to class $i$ and $C$ is the number of classes.
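Both quantities are simple to compute by hand. The sketch below uses hypothetical class labels and tree outputs; how ties in the vote are broken is an implementation detail that the text does not specify, and here the label encountered first wins.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_i p_i^2, Equation (13)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority_vote(tree_predictions):
    """Most frequent class label across the trees, Equation (12)."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical labels at one tree node before a split.
node_labels = ["surplus", "surplus", "deficit", "deficit"]
node_impurity = gini(node_labels)

# Hypothetical predictions from five trees for a single district.
votes = ["surplus", "deficit", "surplus", "balanced", "surplus"]
forest_prediction = majority_vote(votes)
```

A pure node has impurity 0, and an evenly mixed two-class node has impurity 0.5, so a split is preferred when it lowers the weighted impurity of the child nodes.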

IV. RESULTS AND DISCUSSION

A. RESULTS OF FOOD COMMODITY PRICE PREDICTION USING A RIDGE REGRESSION MODEL

The first analysis stage forecasts food commodity prices using ridge regression, which mitigates multicollinearity and enhances model generalization through L2 regularization. The historical annual retail price data for Aceh (Table II) are used to train the model and generate forecasts for 2025–2028. Performance is evaluated using MSE and RMSE. Table V and Fig. 2 present the forecasted prices for key commodities.

Table V. Forecasted food commodity prices in Aceh, Indonesia (IDR/kg), using ridge regression model

| Commodity | 2025 | 2026 | 2027 | 2028 |
| --- | --- | --- | --- | --- |
| Premium rice | 15.277 | 16.223 | 17.170 | 18.117 |
| Medium rice | 14.332 | 15.207 | 16.082 | 16.957 |
| Dry soybeans | 15.391 | 16.337 | 17.283 | 18.228 |
| Red onion | 41.722 | 44.097 | 46.472 | 48.848 |
| Garlic bulb | 35.032 | 37.080 | 39.129 | 41.177 |
| Curly red chili | 44.885 | 47.635 | 50.384 | 53.134 |
| Fresh beef | 183.259 | 194.737 | 206.215 | 217.694 |
| Broiler chicken | 36.886 | 39.184 | 41.482 | 43.781 |
| Broiler eggs | 40.156 | 43.528 | 46.899 | 50.270 |
| Granulated sugar (local) | 18.577 | 19.790 | 21.002 | 22.215 |
| Packaged cooking oil (basic) | 22.098 | 24.070 | 26.041 | 28.012 |
| Wheat flour (bulk) | 11.904 | 12.748 | 13.591 | 14.435 |

Fig. 2. Forecasted food commodity prices in Aceh, Indonesia (IDR/kg), using ridge regression model.

Forecasts indicate a general upward trend in Aceh’s food commodity prices from 2025 to 2028. Staple grains such as premium and medium rice, along with dry soybeans, rise steadily, while high-value commodities such as fresh beef and curly red chili rise sharply, a pattern consistent with limited supply and market sensitivity. Perishable vegetables show significant increases, whereas processed goods grow moderately. The forecasts thus point to differing commodity vulnerability, with staples remaining comparatively resilient and high-demand items more exposed to shocks.

Predictive accuracy is assessed using MSE and RMSE, and Table VI compares the results across all commodities. The error analysis reveals substantial variation. Beef shows the highest errors, attributed to supply shocks and seasonal demand, while shallots, red chili, and garlic also exhibit elevated errors associated with perishability and climate sensitivity. Wheat flour, medium rice, and cooking oil have the lowest errors, reflecting stable markets, with sugar and dried soybeans in the mid-range. MSE emphasizes extreme deviations, while RMSE expresses forecast error in the same unit as the prices. Figs. 3 and 4 visualize these results.

Table VI. Comparative evaluation of predicted food commodity prices (2025–2028)

| No | Commodity | MSE | RMSE |
| --- | --- | --- | --- |
| 1 | Beef (pure) | 394,006,100 | 19,849.586 |
| 2 | Shallots | 31,715,200 | 5,631.625 |
| 3 | Curly red chili | 29,310,450 | 5,413.912 |
| 4 | Garlic bulbs | 18,905,290 | 4,348.021 |
| 5 | Broiler chicken meat | 16,452,640 | 4,056.185 |
| 6 | Broiler chicken eggs | 9,751,155 | 3,122.684 |
| 7 | Granulated sugar (local) | 3,783,863 | 1,945.215 |
| 8 | Dried soybeans | 3,224,823 | 1,795.779 |
| 9 | Premium rice | 2,842,324 | 1,685.919 |
| 10 | Simple packaged cooking oil | 2,756,257 | 1,660.198 |
| 11 | Medium rice | 2,539,593 | 1,593.610 |
| 12 | Bulk wheat flour | 1,263,658 | 1,124.126 |
| | Average | 394,006,100 | 19,849.586 |

Fig. 3. Mean squared error (MSE) of ridge regression predicted food commodity prices (2025–2028) in Aceh.

Fig. 4. Root mean squared error (RMSE) of ridge regression predicted food commodity prices (2025–2028) in Aceh.

Table VII displays the evaluation results of the ridge regression model across multiple K-fold cross-validation configurations. The objective of this experiment is to identify the number of folds (K) that produces the most reliable and accurate model performance. Each configuration is evaluated using two performance metrics: MSE, the average squared prediction error, and RMSE, which expresses the error in the same unit as the prices. The cross-validation results show that the ridge regression model performs consistently across different fold settings. The lowest MSE occurs at K = 6, indicating a balanced bias–variance trade-off, while the lowest RMSE is achieved at K = 8, reflecting greater predictive stability. Therefore, K = 8 is adopted for subsequent evaluation and forecasting, as it yields the lowest RMSE.

Table VII. Evaluation results of ridge regression with various K-fold cross-validation values

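| No | Number of folds (K) | MSE | RMSE | Remark |
| --- | --- | --- | --- | --- |
| 1 | 5 | 83,835,322.18 | 5,229.04 | |
| 2 | 6 | 77,351,844.18 | 4,834.10 | Lowest MSE |
| 3 | 7 | 81,319,392.70 | 5,010.14 | |
| 4 | 8 | 79,878,235.49 | 4,727.36 | Lowest RMSE |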

B. RESULTS OF REGIONAL FOOD VULNERABILITY CLUSTERING OPTIMIZATION

The second stage of this study examines regional food vulnerability clustering to identify districts in Aceh that require prioritized interventions. Two approaches are implemented: the standard K-means clustering and the enhanced SAW-K-means clustering. Both methods aimed to classify districts into distinct vulnerability groups based on their food supply–demand balance across multiple commodities. The integration of the SAW method into the initialization process was designed to reduce randomness in centroid selection, thereby enhancing clustering stability and interpretability.

Clustering identifies three categories: Food Secure, Food Vulnerable, and Food Insecure. Food Secure districts have stable surpluses, reflecting a resilient supply chain. Food Vulnerable districts experience fluctuating supply–demand balances, indicating potential exposure to shocks. Food Insecure districts face persistent deficits, highlighting structural weaknesses in availability and distribution. SAW-K-means produces a more balanced and interpretable distribution than standard K-means, with the Food Insecure cluster aligning closely with official vulnerability indicators.

1) STANDARD K-MEANS CLUSTERING RESULTS

Standard K-means is applied to classify food vulnerability across Aceh districts into three categories: Food Secure, Food Vulnerable, and Food Insecure, with results shown in Table VIII. Coastal urban districts such as Banda Aceh, Sabang, and Lhokseumawe are Food Secure, reflecting stable food access and infrastructure. Remote inland and island districts, including Gayo Lues, Aceh Tenggara, Simeulue, and Bener Meriah, are Food Insecure, which is attributed to geographic isolation and limited productivity. Most other districts, such as Pidie, Aceh Utara, and Aceh Timur, are Food Vulnerable, indicating susceptibility to supply–demand fluctuations. Cluster distribution from 2017 to 2024 is shown in Fig. 5, with a corresponding heatmap in Fig. 6.

Table VIII. Standard K-means clustering results for food vulnerability in Aceh Province, Indonesia

| Cluster category | Districts |
| --- | --- |
| Food Secure | Banda Aceh, Sabang, Lhokseumawe, Langsa, Subulussalam |
| Food Vulnerable | Aceh Besar, Pidie, Bireuen, Aceh Utara, Aceh Timur, Aceh Tamiang, Aceh Jaya, Aceh Barat Daya, Nagan Raya, Aceh Selatan |
| Food Insecure | Gayo Lues, Aceh Tenggara, Aceh Singkil, Simeulue, Bener Meriah, Aceh Tengah |

Fig. 5. Food security cluster distribution in Aceh, Indonesia (2017–2024).

Fig. 6. Food security heatmap cluster distribution in Aceh, Indonesia (2017–2024).

To assess the performance and stability of the K-means algorithm, ten runs with different initial centroid selections were conducted. Convergence speed was measured by iteration count, while clustering quality was evaluated using the CH index and the SS. In nine of the ten runs, the CH index remained high (45.38–45.75) and the SS held at 0.30, reflecting consistent cluster separability with moderate cohesion. Test 4 achieved the best outcome (CH = 45.75, 6 iterations), while Test 6 performed worst (CH = 35.35, Silhouette = 0.29). Overall, K-means proved robust, though centroid initialization influenced efficiency and cluster quality, as shown in Table IX.

Table IX. Performance evaluation of K-means iterations with different initial centroids

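| Test no. | Initial centroids (data index) | Number of iterations | Calinski–Harabasz index | Silhouette Score |
| --- | --- | --- | --- | --- |
| 1 | 42, 142, 183 | 11 | 45.54 | 0.30 |
| 2 | 4, 124, 43 | 8 | 45.54 | 0.30 |
| 3 | 91, 178, 48 | 9 | 45.38 | 0.30 |
| 4 | 146, 169, 13 | 6 | 45.75 | 0.30 |
| 5 | 61, 123, 84 | 9 | 45.69 | 0.30 |
| 6 | 61, 155, 32 | 6 | 35.35 | 0.29 |
| 7 | 123, 137, 13 | 7 | 45.67 | 0.30 |
| 8 | 70, 74, 22 | 14 | 45.64 | 0.30 |
| 9 | 91, 62, 28 | 11 | 45.38 | 0.30 |
| 10 | 79, 80, 47 | 6 | 45.67 | 0.30 |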

2) SAW-K-MEANS CLUSTERING RESULTS

The SAW-K-means method applies SAW to calculate feature weights, enabling more systematic centroid initialization and improving cluster distinction. This refinement addresses the key limitations of conventional K-means, namely its sensitivity to random centroid selection and the assumption of equal feature importance. The results are summarized in Table X.

Table X. SAW scores and rankings for the regions

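| No. | SAW score | Ranking |
| --- | --- | --- |
| 153 | 15.3446 | 1 |
| 155 | 14.1372 | 2 |
| 128 | 8.6737 | 3 |
| 154 | 8.1976 | 4 |
| 109 | 7.4991 | 5 |
| 88 | 7.3506 | 6 |
| 125 | 7.1899 | 7 |
| 168 | 6.8346 | 8 |
| 166 | 6.7663 | 9 |
| 167 | 6.7093 | 10 |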

Table X presents the top 10 results from a dataset of 184 regions ranked using the SAW method. In the SAW-K-means hybrid framework, centroid initialization was guided by SAW rankings, with one centroid each selected from the highest, middle, and lowest scores across 10 test iterations. Clustering performance was subsequently assessed using CH index and SS, demonstrating improved stability and quality through the integration of prior ranking information as shown in Table XI.

Table XI. SAW-K-means performance evaluation

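| Test no. | Initial centroids (data index) | Iterations | CH score | Silhouette score |
|---|---|---|---|---|
| 1 | 153, 116, 12 | 8 | 35.35 | 0.29 |
| 2 | 155, 102, 24 | 8 | 35.17 | 0.29 |
| 3 | 128, 103, 146 | 9 | 45.37 | 0.3 |
| 4 | 154, 115, 80 | 4 | 33.72 | 0.3 |
| 5 | 109, 75, 35 | 8 | 45.21 | 0.3 |
| 6 | 88, 101, 38 | 8 | 45.69 | 0.3 |
| 7 | 125, 6, 138 | 11 | 45.09 | 0.3 |
| 8 | 168, 20, 98 | 7 | 31.85 | 0.2 |
| 9 | 166, 53, 4 | 7 | 45.72 | 0.3 |
| 10 | 167, 36, 74 | 5 | 45.7 | 0.3 |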

Table XI presents the clustering performance of the SAW-K-means hybrid method across 10 tests. With SAW-guided centroid initialization, CH indices range from 31.85 to 45.72 and SS values are mostly around 0.3. Tests 3, 5, 6, 7, 9, and 10 achieve CH scores above 45, indicating clearer cluster separation, while Test 8 records the weakest performance with the lowest CH (31.85) and SS (0.2). These results suggest that SAW-based centroid selection provides a systematic, ranking-driven starting point, although cluster quality still varies with the specific regions chosen; iteration counts range from 4 to 11. A comparison of average results with standard K-means is shown in Table XII and Fig. 7.

Table XII. Comparison of average performance metrics between standard K-means and SAW-enhanced K-means

| Method | Average iterations | Average CH score | Average silhouette |
|---|---|---|---|
| SAW + K-means | 7.5 | 40.887 | 0.288 |
| K-means | 8.7 | 44.561 | 0.299 |

Fig. 7. Comparison of average performance metrics: standard K-means vs. SAW-K-means.

The comparison between standard K-means and SAW-K-means shows that SAW-based centroid initialization reduces the average number of iterations to convergence (7.5 vs. 8.7). Standard K-means achieves a higher average CH (44.561 vs. 40.887) and a slightly higher average SS (0.299 vs. 0.288); the SS gap is marginal, indicating broadly comparable clustering quality overall.
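The averages in Table XII follow directly from the per-test values in Tables IX and XI. As a worked check, the snippet below recomputes them with the standard library only; the tuples are transcribed from those two tables, and the helper name `summarize` is illustrative.

```python
from statistics import mean

# (iterations, CH index, silhouette score) per test, transcribed from Table IX.
kmeans_runs = [
    (11, 45.54, 0.30), (8, 45.54, 0.30), (9, 45.38, 0.30), (6, 45.75, 0.30),
    (9, 45.69, 0.30), (6, 35.35, 0.29), (7, 45.67, 0.30), (14, 45.64, 0.30),
    (11, 45.38, 0.30), (6, 45.67, 0.30),
]
# Same layout, transcribed from Table XI.
saw_runs = [
    (8, 35.35, 0.29), (8, 35.17, 0.29), (9, 45.37, 0.30), (4, 33.72, 0.30),
    (8, 45.21, 0.30), (8, 45.69, 0.30), (11, 45.09, 0.30), (7, 31.85, 0.20),
    (7, 45.72, 0.30), (5, 45.70, 0.30),
]


def summarize(runs):
    """Return (mean iterations, mean CH, mean SS), rounded as in Table XII."""
    iters, ch, ss = zip(*runs)
    return round(mean(iters), 1), round(mean(ch), 3), round(mean(ss), 3)


print("K-means:    ", summarize(kmeans_runs))
print("SAW+K-means:", summarize(saw_runs))
```

Both rows of Table XII are reproduced this way, which also shows that the lower SAW-K-means CH average is driven by the four tests with CH below 36 (Tests 1, 2, 4, and 8).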

C.RESULTS OF FOOD SUPPLY CHAIN CLASSIFICATION

In this stage, the stability of the food supply chain across districts in Aceh is assessed using supervised learning algorithms, specifically SVM and RF. The classification models are trained to categorize regions into predefined supply chain stability classes (Adequate, Deficit, and Surplus), using features such as commodity supply, population demand, and surplus levels.

1).SVM RESULTS

SVM is applied to classify the stability of food supply chains by testing two kernel functions, namely RBF and sigmoid. Both kernels are selected to capture nonlinear relationships within the dataset, while variations of the penalty parameter C and kernel coefficient γ are analyzed to optimize performance. The results of SVM classification with the RBF kernel are summarized in Table XIII and Fig. 8.

Table XIII. Classification accuracy of SVM with RBF kernel for different values of C and γ

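| C | γ (Gamma) | Mean accuracy (%) |
|---|---|---|
| 100 | 0.01 | 75.92 |
| 100 | 0.1 | 83.70 |
| 100 | 1.0 | 90.28 |
| 100 | 5.0 | 91.59 |
| 100 | 15.0 | 92.61 |
| 100 | 50.0 | 93.51 |
| 200 | 0.01 | 77.99 |
| 200 | 0.1 | 84.45 |
| 200 | 1.0 | 91.53 |
| 200 | 5.0 | 92.37 |
| 200 | 15.0 | 93.48 |
| 200 | 50.0 | 94.59 |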

Fig. 8. Comparison of classification accuracy of SVM with RBF and sigmoid kernels for different values of C and γ.

SVM with the RBF kernel was strongly sensitive to γ and, to a lesser extent, to C: accuracies stayed below 85% at low γ (0.01–0.1) and rose above 90% for γ ≥ 1.0 at both C = 100 and C = 200. The best accuracy of 94.59% occurred at C = 200 and γ = 50. The SVM with a sigmoid kernel was evaluated across the same C and γ values to assess its capability in modeling nonlinear separability within the dataset; the results are given in Table XIV.

Table XIV. Classification accuracy of SVM with sigmoid kernel for different values of C and γ

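| C | γ (Gamma) | Mean accuracy (%) |
|---|---|---|
| 100 | 0.01 | 46.00 |
| 100 | 0.1 | 39.59 |
| 100 | 1.0 | 30.74 |
| 100 | 5.0 | 34.50 |
| 100 | 15.0 | 38.92 |
| 100 | 50.0 | 31.99 |
| 200 | 0.01 | 44.37 |
| 200 | 0.1 | 36.33 |
| 200 | 1.0 | 30.65 |
| 200 | 5.0 | 34.50 |
| 200 | 15.0 | 38.95 |
| 200 | 50.0 | 31.99 |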

The sigmoid kernel performed poorly, reaching a maximum accuracy of only 46% (C = 100, γ = 0.01), with most results fluctuating between 30 and 40%, indicating instability and underfitting. In contrast, the RBF kernel consistently delivered superior performance, attaining 94.59% at optimal settings, thereby demonstrating its effectiveness in modeling food supply chain stability.
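The kernel and hyperparameter sweep reported in Tables XIII and XIV can be sketched as a simple grid loop. This is an illustrative setup only: it assumes scikit-learn, uses a synthetic three-class dataset in place of the study's data, and assumes that "mean accuracy" is a cross-validated average (5 folds here, chosen for speed). Feature scaling is not applied, which is also an assumption and affects which γ values work well.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic three-class stand-in (NOT the study's data).
X, y = make_classification(
    n_samples=240, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Same grid as Tables XIII and XIV: two kernels, two C values, six gamma values.
results = {}
for kernel in ("rbf", "sigmoid"):
    for C in (100, 200):
        for gamma in (0.01, 0.1, 1.0, 5.0, 15.0, 50.0):
            model = SVC(kernel=kernel, C=C, gamma=gamma)
            scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
            results[(kernel, C, gamma)] = float(scores.mean())

best = max(results, key=results.get)
print("best setting:", best, f"accuracy = {results[best]:.4f}")
```

On synthetic data the ranking of settings will not match the tables; the sketch only shows how a per-setting mean accuracy of this kind is obtained.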

2).RF RESULTS

The classification performance of the RF model across the three stability classes is presented in Table XV. This report provides a detailed overview of the precision, recall, and F1-scores, highlighting the model’s consistency in classifying Adequate, Deficit, and Surplus conditions.

Table XV. Classification report of random forest classifier

| Metric/class | Adequate | Deficit | Surplus | Accuracy/fold accuracies |
|---|---|---|---|---|
| Precision | 0.9948 | 0.9886 | 0.9835 | Fold 1: 99.13% |
| Recall | 0.9948 | 0.9808 | 0.9913 | Fold 2: 97.97% |
| F1-score | 0.9948 | 0.9847 | 0.9874 | Fold 3: 98.55% |
| Support | 1145 | 1145 | 1145 | Fold 4: 99.13% |
| Macro avg | 0.9890 | 0.9889 | 0.9889 | Fold 5: 98.84% |
| Weighted avg | 0.9890 | 0.9889 | 0.9889 | Fold 6: 99.71% |
| Overall accuracy | 0.9889 | | | Fold 7: 99.42% |
| | | | | Fold 8: 97.38% |
| | | | | Fold 9: 100.00% |
| | | | | Fold 10: 98.83% |
| | | | | Mean accuracy: 98.89% |
| | | | | Std. dev: 0.75% |
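The summary statistics at the bottom of Table XV can be cross-checked from the ten fold accuracies. The snippet below uses only the standard library and assumes the population form of the standard deviation, since the convention is not stated; the fold values are the rounded percentages from the table, so small rounding differences are expected.

```python
from statistics import mean, pstdev

# Fold accuracies (%) transcribed from Table XV.
fold_acc = [99.13, 97.97, 98.55, 99.13, 98.84, 99.71, 99.42, 97.38, 100.00, 98.83]

fold_mean = mean(fold_acc)
fold_std = pstdev(fold_acc)   # population standard deviation (assumed convention)

print(f"mean accuracy = {fold_mean:.3f}%")
print(f"std. dev      = {fold_std:.3f}%")
```

Both values agree with the reported 98.89% and 0.75% to within about 0.01 percentage points, which is consistent with the fold accuracies having been rounded to two decimals before tabulation.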

RF consistently classified food supply chain stability with high performance, achieving per-class F1-scores of 0.985–0.995, a mean 10-fold accuracy of 98.89%, and low variability across folds (std. dev = 0.75%). The RF model is configured with n_estimators = 100, which specifies the number of decision trees in the ensemble. This value strikes an effective balance between model accuracy and computational efficiency, as adding more trees beyond this point typically yields diminishing performance improvements. The parameter random_state = 42 is fixed to ensure the reproducibility of results across runs. Other hyperparameters, including max_depth, min_samples_split, and min_samples_leaf, are retained at their default settings, so tree size is determined by the data rather than by manually imposed limits. The chosen configuration is evaluated using 10-fold stratified cross-validation, which indicates stable and reliable model performance without the need for extensive hyperparameter optimization.
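The configuration described above maps onto a few lines of code. The sketch below assumes scikit-learn (the parameter names n_estimators and random_state match its RandomForestClassifier) and substitutes a synthetic three-class dataset for the study's data; shuffling the folds and reusing 42 as the fold seed are assumptions, not details taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic three-class stand-in (NOT the study's data).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# 100 trees, fixed seed; max_depth, min_samples_split, min_samples_leaf left at defaults.
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Stratified 10-fold CV keeps the class proportions equal in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(rf, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", [f"{s:.4f}" for s in scores])
print(f"mean = {scores.mean():.4f}, std = {scores.std():.4f}")
```

Reporting the per-fold scores alongside the mean, as Table XV does, makes it easy to see whether a high average hides one or two weak folds.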

3).COMPARISON OF SVM AND RF CLASSIFICATION PERFORMANCE

As shown in Fig. 9, RF outperformed SVM, achieving 98.89% mean accuracy compared with 94.59% for the best SVM configuration (RBF kernel, C = 200, γ = 50), together with low variability (std. dev = 0.75%) and high F1-scores (0.985–0.995), making it the more robust and reliable model.

Fig. 9. Comparative accuracy of SVM and random forest models in classifying regional food supply chain stability.

Beyond the comparison between SVM and RF in this study, other classifiers, including boosting-based ensembles such as LightGBM as well as modified KNN and decision trees, can be explored in future work. While boosting-based models often achieve slightly higher accuracy, they tend to be more sensitive to hyperparameter tuning and can demand greater computational resources. In contrast, RF offers a balanced trade-off between accuracy, stability, and interpretability, making it particularly well suited for policy-oriented analyses. Consequently, its selection in this framework is methodologically justified, with boosting-based ensembles proposed for subsequent evaluation.

V.CONCLUSION

This study developed a multistage ML framework to analyze food commodity price dynamics and supply chain stability in Aceh, Indonesia. By combining ridge regression, SAW-K-means clustering, and RF classification, the framework provided a comprehensive approach to forecasting prices, identifying regional vulnerabilities, and classifying supply chain stability. Ridge regression effectively forecasted commodity prices for 2025–2028, addressed multicollinearity challenges, and captured trends across both staple and high-value commodities. While volatility was more evident in perishable and high-demand products, staple commodities demonstrated greater predictability.

In the clustering stage, standard K-means grouped districts into Food Secure, Food Vulnerable, and Food Insecure categories; however, its sensitivity to centroid initialization limited clustering stability. The SAW-K-means hybrid addressed this limitation by producing more balanced clusters that aligned better with official food vulnerability indicators, thereby improving interpretability for policymakers. For the classification stage, the SVM with an RBF kernel achieved 94.59% accuracy but required careful hyperparameter tuning, whereas the sigmoid kernel underperformed. The RF model achieved the highest performance, with a mean accuracy of 98.89%, low variance across folds (std. dev = 0.75%), and F1-scores between 0.985 and 0.995, confirming its robustness as a reliable classifier for food supply chain monitoring.

Overall, the framework underscored the value of integrating regression, clustering, and ensemble-based classification for the management of regional food systems. It provided policymakers with a decision-support capability to anticipate price fluctuations, prioritize interventions, and design targeted food security strategies. However, the study remained limited by its dataset, which covered only Aceh Province and therefore restricted the generalizability of the findings to broader contexts.

Despite this limitation, the framework's modular structure enabled adaptation to other datasets and geographical regions. The ridge regression component could be retrained with local commodity price data to forecast market dynamics in new locations. The SAW-K-means clustering procedure could be recalibrated by redefining vulnerability indicators based on regional priorities. In the classification stage, both SVM and RF models could be applied to different commodity types or supply chain environments by relabeling classes and re-optimizing hyperparameters through cross-validation. The proposed framework was thus transferable and adaptive, allowing implementation across provinces, countries, or datasets with similar structural characteristics. Future studies should extend the framework to multiregional datasets, explore hybrid ensemble classifiers, and incorporate more advanced temporal forecasting techniques to improve scalability and policy relevance.