A Multi-Scale CNN–Transformer Fusion Framework with Stain Normalization and Focal Loss for High-Accuracy Multi-Stage Gastric Cancer Diagnosis

Authors

  • Radhika D S, Department of Computer Science & Engineering, A.J. Institute of Engineering and Technology, Affiliated to Visvesvaraya Technological University, Belagavi, Mangalore, Karnataka, India; ORCID: https://orcid.org/0009-0008-6755-0361
  • Antony P J, Department of Computer Science & Engineering, A.J. Institute of Engineering and Technology, Affiliated to Visvesvaraya Technological University, Belagavi, Mangalore, Karnataka, India; ORCID: https://orcid.org/0000-0003-1205-0534

DOI:

https://doi.org/10.37965/jait.2026.1049

Keywords:

early-stage gastric cancer, hyperparameter tuning, multi-path convolution, transformer attention optimization

Abstract

Early-stage gastric cancer (GC) diagnosis from histopathological images remains challenging due to subtle morphological variations and inter-slide staining variability. This study proposes a deep learning-based multi-stage GC classification framework that integrates convolutional feature extraction with attention-based contextual modeling. Eight pretrained convolutional neural networks (CNNs) are evaluated, among which DenseNet121 and MobileNetV2 achieve the strongest baseline performance (accuracy ≈ 85.8% and 85.9%, respectively). Building on these results, two novel architectures are developed. The first is an enhanced DenseNet121 model that incorporates multi-path convolution, squeeze-and-excitation (SE) channel recalibration, and attention optimization to capture multi-scale morphological patterns. The second is a Hybrid DenseNet121–Transformer framework that integrates global self-attention with convolutional representations to improve contextual understanding of tissue structures. The models are trained using standardized preprocessing, Macenko stain normalization, extensive data augmentation, and class balancing on a dataset of 7,010 histopathology images representing Normal, Stage I, and Stage II gastric tissues. The proposed hybrid CNN–Transformer framework achieves 90.2% classification accuracy, a macro F1-score of 91.4%, and an Area Under the Curve (AUC) of 0.985, outperforming baseline CNN architectures in stage-wise discrimination. Attention-based visualization highlights diagnostically relevant tissue regions and improves model interpretability. These findings demonstrate that combining multi-scale convolutional representations with Transformer-based global attention provides a robust and interpretable framework for automated GC histopathology analysis.
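
The title names focal loss, but this page does not give its formulation or the settings used. As a minimal sketch only, the standard multi-class focal loss for a single sample, FL = -alpha_t (1 - p_t)^gamma log(p_t), can be written in plain Python as below. The function names, the value gamma = 2.0, the optional alpha weights, and the example logits are illustrative assumptions, not values taken from the article.

```python
import math

# Class order is an assumption for illustration; the page only names the three classes.
CLASSES = ["Normal", "Stage I", "Stage II"]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None, eps=1e-12):
    """Standard focal loss for one sample.

    gamma (assumed 2.0 here) down-weights well-classified samples;
    alpha is an optional list of per-class weights (hypothetical values).
    With gamma = 0 and alpha = None this reduces to cross-entropy.
    """
    p_t = max(softmax(logits)[target], eps)  # probability of the true class
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct sample contributes far less than a misclassified one.
easy = focal_loss([4.0, 0.5, 0.2], target=0)
hard = focal_loss([0.2, 0.5, 4.0], target=0)
print(f"easy sample loss: {easy:.6f}")
print(f"hard sample loss: {hard:.6f}")
```

In this sketch the modulating factor (1 - p_t)^gamma shrinks the loss of samples the model already classifies confidently, so training effort concentrates on harder cases, which is the usual motivation for pairing focal loss with imbalanced or subtly separated classes.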

Published

06/03/2026

How to Cite

D S, R., & P J, A. (2026). A Multi-Scale CNN–Transformer Fusion Framework with Stain Normalization and Focal Loss for High-Accuracy Multi-Stage Gastric Cancer Diagnosis. Journal of Artificial Intelligence and Technology, 6, 507–519. https://doi.org/10.37965/jait.2026.1049

Section

Research Articles