CAAF: A Cross-Attention and Adaptive Fusion Framework for Automated Detection and Severity Grading of Fetal Ventriculomegaly in Ultrasound Imaging
DOI: https://doi.org/10.37965/jait.2026.0958

Keywords: attention mechanisms, cross-attention, deep learning, fetal brain, ultrasound imaging, ventriculomegaly

Abstract
Background: Fetal ventriculomegaly (VM), defined by abnormal enlargement of the cerebral ventricles, is one of the most common brain malformations detected during fetal screening. Early and precise diagnosis is important for predicting neurodevelopmental outcomes, but ultrasound examination is highly operator-dependent, which introduces subjectivity into the assessment.
Objective: To design a precise, reliable, and computationally efficient deep learning system for detecting fetal VM cases and grading their severity through ultrasound images.
Method: In this paper, we propose a novel deep learning architecture termed CAAF (Cross-Attention and Adaptive Fusion) that leverages two distinct convolutional backbones – DenseNet121 and EfficientNet-B4 – each augmented with channel-spatial attention, and connected by bidirectional cross-attention to allow information sharing between the two representations. The model was trained on a dataset of fetal brain ultrasound images obtained from Roboflow (CC BY 4.0) under a stratified 5-fold cross-validation scheme, and evaluated on four VM classes ranging from normal to severe.
Due to the absence of explicit patient identifiers in the publicly available ultrasound dataset, cross-validation was performed at the image level. This limitation is acknowledged, and future work will incorporate patient-wise evaluation when metadata becomes available. The model was then assessed against other state-of-the-art models using accuracy, precision, recall, F1-score, Receiver Operating Characteristic-Area Under the Curve (ROC-AUC), and the kappa coefficient.
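The bidirectional cross-attention and adaptive fusion described above can be sketched as follows. This is a minimal NumPy illustration under assumed feature shapes; the actual CAAF layer dimensions, learned projections, and gating parameters are not specified in the abstract, and the fixed fusion weight `alpha` stands in for a learned adaptive gate:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    """Attend q_feats over kv_feats (single head, no learned projections)."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)   # (Nq, Nk) similarity
    return softmax(scores, axis=-1) @ kv_feats   # (Nq, d) attended output

rng = np.random.default_rng(0)
dense_feats = rng.standard_normal((49, 64))  # e.g. DenseNet121 spatial tokens
eff_feats = rng.standard_normal((49, 64))    # e.g. EfficientNet-B4 tokens

# Bidirectional exchange: each branch queries the other's representation
dense_enriched = cross_attention(dense_feats, eff_feats)
eff_enriched = cross_attention(eff_feats, dense_feats)

# Adaptive fusion: a scalar gate (fixed here purely for illustration)
alpha = 0.5
fused = alpha * dense_enriched + (1 - alpha) * eff_enriched
print(fused.shape)  # → (49, 64)
```

The fused tensor would then feed a classification head over the four VM severity classes.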
Results: Across the five cross-validation folds, CAAF achieved a mean classification accuracy of 97.14% ± 2.29%, a Cohen’s kappa of 0.9606 ± 0.0316, a macro F1-score of 0.9896, and a macro-averaged ROC-AUC approaching 1.00, outperforming all baselines, including DenseNet121 (84.9% accuracy) and a DenseNet–EfficientNet fusion approach (95.7% accuracy). In final testing, the model reached a peak accuracy of 98.92% and a test Cohen’s kappa of 0.9856, with near-perfect sensitivity (1.00) for moderate and severe cases. To assess generalizability, the framework was further validated on the Zenodo dataset (n = 597), maintaining high predictive stability with a mean confidence of 0.8982 and a low predictive entropy of 0.2611. Gradient-weighted Class Activation Mapping (Grad-CAM) was also employed to verify that the framework consistently localized clinically significant anatomical structures, particularly the lateral ventricles.
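Cohen’s kappa, reported above, corrects raw agreement for agreement expected by chance. A small illustrative computation with hypothetical four-class VM grades (not the study’s data):

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Chance-corrected agreement between two label sequences."""
    n = len(y_true)
    # Observed agreement: fraction of exact matches
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    ct, cp = Counter(y_true), Counter(y_pred)
    # Expected agreement if predictions were independent of the truth
    p_exp = sum(ct[c] * cp[c] for c in ct) / n**2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical grades: 0=normal, 1=mild, 2=moderate, 3=severe
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3]
print(round(cohens_kappa(y_true, y_pred), 4))  # → 0.8333
```

A kappa of 0.96, as CAAF attains, indicates almost perfect agreement on conventional interpretation scales.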
Conclusion: By combining cross-attention-based feature interaction with adaptive fusion in a lightweight model, CAAF proves to be an effective, stable, and interpretable approach to fetal VM detection and severity grading.
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.
