CAAF: A Cross-Attention and Adaptive Fusion Framework for Automated Detection and Severity Grading of Fetal Ventriculomegaly in Ultrasound Imaging
DOI: https://doi.org/10.37965/jait.2026.0958

Keywords: attention mechanisms, cross-attention, deep learning, fetal brain, ultrasound imaging, ventriculomegaly

Abstract
Background: Fetal ventriculomegaly (VM), defined by abnormal enlargement of the cerebral ventricles, is one of the most common brain malformations detected during fetal screening. Early and precise diagnosis is important for predicting neurodevelopmental outcomes, but ultrasound examination is highly operator-dependent, which introduces subjectivity into the assessment.
Objective: To design a precise, reliable, and computationally efficient deep learning system for detecting fetal VM cases and grading their severity through ultrasound images.
Method: In this paper, we propose a novel deep learning architecture termed CAAF (Cross-Attention and Adaptive Fusion) that leverages two distinct convolutional backbones – DenseNet121 and EfficientNet-B4 – each augmented with channel-spatial attention, and connected by bidirectional cross-attention to allow information sharing between the two representations. The model was trained on a dataset of fetal brain ultrasound images obtained from Roboflow (CC BY 4.0) under a stratified 5-fold cross-validation scheme, and evaluated on four VM classes ranging from normal to severe.
Due to the absence of explicit patient identifiers in the publicly available ultrasound dataset, cross-validation was performed at the image level. This limitation is acknowledged, and future work will incorporate patient-wise evaluation when metadata becomes available. The model was then assessed against other state-of-the-art models using accuracy, precision, recall, F1-score, Receiver Operating Characteristic-Area Under the Curve (ROC-AUC), and the kappa coefficient.
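The bidirectional cross-attention and adaptive fusion described above can be sketched as follows. This is a minimal NumPy illustration under assumed feature shapes; the actual CAAF layer dimensions, learned projections, and gating parameters are not specified in the abstract, and the fixed fusion weight `alpha` stands in for a learned adaptive gate:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    """Attend q_feats over kv_feats (single head, no learned projections)."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)   # (Nq, Nk) similarity
    return softmax(scores, axis=-1) @ kv_feats   # (Nq, d) attended output

rng = np.random.default_rng(0)
dense_feats = rng.standard_normal((49, 64))  # e.g. DenseNet121 spatial tokens
eff_feats = rng.standard_normal((49, 64))    # e.g. EfficientNet-B4 tokens

# Bidirectional exchange: each branch queries the other's representation
dense_enriched = cross_attention(dense_feats, eff_feats)
eff_enriched = cross_attention(eff_feats, dense_feats)

# Adaptive fusion: a scalar gate (fixed here purely for illustration)
alpha = 0.5
fused = alpha * dense_enriched + (1 - alpha) * eff_enriched
print(fused.shape)  # → (49, 64)
```

The fused tensor would then feed a classification head over the four VM severity classes.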
Results: Across the five cross-validation folds, CAAF achieved a mean classification accuracy of 97.14% ± 2.29%, a Cohen’s kappa of 0.9606 ± 0.0316, a macro F1-score of 0.9896, and a macro-averaged ROC-AUC approaching 1.00, outperforming all baselines, including DenseNet121 (84.9% accuracy) and a DenseNet–EfficientNet fusion approach (95.7% accuracy). In final testing, the model reached a peak accuracy of 98.92% and a test Cohen’s kappa of 0.9856, with near-perfect sensitivity (1.00) for moderate and severe cases. To assess generalizability, the framework was further validated on the Zenodo dataset (n = 597), maintaining high predictive stability with a mean confidence of 0.8982 and a low predictive entropy of 0.2611. Gradient-weighted Class Activation Mapping (Grad-CAM) was also employed to verify that the framework consistently localized clinically significant anatomical structures, particularly the lateral ventricles.
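Cohen’s kappa, reported above, corrects raw agreement for agreement expected by chance. A small illustrative computation with hypothetical four-class VM grades (not the study’s data):

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Chance-corrected agreement between two label sequences."""
    n = len(y_true)
    # Observed agreement: fraction of exact matches
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    ct, cp = Counter(y_true), Counter(y_pred)
    # Expected agreement if predictions were independent of the truth
    p_exp = sum(ct[c] * cp[c] for c in ct) / n**2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical grades: 0=normal, 1=mild, 2=moderate, 3=severe
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3]
print(round(cohens_kappa(y_true, y_pred), 4))  # → 0.8333
```

A kappa of 0.96, as CAAF attains, indicates almost perfect agreement on conventional interpretation scales.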
Conclusion: By combining cross-attention-based feature interaction with adaptive fusion in a lightweight model, CAAF proves to be an effective, stable, and interpretable approach to fetal VM detection and severity grading.
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.
