Pseudo-Temporal 3D CNN Fusion of Gradient and Deep Spatial Features for Hand Gesture Recognition
DOI:
https://doi.org/10.37965/jait.2026.0984
Keywords:
Deep learning, hand gesture, histogram of oriented gradients, integrated model, sign language recognition
Abstract
Communication between people with communication impairments and those who do not understand sign language is a growing social need and a challenging task. Deep learning (DL) techniques can help bridge this communication gap. This research develops an integrated approach that uses DL architectures to recognize hand gesture images and facilitate effective communication. Features are extracted from the raw data using the histogram of oriented gradients (HOG). HOG computes the magnitude and orientation of the gradient of the input image, which capture its outline and edge directions. The extracted features are classified using the proposed integrated model, which comprises MobileNetV2 and a three-dimensional convolutional neural network (3D CNN). MobileNetV2 extracts spatial features, while the 3D CNN processes spatial data in three dimensions to improve classification accuracy. The proposed model fuses HOG-based gradient descriptors with deep spatial features from MobileNetV2 using a pseudo-temporal 3D CNN, enabling static sign language recognition. Experimental analysis shows that the proposed method achieves an accuracy of 99.55%, which is higher than that of existing techniques.
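The gradient step that HOG relies on can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gradient_orientation_histogram`, the choice of 9 unsigned orientation bins over 0 to 180 degrees, the central-difference gradient, and the hard (non-interpolated) binning are all assumptions made for illustration, and block normalization is omitted. It computes the gradient magnitude and orientation at each interior pixel of one cell and accumulates a magnitude-weighted orientation histogram.

```python
import math


def gradient_orientation_histogram(image, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `image` is a 2D list of grayscale values treated as a single HOG cell.
    Hypothetical helper for illustration; border pixels are skipped.
    """
    rows, cols = len(image), len(image[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # flat region: no edge evidence to vote with
            # Unsigned orientation: fold the angle into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard binning; the modulo guards against a rounding result of 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# A cell containing a vertical edge (dark on the left, bright on the right).
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
hist = gradient_orientation_histogram(vertical_edge)
```

For the vertical edge above, all of the gradient energy falls into the first bin (orientations near 0 degrees), whereas a horizontal edge would vote into the bin containing 90 degrees. Full HOG descriptors concatenate such histograms over many cells, typically after normalizing them over blocks of neighboring cells.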
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.
