Video Anomaly Detection in Crowded Scenes Using Deep Learning
DOI:
https://doi.org/10.37965/jait.2025.0775

Keywords:
convolutional neural networks, Deep SORT, long short-term memory, video anomaly detection, YOLOv4

Abstract
Accurately detecting abnormalities in crowded settings remains a vital problem with many real-world applications, such as crowd video surveillance and crowd behavior analysis. Conventional techniques such as Optical Flow (OF), Histogram of Oriented Gradients (HOG), and Scale-Invariant Feature Transform (SIFT) have been applied to anomaly detection in such scenes, but they are constrained by their computational complexity and by the dynamic nature of crowd behavior. The introduction of sophisticated deep learning techniques has had a marked impact on video surveillance systems that protect the public from violent and illegal activities such as robberies, thefts, fights, and vandalism. The proposed technique implements a novel deep learning method for video anomaly identification in crowded settings, applying Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNNs) to extract abnormal temporal and spatial information from video sequences in the UCSD dataset. You Only Look Once version 4 (YOLOv4) then identifies and localizes anomalies in the processed video frames with bounding box predictions. The Deep SORT tracking algorithm tracks the detected anomalies using the computed detection weights, preserving their distinct tracking identifications (IDs). With an accuracy of 99.8%, experimental findings on the UCSD Ped2 dataset show that this method outperforms state-of-the-art techniques such as DTA, Ensemble Learning, and RNN-LSTM.
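The abstract describes a CNN-plus-LSTM stage that extracts spatial and temporal features from short video clips before detection and tracking. The sketch below is a minimal, hypothetical illustration of that spatiotemporal component only; the YOLOv4 detection and Deep SORT tracking stages are omitted. The layer sizes, the 64x64 grayscale frame resolution, and the two-class (normal vs. anomalous) output are illustrative assumptions, not the authors' reported configuration.

```python
# Hypothetical sketch of a CNN + LSTM spatiotemporal anomaly classifier.
# Layer sizes and the 64x64 grayscale clip resolution are assumptions made
# for illustration, not the configuration used in the paper.
import torch
import torch.nn as nn

class CNNLSTMAnomalyNet(nn.Module):
    def __init__(self, hidden_size=128, num_classes=2):
        super().__init__()
        # Per-frame spatial feature extractor (CNN).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        feat_dim = 32 * 16 * 16  # feature size for 64x64 grayscale input frames
        # Temporal model over the sequence of per-frame features (LSTM).
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)  # normal vs. anomalous

    def forward(self, clips):
        # clips: (batch, time, 1, 64, 64) grayscale video clips
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.view(b * t, c, h, w)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.classifier(out[:, -1])  # classify from the last time step

# Example: score a batch of two 8-frame clips.
model = CNNLSTMAnomalyNet()
scores = model(torch.randn(2, 8, 1, 64, 64))
print(scores.shape)  # torch.Size([2, 2])
```

In a full pipeline along the lines the abstract describes, frames flagged by such a classifier would be passed to a YOLOv4 detector for bounding-box localization, and the resulting detections fed to Deep SORT to maintain per-anomaly tracking IDs across frames.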
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.