My Eye AI: A Hybrid Cloud-Mobile Object Detection System for the Visually Impaired Using YOLOv11, OWL-ViT, and BLIP
DOI: https://doi.org/10.37965/jait.2025.0908
Keywords: assistive technology, BLIP, object detection, OWL-ViT, scene description, YOLO
Abstract
My Eye AI is a hybrid cloud-mobile assistive system that delivers real-time object detection and scene description for visually impaired users. The system integrates three AI components: YOLOv11 for object detection, OWL-ViT for zero-shot open-vocabulary recognition, and BLIP (Bootstrapping Language-Image Pretraining) for natural-language scene captioning. Two YOLOv11 variants were trained on custom-curated datasets: the Medium model achieved mAP@0.5 = 0.443 and recall = 0.457, while the X-Large model improved to mAP@0.5 = 0.578 and recall = 0.603, reducing false negatives by 14.6%. OWL-ViT extended detection to unseen objects with 71.4% zero-shot accuracy. The cloud-based architecture offloads computation from the smartphone, maintaining low latency while supporting Android and iOS without specialized hardware. My Eye AI demonstrates measurable improvements in detection accuracy, adaptability, and real-time usability, directly benefiting visually impaired individuals through affordable, accessible mobile deployment.
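
As a rough illustration of the three-model pipeline summarized above, the Python sketch below chains an ultralytics YOLOv11 detector, a Hugging Face OWL-ViT zero-shot detector, and a BLIP captioner on a single frame. The checkpoints, text prompts, and threshold are illustrative assumptions for the sketch, not the authors' released configuration.

    # Illustrative sketch of the detection-plus-captioning pipeline
    # (checkpoints, prompts, and threshold are assumed, not from the paper).
    from PIL import Image
    import torch
    from ultralytics import YOLO
    from transformers import (OwlViTProcessor, OwlViTForObjectDetection,
                              BlipProcessor, BlipForConditionalGeneration)

    image = Image.open("frame.jpg")

    # 1. Closed-set detection with YOLOv11 (X-Large variant assumed).
    yolo = YOLO("yolo11x.pt")
    yolo_boxes = yolo(image)[0].boxes  # class IDs, confidences, xyxy boxes

    # 2. Zero-shot open-vocabulary detection with OWL-ViT for unseen objects.
    owl_processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
    owl_model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")
    queries = [["a white cane", "a guide dog", "a staircase"]]  # example prompts
    owl_inputs = owl_processor(text=queries, images=image, return_tensors="pt")
    with torch.no_grad():
        owl_outputs = owl_model(**owl_inputs)
    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    owl_detections = owl_processor.post_process_object_detection(
        outputs=owl_outputs, target_sizes=target_sizes, threshold=0.1)[0]

    # 3. Natural-language scene description with BLIP.
    blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    caption_ids = blip_model.generate(**blip_processor(image, return_tensors="pt"),
                                      max_new_tokens=30)
    caption = blip_processor.decode(caption_ids[0], skip_special_tokens=True)

    print(yolo_boxes, owl_detections, caption)

In the described architecture this computation would run on the cloud side, with the smartphone client streaming frames and receiving detections and captions, which keeps on-device requirements minimal.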
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.
