GCC AI Research

Results for "object detection"

Spot-the-Camel: Computer Vision for Safer Roads

arXiv

Researchers in Saudi Arabia are applying computer vision techniques to reduce Camel-Vehicle Collisions (CVCs). They tested object detection models including CenterNet, EfficientDet, Faster R-CNN, SSD, and YOLOv8 on the task, finding YOLOv8 to be the most accurate and efficient. Future work will focus on developing a system to improve road safety in rural areas.

Computer Vision for a Camel-Vehicle Collision Mitigation System

arXiv

Researchers are exploring computer vision models to mitigate Camel-Vehicle Collisions (CVCs) in Saudi Arabia, where such collisions have a high fatality rate. They tested CenterNet, EfficientDet, Faster R-CNN, and SSD for camel detection, finding CenterNet to be the most accurate and efficient. Future work involves developing a comprehensive system to enhance road safety in rural areas.

From YOLO to VLMs: Advancing Zero-Shot and Few-Shot Detection of Wastewater Treatment Plants Using Satellite Imagery in MENA Region

arXiv

A new study compares vision-language models (VLMs) to YOLOv8 for wastewater treatment plant (WWTP) identification in satellite imagery across the MENA region. VLMs such as Gemma-3 demonstrate superior zero-shot performance compared to YOLOv8, which was trained on a dataset of 83,566 satellite images from Egypt, Saudi Arabia, and the UAE. The research suggests VLMs offer a scalable, annotation-free alternative for remote sensing of WWTPs.

Deep-Learning-based Automated Palm Tree Counting and Geolocation in Large Farms from Aerial Geotagged Images

arXiv

Researchers in Saudi Arabia have developed a deep learning framework for automated counting and geolocation of palm trees using aerial images. The system uses a Faster R-CNN model trained on a dataset of 10,000 palm tree instances collected in the Kharj region using DJI drones. A geolocation accuracy of 2.8 m was achieved using geotagged metadata and photogrammetry techniques.
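
As a rough illustration of how a detection in a geotagged image can be mapped to ground coordinates, the sketch below converts a pixel position into latitude and longitude. It is not the authors' method: it assumes a nadir (straight-down), north-up image whose geotag marks the image centre, a known ground sampling distance, and a spherical Earth. The function name, its parameters, and the sample values are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def pixel_to_latlon(cam_lat, cam_lon, px, py, width, height, gsd_m):
    """Estimate the lat/lon of pixel (px, py) in a nadir, north-up image.

    cam_lat, cam_lon: geotag of the image centre, in degrees (assumed).
    width, height:    image size in pixels.
    gsd_m:            ground sampling distance in metres per pixel (assumed known).
    """
    # Offset from the image centre in metres; image y grows downward (south).
    east_m = (px - width / 2.0) * gsd_m
    north_m = (height / 2.0 - py) * gsd_m
    # Convert metre offsets to degree offsets (valid for small offsets).
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(cam_lat)))
    )
    return cam_lat + dlat, cam_lon + dlon


# Illustrative values only: a 4000x3000 image at 3 cm per pixel.
centre = pixel_to_latlon(24.15, 47.30, 2000, 1500, 4000, 3000, 0.03)
top_edge = pixel_to_latlon(24.15, 47.30, 2000, 0, 4000, 3000, 0.03)
```

A real pipeline would also need to account for camera tilt, heading, lens distortion, and terrain, which is presumably where the photogrammetry step comes in.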

A Missing and Found Recognition System for Hajj and Umrah

arXiv

A proposed recognition system aims to identify missing persons, deceased individuals, and lost objects during the Hajj and Umrah pilgrimages in Saudi Arabia. The system is intended to use facial recognition and object identification to help manage crowds that are expected to reach an estimated 20 million pilgrims in the coming decade. It will be integrated into the CrowdSensing system for crowd estimation, management, and safety.

Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance

arXiv

The paper introduces a framework for camel farm monitoring that combines automated annotation with fine-tune distillation. The Unified Auto-Annotation framework uses GroundingDINO and SAM to automatically annotate surveillance video data. The Fine-Tune Distillation framework then fine-tunes student models such as YOLOv8, transferring knowledge from a larger teacher model, using data from Al-Marmoom Camel Farm in Dubai.
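
For readers unfamiliar with teacher-to-student knowledge transfer, the sketch below shows the classic logit-based distillation loss: the KL divergence between temperature-softened teacher and student distributions, commonly scaled by the squared temperature. This is a generic textbook formulation, not the paper's Fine-Tune Distillation procedure, whose details the summary does not give. The function names and the sample logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Made-up class logits for a single prediction.
teacher = [4.0, 1.0, 0.5]
loss_matching = distillation_loss(teacher, teacher)
loss_mismatch = distillation_loss([0.5, 1.0, 4.0], teacher)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. A detector such as YOLOv8 would need this idea applied per predicted box or feature map, which is beyond this sketch.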

Hybrid Deep Feature Extraction and ML for Construction and Demolition Debris Classification

arXiv

This paper introduces a hybrid deep learning and machine learning pipeline for classifying construction and demolition waste. A dataset of 1,800 images from UAE construction sites was created, and deep features were extracted using a pre-trained Xception network. The combination of Xception features with machine learning classifiers achieved up to 99.5% accuracy, demonstrating state-of-the-art performance for debris identification.

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

arXiv

The paper introduces the Prism Hypothesis, which posits a correspondence between an encoder's feature spectrum and its functional role, with semantic encoders capturing low-frequency components and pixel encoders retaining high-frequency information. Based on this, the authors propose Unified Autoencoding (UAE), a model that harmonizes semantic structure and pixel details using a frequency-band modulator. Experiments on ImageNet and MS-COCO demonstrate that UAE effectively unifies semantic abstraction and pixel-level fidelity, achieving state-of-the-art performance.
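
The low-frequency versus high-frequency distinction can be illustrated with a toy one-dimensional split: a moving average keeps the smooth, low-frequency part of a signal, and the residual holds the fine, high-frequency detail. This is only an analogy under simple assumptions, not the paper's frequency-band modulator. The function names and the sample signal are hypothetical.

```python
def low_pass(signal, radius=1):
    """Moving average; the window is clamped at the signal edges."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def split_bands(signal, radius=1):
    """Split a signal into a low band and a high band that sum back to it."""
    low = low_pass(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


# A smooth ramp with one sharp spike in the middle.
signal = [1.0, 2.0, 9.0, 2.0, 1.0]
low, high = split_bands(signal)
```

Adding the two bands back together recovers the original signal, so the split loses no information.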