GCC AI Research

Results for "normalization"

Unveiling Hidden Energy Anomalies: Harnessing Deep Learning to Optimize Energy Management in Sports Facilities

arXiv ·

This paper explores the use of deep learning for anomaly detection in sports facilities, with the goal of optimizing energy management. The researchers propose a method using Deep Feedforward Neural Networks (DFNN) and threshold estimation techniques to identify anomalies and reduce false alarms. They tested their approach on an aquatic center dataset at Qatar University, achieving 94.33% accuracy and 92.92% F1-score. Why it matters: The research demonstrates the potential of AI to improve energy efficiency and operational effectiveness in sports facilities within the GCC region.
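The core idea behind such detectors, scoring each reading with a trained model and flagging those whose error exceeds an estimated threshold, can be sketched in a few lines. This is a minimal statistical-threshold illustration, not the paper's DFNN or its specific threshold-estimation technique; the error values here are synthetic stand-ins for model prediction errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a trained model's prediction errors on
# mostly-normal energy readings, with 10 injected anomalies.
errors = rng.normal(loc=0.1, scale=0.02, size=1000)
errors[::100] += 0.5  # anomalous spikes at every 100th reading

# Simple mean + 3*std threshold on the error distribution
# (the paper estimates its threshold differently).
threshold = errors.mean() + 3 * errors.std()
anomalies = np.flatnonzero(errors > threshold)

print(len(anomalies))  # -> 10
```

Tuning the threshold trades detection rate against false alarms, which is exactly the balance the paper's threshold-estimation step targets.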

AraToken: Optimizing Arabic Tokenization with Normalization Pipeline and Language Extension for Qwen3

arXiv ·

The paper introduces AraToken, an Arabic-optimized tokenizer based on the SentencePiece Unigram algorithm that incorporates a normalization pipeline to handle Arabic-specific orthographic variations. Experiments show that AraToken achieves 18% lower fertility compared to unnormalized baselines. The Language Extension Pipeline (LEP) is introduced to integrate AraToken into Qwen3-0.6B, reducing evaluation loss from 8.28 to 2.43 within 800 training steps on 100K Arabic samples. Why it matters: This research provides an efficient tokenizer tailored for Arabic, improving performance of LLMs on Arabic text and benefiting Arabic NLP research by providing released resources.
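Two of the quantities above can be sketched concisely: an Arabic normalization pass (alef unification, tatweel and diacritic removal, which are common steps in such pipelines, though not necessarily AraToken's exact rules) and the fertility metric, the average number of tokens produced per whitespace-delimited word.

```python
import re

# Common Arabic normalization steps (illustrative; AraToken's exact
# pipeline may differ): unify alef variants, drop tatweel, strip
# diacritics (harakat).
ALEF_VARIANTS = set("أإآٱ")
DIACRITICS = re.compile("[\u064B-\u0652]")  # fathatan .. sukun
TATWEEL = "\u0640"

def normalize(text: str) -> str:
    text = "".join("ا" if c in ALEF_VARIANTS else c for c in text)
    return DIACRITICS.sub("", text.replace(TATWEEL, ""))

# Fertility = tokens per whitespace word; lower means a more compact
# encoding. `tokenize` is any callable; a real comparison would plug
# in the SentencePiece model here.
def fertility(texts, tokenize):
    n_tokens = sum(len(tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    return n_tokens / n_words

print(normalize("أُمَّة"))  # -> امة
```

Normalizing before training collapses orthographic variants onto shared tokens, which is what drives the fertility reduction the paper reports.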

Moving into a new normal

KAUST ·

KAUST is gradually reopening its campus after a period of lockdown, following the Saudi government's lifting of the curfew. The reopening plan incorporates best practices learned from universities worldwide and considers the evolving higher education and research landscape. KAUST has implemented comprehensive COVID-19 health and safety procedures across various aspects of life on campus. Why it matters: This measured reopening signals a return to normalcy for research and academic activities at KAUST, while prioritizing the health and safety of its community.

The "new normal" — major trends post COVID-19

KAUST ·

An article from KAUST discusses the impact of COVID-19 on automation, materials science, and VR. It anticipates greater use of automation, voice activation, and motion detection to reduce transmission in public spaces. KAUST faculty member Derya Baran is working on antimicrobial materials for high-touch locations, and KAUST is exploring VR for virtual labs. Why it matters: The pandemic is accelerating the adoption of AI-driven solutions and advanced materials research within Saudi Arabia to address public health challenges.

New Nature Index Ranks KAUST Among World Leaders

KAUST ·

KAUST was ranked first in Saudi Arabia and in the global top twenty in the Nature Index Annual Tables' new normalized ranking. The ranking considers the number of high-quality articles published as a proportion of an institute's overall output in the natural sciences. This normalized ranking allows institutions of different sizes to be compared on the same basis. Why it matters: This ranking highlights KAUST's growing impact on global scientific research and its commitment to producing high-quality publications.

A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation

arXiv ·

This paper introduces a unified deep autoregressive model (UAE) for cardinality estimation that learns joint data distributions from both data and query workloads. It uses differentiable progressive sampling with the Gumbel-Softmax trick to incorporate supervised query information into the deep autoregressive model. Experiments show that UAE achieves better accuracy and efficiency than state-of-the-art methods.
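The Gumbel-Softmax trick mentioned above makes categorical sampling differentiable by perturbing logits with Gumbel noise and relaxing the argmax into a temperature-controlled softmax. The NumPy sketch below shows only this core relaxation; the paper's progressive sampling scheme built on top of it is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    # Add Gumbel(0,1) noise to the logits, then apply a temperature-
    # controlled softmax: as tau -> 0 the output approaches a one-hot
    # sample, while remaining differentiable for any tau > 0.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()

probs = np.array([0.7, 0.2, 0.1])
sample = gumbel_softmax(np.log(probs))
print(sample.round(3))  # a near-one-hot relaxed sample over 3 categories
```

Because the output is a smooth function of the logits, gradients from a supervised query loss can flow back through the sampling step, which is what lets UAE train on queries as well as data.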

Proper Noun Diacritization for Arabic Wikipedia: A Benchmark Dataset

arXiv ·

A new dataset for Arabic proper noun diacritization was introduced, addressing the ambiguity caused by undiacritized proper nouns in Arabic Wikipedia. The dataset includes manually diacritized Arabic proper nouns of various origins along with their English Wikipedia glosses. GPT-4o was benchmarked on the task of recovering full diacritization from undiacritized Arabic and English forms, achieving 73% accuracy. Why it matters: The release of this dataset should facilitate further research on Arabic Wikipedia proper noun diacritization, improving the accessibility and accuracy of Arabic NLP resources.
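The benchmark's setup can be sketched as: strip the harakat to produce the undiacritized input a model sees, then score predictions by exact match against the gold diacritized form. This is a minimal illustration; the paper's exact evaluation protocol may differ.

```python
import re

# Harakat (Arabic short-vowel marks and related signs).
HARAKAT = re.compile("[\u064B-\u0652]")

def strip_diacritics(s: str) -> str:
    # Produces the undiacritized form used as model input.
    return HARAKAT.sub("", s)

def exact_match(preds, golds):
    # Accuracy as the fraction of fully matching diacritized forms.
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

gold = ["قَطَر", "مِصْر"]
print(strip_diacritics(gold[0]))            # -> قطر
print(exact_match(["قَطَر", "مصر"], gold))  # -> 0.5
```

Exact match is strict: a single missing or wrong mark counts the whole noun as an error, which is why 73% accuracy on this task is a meaningful baseline.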