Researchers at MBZUAI released SlimPajama-DC, an empirical analysis of data combinations for pretraining LLMs using the SlimPajama dataset. The study examines the impact of global versus local deduplication and of the proportions in which highly deduplicated multi-source datasets are mixed. Results show that increased data diversity after global deduplication is crucial, with the best configuration outperforming models trained on RedPajama.
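The difference between local and global deduplication can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy corpus, the function names, and the use of exact matching on case- and whitespace-normalized text, which is not the deduplication procedure from the study. Local deduplication removes repeats within each source separately, while global deduplication also removes documents that already appeared in another source.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a simplifying assumption)."""
    return " ".join(text.lower().split())


def local_dedup(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Drop duplicates within each source independently."""
    result = {}
    for name, docs in sources.items():
        seen = set()  # reset per source
        kept = []
        for doc in docs:
            key = normalize(doc)
            if key not in seen:
                seen.add(key)
                kept.append(doc)
        result[name] = kept
    return result


def global_dedup(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Drop duplicates across all sources, keeping the first occurrence."""
    seen = set()  # shared across sources
    result = {}
    for name, docs in sources.items():
        kept = []
        for doc in docs:
            key = normalize(doc)
            if key not in seen:
                seen.add(key)
                kept.append(doc)
        result[name] = kept
    return result


# Hypothetical toy corpus with one within-source and one cross-source repeat.
toy = {
    "web": ["The cat sat.", "the  cat sat.", "Dogs bark."],
    "books": ["Dogs bark.", "Birds sing."],
}
local_counts = {k: len(v) for k, v in local_dedup(toy).items()}
global_counts = {k: len(v) for k, v in global_dedup(toy).items()}
print(local_counts, global_counts)
```

In this toy case the cross-source repeat survives local deduplication but is removed globally, which shifts the effective proportion of each source in the final mix.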
Xiaolin Huang from Shanghai Jiao Tong University presented a talk at MBZUAI on training deep neural networks in tiny subspaces. The talk covered the low-dimension hypothesis in neural networks and methods to find subspaces for efficient training. It suggests that training in smaller subspaces can improve training efficiency, generalization, and robustness. Why it matters: Investigating efficient training methods is crucial for resource-constrained environments and can enable broader access to advanced AI.
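The general idea of training in a low-dimensional subspace can be sketched as follows. This is an illustration under stated assumptions, not the method presented in the talk: the subspace here is a random orthonormal basis `P` (the talk discussed methods for finding good subspaces, which this sketch does not attempt), and the model is a toy linear regression. The full parameter vector is written as `theta0 + P @ z`, and only the small vector `z` is updated.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 50, 5, 200  # full dimension, subspace dimension, samples (all hypothetical)

# Toy regression data.
X = rng.normal(size=(n, D))
y = X @ rng.normal(size=D)

theta0 = np.zeros(D)                            # starting point in full space
P, _ = np.linalg.qr(rng.normal(size=(D, d)))    # D x d basis with orthonormal columns


def loss_and_grad(theta):
    """Mean squared error (halved) and its gradient in the full space."""
    residual = X @ theta - y
    return 0.5 * np.mean(residual ** 2), X.T @ residual / n


z = np.zeros(d)   # the only trainable variables: d numbers instead of D
lr = 0.1
losses = []
for _ in range(200):
    theta = theta0 + P @ z          # map subspace coordinates to full parameters
    loss, grad = loss_and_grad(theta)
    losses.append(loss)
    z -= lr * (P.T @ grad)          # chain rule: gradient w.r.t. z is P^T grad

print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}")
```

With a random basis the loss only drops as far as the subspace allows; the appeal of a well-chosen subspace is that few coordinates can capture most of the useful descent directions.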
This paper proposes a smart dome model for mosques that uses AI to control dome movements based on weather conditions and overcrowding. The model uses the Congested Scene Recognition Network (CSRNet) and fuzzy logic techniques, implemented in Python, to determine when to open and close the domes to maintain fresh air and sunlight. The goal is to automatically manage dome operation based on real-time data, specifying the duration for which the domes should remain open each hour.
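A fuzzy controller of this general kind might look like the sketch below. It is purely illustrative: the membership thresholds, the four rules, and the output durations are hypothetical values chosen for the example and do not come from the paper, and the crowd count is assumed to be supplied by an upstream estimator such as CSRNet. The sketch uses a zero-order Sugeno-style scheme, where each rule outputs a fixed number of minutes and the result is the firing-strength-weighted average.

```python
def ramp_up(x: float, lo: float, hi: float) -> float:
    """Membership rising linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def open_minutes(crowd_count: float, temp_c: float) -> float:
    """Minutes per hour the dome stays open (hypothetical rule base)."""
    # Fuzzify the inputs; all thresholds are assumptions for illustration.
    crowded = ramp_up(crowd_count, 100, 500)
    sparse = 1.0 - crowded
    hot = ramp_up(temp_c, 25, 40)
    mild = 1.0 - hot

    # Each rule: (firing strength using min as AND, output minutes per hour).
    rules = [
        (min(crowded, mild), 60.0),  # crowded and mild: keep open all hour
        (min(crowded, hot), 20.0),   # crowded but hot: ventilate briefly
        (min(sparse, mild), 30.0),   # sparse and mild: open half the hour
        (min(sparse, hot), 0.0),     # sparse and hot: keep closed
    ]

    # Weighted average of rule outputs (total is always at least 0.5 here).
    total = sum(strength for strength, _ in rules)
    return sum(strength * minutes for strength, minutes in rules) / total


print(open_minutes(crowd_count=300, temp_c=30))
```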
MBZUAI PhD graduate William de Vazelhes is researching hard-thresholding algorithms to enable AI to work from smaller datasets. His work focuses on optimization algorithms that simplify data, making it easier to analyze and work with, which is useful for saving energy and for deploying AI models on low-memory devices. He demonstrated that his approach can obtain results similar to those of convex algorithms in many common settings. Why it matters: This research could broaden AI accessibility by reducing computational costs, and has potential applications in sectors like finance, particularly for portfolio management under budgetary constraints.
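As background, the textbook form of hard thresholding keeps only the `k` largest-magnitude entries of a vector and zeroes the rest, and iterative hard thresholding applies that step after each gradient update. The sketch below shows this generic version on a toy least-squares problem; it is not the specific algorithm from de Vazelhes's research, and the data, step size, and sparsity level are hypothetical.

```python
import numpy as np


def hard_threshold(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    if k > 0:
        keep = np.argpartition(np.abs(v), -k)[-k:]  # indices of the top-k magnitudes
        out[keep] = v[keep]
    return out


def iht(X: np.ndarray, y: np.ndarray, k: int, step: float, iters: int = 200) -> np.ndarray:
    """Iterative hard thresholding for k-sparse least squares."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / len(y)        # gradient of the halved mean squared error
        w = hard_threshold(w - step * grad, k)   # gradient step, then project to k-sparse
    return w


# Hypothetical toy problem: 3 of 20 coefficients are nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [1.5, -2.0, 0.8]
y = X @ w_true

w_hat = iht(X, y, k=3, step=0.1)
print(np.nonzero(w_hat)[0])
```

The sparsity constraint is what makes the resulting model cheap to store and evaluate, and in a portfolio setting it corresponds to limiting the number of assets held.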
LUMA AI is expanding its presence in Saudi Arabia, establishing its regional headquarters in the Kingdom. The company is partnering with HUMAIN, a Saudi entity, to support the creative industry through AI tools. LUMA AI's technology enables the creation of 3D models from images and videos, catering to the growing demand for digital content in the region. Why it matters: This move signals increasing investment and interest in AI-driven solutions for creative applications within the Saudi Arabian market.
The Symposium on Data Mining and Applications (SDMA 2014) was organized by MEGDAM to foster collaboration among data mining and machine learning researchers in Saudi Arabia, GCC countries, and the Middle East. The symposium covered areas such as statistics, computational intelligence, pattern recognition, databases, Big Data Mining and visualization. Acceptance was based on originality, significance and quality of contribution.
YOLO26-RipeLoc Lite is a new lightweight deep learning architecture designed for simultaneous detection, ripeness classification, and center-point localization of greenhouse tomatoes for robotic harvesting. The model incorporates a Lightweight Feature Pyramid Network, a Ripeness-Aware Attention Module, and a Compact Detection Head for efficient and precise operation. Evaluated on a custom dataset from the SILAL greenhouse in Abu Dhabi, UAE, it achieved a mAP@0.5 of 92.9% with only 2.38 million parameters, outperforming existing YOLO models on the accuracy-efficiency trade-off. Why it matters: This research provides an efficient and accurate solution for automating a critical agricultural process, enhancing food security and technological capabilities in the region's greenhouse farming.