GCC AI Research

Results for "data scale"

On the importance of Data Scale in Pretraining Arabic Language Models

arXiv ·

This paper studies the impact of data scale on Arabic Pretrained Language Models (PLMs). Researchers retrained BERT-base and T5-base models on large Arabic corpora, achieving state-of-the-art results on the ALUE and ORCA benchmarks. The analysis indicates that pretraining data volume is the most important factor for performance. Why it matters: This work provides valuable insights into building effective Arabic language models, emphasizing the importance of large, high-quality datasets for advancing Arabic NLP.

Scalable Community Detection in Massive Networks Using Aggregated Relational Data

MBZUAI ·

A new mini-batch strategy using aggregated relational data is proposed to fit the mixed membership stochastic blockmodel (MMSB) to large networks. The method uses nodal information and stochastic gradients of bipartite graphs for scalable inference. The approach was applied to a citation network with over two million nodes and 25 million edges, capturing explainable structure. Why it matters: This research enables more efficient community detection in massive networks, which is crucial for analyzing complex relationships in various domains. The article itself has no clear connection to the Middle East.

Principled Scaling of Neural Networks

MBZUAI ·

Soufiane Hayou of the National University of Singapore presented a talk at MBZUAI on principled scaling of neural networks. The talk covered leveraging mathematical results to efficiently scale neural networks. He obtained his PhD in statistics in 2021 from Oxford. Why it matters: Understanding neural network scaling is crucial for developing more efficient and powerful AI models in the region.

Managing and Analyzing Big Traffic Data — An Uncertain Time Series Approach

MBZUAI ·

This article discusses applying an uncertain time series (UTS) approach to manage and analyze big traffic data for high-resolution vehicular transportation services. The study addresses challenges such as data sparseness, decision-making among multiple UTSs, and future forecasting with spatio-temporal correlations. Jilin Hui, previously a Research Associate at the Inception Institute of Artificial Intelligence (UAE), is applying this approach to solve problems related to increased congestion, greenhouse gas emissions, and reduced air quality in urban environments. Why it matters: The application of AI techniques to traffic management could significantly improve urban mobility and environmental sustainability in the GCC region and beyond.

Exploring science's fourth paradigm

KAUST ·

KAUST held a research conference on Computational and Statistical Interface to Big Data from March 19-21. The conference covered topics like data representation, visualization, parallel algorithms, and large-scale machine learning. Participants from institutions including the American University of Sharjah and Aalborg University came to exchange ideas. Why it matters: The conference highlights KAUST's focus on promoting big data research and collaboration to address challenges and opportunities in various scientific fields within the Kingdom and globally.

A platform for material scientists

KAUST ·

Scimagine is a KAUST-based startup that provides a cloud-based platform for managing and storing experimental data for material scientists. The platform allows researchers to store, manage, and share their data, as well as create scientific visuals. It addresses the problem of experimental data being hidden in PDF files and not easily searchable. Why it matters: This platform improves data accessibility and collaboration in materials science research, potentially accelerating discovery and innovation in the field.

KAUST and the Big Data age

KAUST ·

KAUST held a research workshop on Optimization and Big Data, gathering researchers to discuss challenges and opportunities in the field. Speakers presented novel optimization algorithms and distributed systems for handling large datasets. The workshop featured 20 speakers from KAUST, global universities, and Microsoft Research. Why it matters: The event highlights KAUST's role as a regional hub for advancing research and development in big data and optimization, crucial for AI and various computational fields.

Building Planetary-Scale Collaborative Intelligence

MBZUAI ·

Sai Praneeth Karimireddy from UC Berkeley presented a talk on building planetary-scale collaborative intelligence, highlighting the challenges of using distributed data in machine learning due to data silos and ethical-legal restrictions. He proposed collaborative systems like federated learning as a solution to bring together distributed data while respecting privacy. The talk addressed the need for efficiency, reliability, and management of divergent goals in these systems, suggesting the use of tools from optimization, statistics, and economics. Why it matters: Collaborative AI systems can unlock valuable distributed data in the region, especially in sensitive sectors like healthcare, while ensuring privacy and addressing ethical concerns.