GCC AI Research

Results for "data scale"

On the importance of Data Scale in Pretraining Arabic Language Models

arXiv

This paper studies the impact of data scale on Arabic Pretrained Language Models (PLMs). Researchers retrained BERT-base and T5-base models on large Arabic corpora, achieving state-of-the-art results on the ALUE and ORCA benchmarks. The analysis indicates that pretraining data volume is the most important factor for performance. Why it matters: This work provides valuable insights into building effective Arabic language models, emphasizing the importance of large, high-quality datasets for advancing Arabic NLP.

QF and Scale AI launch partnership to accelerate innovation, nurture tech talent - The Peninsula Qatar

Qatar Foundation

Qatar Foundation (QF) has announced a partnership with Scale AI, a leading data platform for artificial intelligence. The collaboration aims to accelerate innovation and foster tech talent development within Qatar's AI ecosystem. This initiative will leverage Scale AI's expertise in data infrastructure and model development to support QF's research and education efforts. Why it matters: This partnership strengthens Qatar's position as an emerging AI hub by integrating global AI expertise to cultivate local talent and drive technological advancement.

The role of data-driven models in quantifying uncertainty

KAUST

KAUST Professor Raul Tempone, an expert in Uncertainty Quantification (UQ), has been appointed as an Alexander von Humboldt Professor at RWTH Aachen University in Germany. The professorship will enable him to further his research on the mathematics of uncertainty quantification with new collaborators. Tempone believes the KAUST Strategic Research Initiative for Uncertainty Quantification (SRI-UQ) contributed to this award. Why it matters: This appointment raises KAUST's visibility and encourages exchange between European and KAUST research groups, which benefits both institutions and helps attract talent.

Scalable Community Detection in Massive Networks Using Aggregated Relational Data

MBZUAI

A new mini-batch strategy using aggregated relational data is proposed to fit the mixed membership stochastic blockmodel (MMSB) to large networks. The method uses nodal information and stochastic gradients of bipartite graphs for scalable inference. The approach was applied to a citation network with over two million nodes and 25 million edges, capturing explainable structure. Why it matters: This research enables more efficient community detection in massive networks, which is crucial for analyzing complex relationships in many domains. However, the article has no clear connection to the Middle East.