GCC AI Research

Results for "Data"

Fact checking with ChatGPT

MBZUAI

A new paper from MBZUAI researchers explores using ChatGPT to combat the spread of fake news. The researchers, including Preslav Nakov and Liangming Pan, demonstrate that ChatGPT can be used to fact-check published information. Their paper, "Fact-Checking Complex Claims with Program-Guided Reasoning," was accepted at ACL 2023. Why it matters: This research highlights the potential of large language models to address the growing challenge of misinformation, with implications for maintaining information integrity in the digital age.

The new way we do things

KAUST

Christopher Fabian, co-founder of UNICEF’s Innovation Unit, spoke at KAUST about using data and technology to improve lives. He highlighted how IoT and wearables can connect remote populations in developing countries with their governments. The talk emphasized using data to include unaccounted populations. Why it matters: The discussion reinforces KAUST's commitment to leveraging technology for global development and aligns with Saudi Arabia's broader goals for digital transformation.

Machine learning 101

MBZUAI

Machine learning (ML) algorithms use data to make decisions or predictions, and typically improve as more data is provided. ML is a subset of AI focused on models that learn from data, in contrast to rule-based systems, where a human writes the decision logic by hand. ML tends to do better where rules cannot be listed exhaustively, such as interpreting medical scans, although rule-based systems and ML often complement each other. Why it matters: This overview clarifies the role of machine learning within the broader field of AI, highlighting its data-driven approach and its advantages over traditional rule-based systems in complex decision-making scenarios.
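To make the contrast between a hand-written rule and a learned one concrete, here is a minimal sketch in Python. It is not taken from the article: the data values, the function names `rule_based` and `learn_threshold`, and the single-threshold model are all illustrative assumptions.

```python
# Hypothetical toy data: (measurement, label) pairs. Values are illustrative only.
samples = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]


def rule_based(x, threshold=5.0):
    """Hand-written rule: a person chooses the threshold in advance."""
    return 1 if x >= threshold else 0


def learn_threshold(data):
    """Learned rule: pick the threshold that makes the fewest errors on the data."""
    xs = sorted(x for x, _ in data)
    # Candidate thresholds are the midpoints between neighbouring sorted values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in data)

    return min(candidates, key=errors)


learned = learn_threshold(samples)
predictions = [rule_based(x, learned) for x, _ in samples]
```

The decision function is the same in both cases; the only difference is whether the threshold comes from a person or from the data, which is the distinction the summary draws.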

ArabJobs: A Multinational Corpus of Arabic Job Ads

arXiv

The ArabJobs dataset is a new corpus of over 8,500 Arabic job advertisements collected from Egypt, Jordan, Saudi Arabia, and the UAE. The dataset contains over 550,000 words and captures linguistic, regional, and socio-economic variation in the Arab labor market. It is available on GitHub and can be used for fairness-aware Arabic NLP and labor market research.

A platform for material scientists

KAUST

Scimagine is a KAUST-based startup that provides a cloud-based platform for managing and storing experimental data for material scientists. The platform allows researchers to store, manage, and share their data, as well as create scientific visuals. It addresses the problem of experimental data being hidden in PDF files and not easily searchable. Why it matters: This platform improves data accessibility and collaboration in materials science research, potentially accelerating discovery and innovation in the field.

SlimPajama-DC: Understanding Data Combinations for LLM Training

arXiv

Researchers at MBZUAI release SlimPajama-DC, an empirical analysis of data combinations for pretraining LLMs using the SlimPajama dataset. The study examines the impact of global deduplication (across all data sources) versus local deduplication (within each source), and the proportions of highly deduplicated multi-source datasets. Results show that increased data diversity after global deduplication is crucial, with the best configuration outperforming models trained on RedPajama.
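The difference between local and global deduplication can be sketched in a few lines of Python. This is an illustration only: it uses exact string matching as a simplifying assumption, and the source names, documents, and function names are hypothetical rather than taken from the paper.

```python
def dedup_local(sources):
    """Remove duplicates within each source independently, keeping first occurrences."""
    return {name: list(dict.fromkeys(docs)) for name, docs in sources.items()}


def dedup_global(sources):
    """Remove duplicates across all sources, keeping only the first occurrence overall."""
    seen = set()
    result = {}
    for name, docs in sources.items():
        kept = []
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                kept.append(doc)
        result[name] = kept
    return result


# Hypothetical corpus: "b" appears in both sources, "a" appears twice in one source.
sources = {"web": ["a", "b", "a"], "code": ["b", "c"]}

local_result = dedup_local(sources)
global_result = dedup_global(sources)
```

In this toy corpus, local deduplication leaves "b" in both sources, while global deduplication keeps it only in the first source processed. That cross-source overlap is what the global setting removes.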

Datacenters in the Desert: Feasibility and Sustainability of LLM Inference in the Middle East

arXiv

This paper analyzes the energy consumption and carbon footprint of LLM inference in the UAE compared to Iceland, Germany, and the USA. The study uses DeepSeek Coder 1.3B and the HumanEval dataset to evaluate code generation. It provides a comparative analysis of geographical trade-offs for climate-aware AI deployment, specifically addressing the challenges and potential of datacenters in desert regions.
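The summary does not describe how the paper computes carbon footprint. A common accounting approach, shown here as a hedged sketch, multiplies the energy used by the carbon intensity of the local electricity grid. The function name, the optional `pue` (power usage effectiveness) factor, and every number below are hypothetical placeholders, not figures from the paper.

```python
def inference_emissions_g(energy_kwh, grid_intensity_g_per_kwh, pue=1.0):
    """Estimate emissions in grams of CO2-equivalent.

    energy_kwh: electricity drawn by the hardware during inference.
    grid_intensity_g_per_kwh: grams of CO2e emitted per kWh on the local grid.
    pue: datacenter overhead multiplier (cooling, power delivery); 1.0 means none.
    """
    return energy_kwh * pue * grid_intensity_g_per_kwh


# Placeholder inputs: the same workload run in two hypothetical regions.
workload_kwh = 2.0
region_a = inference_emissions_g(workload_kwh, 100.0)           # cleaner grid, no overhead
region_b = inference_emissions_g(workload_kwh, 100.0, pue=1.5)  # same grid, cooling overhead
```

Under this model, the same workload can have a different footprint in different locations because both the grid intensity and the cooling overhead vary by region.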