GCC AI Research

Results for "open source"

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

arXiv

Researchers from MBZUAI have released MobiLlama, a fully transparent, open-source Small Language Model (SLM) with 0.5 billion parameters. MobiLlama is designed for resource-constrained devices and aims for better performance with lower resource demands. The full training data pipeline, code, model weights, and checkpoints are available on GitHub.

A Case Study for Compliance as Code with Graphs and Language Models: Public release of the Regulatory Knowledge Graph

arXiv

This paper introduces a Regulatory Knowledge Graph (RKG) for the Abu Dhabi Global Market (ADGM) regulations, constructed using language models and graph technologies. A portion of the regulations was manually tagged to train BERT-based models, which were then applied to the rest of the corpus. The resulting knowledge graph, stored in Neo4j, and code are open-sourced on GitHub to promote advancements in compliance automation.

ArabJobs: A Multinational Corpus of Arabic Job Ads

arXiv

The ArabJobs dataset is a new corpus of over 8,500 Arabic job advertisements collected from Egypt, Jordan, Saudi Arabia, and the UAE. The dataset contains over 550,000 words and captures linguistic, regional, and socio-economic variation in the Arab labor market. It is available on GitHub and can be used for fairness-aware Arabic NLP and labor market research.

Interpretable and synergistic deep learning for visual explanation and statistical estimations of segmentation of disease features from medical images

arXiv

The study compares deep learning models trained via transfer learning from ImageNet (TII-models) against those trained solely on medical images (LMI-models) for disease segmentation. Results show that combining outputs from both model types can improve segmentation performance by up to 10% in certain scenarios. A repository of models, code, and over 10,000 medical images is available on GitHub to facilitate further research.