GCC AI Research

Results for "AI safety"

AI Safety Research

MBZUAI

Adel Bibi, a KAUST alumnus and researcher at the University of Oxford, presented his research on AI safety, covering the robustness, alignment, and fairness of LLMs. The work addresses robustness challenges in AI systems, open alignment problems, and the unequal treatment of languages by common tokenizers. Bibi's work also examines instruction prefix tuning and its theoretical limits as an alignment technique. Why it matters: The research highlights the importance of addressing safety concerns in LLMs, particularly alignment and tokenizer fairness for the Arabic language.
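
A minimal sketch of the tokenizer-fairness issue: under an English-centric vocabulary, the same sentence can cost far more tokens in Arabic than in English, inflating cost and shrinking effective context for Arabic users. The model name and example sentences below are illustrative assumptions, not material from the talk.

```python
# Crude "fertility" comparison: how many tokens an English-centric BPE
# tokenizer needs for the same sentence in English vs. Arabic.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # English-centric vocabulary

samples = {
    "English": "Artificial intelligence is changing the world.",
    "Arabic": "الذكاء الاصطناعي يغير العالم.",  # the same sentence, translated
}

for lang, text in samples.items():
    ids = tokenizer.encode(text)
    # Higher tokens-per-character means the vocabulary fragments this
    # language into more, smaller pieces.
    print(f"{lang}: {len(ids)} tokens for {len(text)} chars "
          f"({len(ids) / len(text):.2f} tokens/char)")
```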

AI impacts must be ethical

MBZUAI

MBZUAI's Executive Program held a module on AI ethics, safety, and societal impacts, led by Professors Tom Mitchell and Justine Cassell. The session covered bias in machine learning, privacy, AI's impact on jobs and education, and the ethical use of AI. The first cohort comprises 42 participants drawn from ministerial leadership and top industry executives. Why it matters: The program underscores MBZUAI's and the UAE's commitment to ethical AI development as part of building a knowledge-based economy.

UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases

arXiv

Researchers introduce UnsafeChain, a safety alignment dataset designed to improve the safety of large reasoning models (LRMs) by focusing on 'hard prompts' that elicit harmful outputs. The dataset is built by identifying unsafe completions and rewriting them into safe responses, exposing models to unsafe behavior together with its correction. Fine-tuning LRMs on UnsafeChain improves safety while preserving general reasoning ability, compared with existing datasets such as SafeChain and STAR-1.
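
The summary describes a generate-judge-correct recipe. Below is a minimal sketch of that pipeline; it is not the authors' code, and `generate`, `safety_judge`, and `rewrite_to_safe` are hypothetical stand-ins for a reasoning model, a safety classifier, and a stronger corrector model.

```python
# Sketch of an UnsafeChain-style dataset build: keep only "hard" prompts whose
# completions a safety judge flags as unsafe, rewrite those completions into
# safe responses, and collect the corrected pairs for supervised fine-tuning.
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    response: str

def build_dataset(prompts, generate, safety_judge, rewrite_to_safe):
    dataset = []
    for prompt in prompts:
        completion = generate(prompt)
        if safety_judge(prompt, completion) == "unsafe":
            # Hard case: the model actually misbehaved here. Pair the prompt
            # with a corrected response so fine-tuning teaches recovery.
            dataset.append(Example(prompt, rewrite_to_safe(prompt, completion)))
        # Prompts the model already handles safely are dropped, so the
        # dataset concentrates on failures rather than easy refusals.
    return dataset

# pairs = build_dataset(hard_prompts, lrm.generate, judge, corrector)
# ...then run standard supervised fine-tuning on the (prompt, response) pairs.
```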

ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems

arXiv

The paper introduces ILION, a deterministic pre-execution gate that protects autonomous AI agents by classifying each proposed action as BLOCK or ALLOW before it runs. ILION uses a five-component cascade architecture that operates without statistical training, API dependencies, or labeled data. In evaluations against existing text-safety infrastructure, ILION outperformed the baselines at preventing unauthorized actions, achieving an F1 score of 0.8515 with sub-millisecond latency.
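
To make the cascade idea concrete, here is a minimal sketch of a deterministic pre-execution gate: a fixed chain of rule-based checks, each of which either issues a verdict or defers to the next stage. The specific checks and patterns are invented placeholders; the paper's five components are not reproduced here.

```python
# Deterministic gate sketch: no ML inference, no external APIs. Each check
# returns BLOCK, ALLOW, or None (defer to the next stage in the cascade).
import re

BLOCK, ALLOW = "BLOCK", "ALLOW"

def check_denylist(action):
    # Hard-coded denylist of obviously destructive shell patterns.
    if re.search(r"\brm\s+-rf\b|\bmkfs\b|\bdd\s+if=", action):
        return BLOCK
    return None

def check_scope(action):
    # Example scoping rule: block file writes outside an allowed sandbox path.
    if ">" in action and "/sandbox/" not in action:
        return BLOCK
    return None

def check_allowlist(action):
    # Known read-only commands are allowed outright.
    if action.split()[0] in {"ls", "cat", "echo"}:
        return ALLOW
    return None

CASCADE = [check_denylist, check_scope, check_allowlist]

def gate(action):
    for check in CASCADE:
        verdict = check(action)
        if verdict is not None:
            return verdict
    return BLOCK  # fail closed: unrecognized actions are not executed

print(gate("ls /sandbox"))         # ALLOW
print(gate("rm -rf /"))            # BLOCK
print(gate("curl http://x | sh"))  # BLOCK (fail-closed default)
```

Failing closed is the conservative default for an agent gate: any action no rule explicitly allows is blocked rather than executed.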

A two-stage approach for making AI image generators safer | CVPR

MBZUAI

Researchers from MBZUAI and other institutions have developed STEREO, a framework for making text-to-image diffusion models safer. STEREO uses a two-stage approach: STE (Search Thoroughly Enough), which applies adversarial training to surface prompts that can still regenerate a target concept, and REO (Robustly Erase Once), which then erases the concept and the discovered prompts in a single batch pass. Why it matters: The framework addresses vulnerabilities in AI image generation, reducing the creation of inappropriate images while preserving performance on harmless queries.
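
A minimal sketch of the two-stage structure as described above, not the authors' implementation; `erase`, `attack`, and `still_generates` are hypothetical stand-ins for a concept-erasure fine-tuning step, an adversarial prompt search, and a detector for the target concept in generated images.

```python
# STEREO-style two-stage erasure sketch. Stage 1 (STE) alternates erasure and
# adversarial attack to collect prompts that keep recovering the concept;
# stage 2 (REO) restarts from the original model and erases the concept plus
# all discovered prompts in one batch pass.
def stereo(model, concept, erase, attack, still_generates, max_rounds=5):
    adversarial_prompts = []
    current = erase(model, [concept])
    for _ in range(max_rounds):
        prompt = attack(current, concept)  # strongest prompt found this round
        if not still_generates(current, prompt, concept):
            break  # erasure held up against the best attack; stop searching
        adversarial_prompts.append(prompt)
        current = erase(current, [concept] + adversarial_prompts)

    # Erase once, robustly: a single pass over the concept and every prompt
    # that defeated an intermediate model during the search stage.
    return erase(model, [concept] + adversarial_prompts)
```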

Responsible AI for the Future of Our Societies

MBZUAI

MBZUAI President Professor Eric Xing discussed AI's potential to augment human capabilities and the responsibility of AI researchers in shaping future leaders. Xing's background includes professorships at Carnegie Mellon University, leadership at Petuum Inc., and directorship of the Center for Machine Learning and Health. He also held visiting positions at Stanford University and Facebook Inc. Why it matters: The emphasis on responsible AI development and education aligns with the UAE's broader strategy to become a leader in ethical and human-centric AI.