Skip to content
GCC AI Research

Software-Directed Hardware Reliability for ML Systems

MBZUAI · Notable

Summary

Abdulrahman Mahmoud, a postdoctoral fellow at Harvard University, discusses software-directed tools and techniques for processor design and reliability enhancement in ML systems. He emphasizes the need for a nuanced approach to numerical data formats supported by robust hardware. He advocates for integrating reliability as a foundational element in the design process. Why it matters: This research addresses the critical challenge of hardware reliability in AI processors, particularly relevant as the field moves towards hardware-software co-design for sustained growth.

Get the weekly digest

Top AI stories from the GCC region, every week.

Related

Reliability Exploration of Neural Network Accelerator

MBZUAI ·

This article discusses the reliability of Deep Neural Networks (DNNs) and their hardware platforms, especially regarding soft errors caused by cosmic rays. It highlights that while DNNs are robust against bit flips, errors can still lead to miscalculations in AI accelerators. The talk, led by Prof. Masanori Hashimoto from Kyoto University, will cover identifying vulnerabilities in neural networks and reliability exploration of AI accelerators for edge computing. Why it matters: As DNNs are deployed in safety-critical applications in the region, ensuring the reliability of AI hardware is crucial for safe and trustworthy operation.

Uncertainty Modeling of Emerging Device-based Computing-in-Memory Neural Accelerators with Application to Neural Architecture Search

arXiv ·

This paper analyzes the impact of device uncertainties on deep neural networks (DNNs) in emerging device-based Computing-in-memory (CiM) systems. The authors propose UAE, an uncertainty-aware Neural Architecture Search scheme, to identify DNN models robust to these uncertainties. The goal is to mitigate accuracy drops when deploying trained models on real-world platforms.

Hardware Security through the Lens of Dr ML

MBZUAI ·

NYU Abu Dhabi hosted a talk by Prof. Debdeep Mukhopadhyay on the intersection of machine learning and hardware security. The talk covered using ML/DL for side-channel attacks, leakage assessment in crypto-devices, and threats to hardware security primitives. Prof. Mukhopadhyay is a visiting professor at NYU Abu Dhabi and Institute Chair Professor at IIT Kharagpur. Why it matters: The talk highlights the growing importance of hardware security in modern systems and the role of machine learning in both attacking and defending hardware vulnerabilities.

Programmable Networks for Distributed Deep Learning: Advances and Perspectives

MBZUAI ·

A presentation discusses using programmable network devices to reduce communication bottlenecks in distributed deep learning. It explores in-network aggregation and data processing to lower memory needs and increase bandwidth usage. The talk also covers gradient compression and the potential role of programmable NICs. Why it matters: Optimizing distributed deep learning infrastructure is critical for scaling AI model training in resource-constrained environments.