Skip to content
GCC AI Research

Search

Results for "compute-in-memory"

Uncertainty Modeling of Emerging Device-based Computing-in-Memory Neural Accelerators with Application to Neural Architecture Search

arXiv ·

This paper analyzes the impact of device uncertainties on deep neural networks (DNNs) in emerging device-based Computing-in-memory (CiM) systems. The authors propose UAE, an uncertainty-aware Neural Architecture Search scheme, to identify DNN models robust to these uncertainties. The goal is to mitigate accuracy drops when deploying trained models on real-world platforms.

Optimizing AI Systems through Cross-Layer Design: A Data-Centric Approach

MBZUAI ·

A Duke University professor presented a data-centric approach to optimizing AI systems by addressing the memory capacity and bandwidth bottleneck. The presentation covered collaborative optimization across algorithms, systems, architecture, and circuit layers. It also explored compute-in-memory as a solution for integrating computation and memory. Why it matters: Optimizing AI systems through a data-centric approach can improve efficiency and performance, critical for advancing AI applications in the region.

Bring Your Own Kernel! Constructing High-Performance Data Management Systems from Components

MBZUAI ·

Holger Pirk from Imperial College London is developing a novel approach to data management system composition called BOSS. The system uses a homoiconic representation of data and code and partial evaluation of queries by components, drawing inspiration from compiler-construction research. BOSS achieves a fully composable design that effectively combines different data models, hardware platforms, and processing engines, enabling features like GPU acceleration and generative data cleaning with minimal overhead. Why it matters: This research on composable database systems can broaden the applicability of data management techniques in the GCC region, enabling more flexible and efficient data processing for various applications.

Computing in the Post-Moore Era

MBZUAI ·

A professor from EPFL (Lausanne) gave a talk at MBZUAI on computing in the post-Moore era, highlighting the slowing of Moore's Law due to physical limits in transistor miniaturization. He discussed research challenges and opportunities for future computing technologies. He presented examples of post-Moore technologies he helped develop in the datacenter space. Why it matters: As Moore's Law slows, research into alternative computing paradigms becomes critical for the continued advancement of AI and digital services in the UAE and globally.

On Optimizing Mobile Memory, Storage, and Beyond

MBZUAI ·

Prof. Chun Jason Xue from the City University of Hong Kong presented research on optimizing mobile memory and storage by analyzing mobile application characteristics, noting their differences from server applications. The research explores system software designs inherited from the Linux kernel and identifies optimization opportunities in mobile memory and storage management. Xue's work aims to enhance user experience on mobile devices through mobile application characterization, focusing on non-volatile and flash memories. Why it matters: Optimizing mobile systems based on the unique characteristics of mobile applications can significantly improve device performance and user experience in the region.

Programmable Networks for Distributed Deep Learning: Advances and Perspectives

MBZUAI ·

A presentation discusses using programmable network devices to reduce communication bottlenecks in distributed deep learning. It explores in-network aggregation and data processing to lower memory needs and increase bandwidth usage. The talk also covers gradient compression and the potential role of programmable NICs. Why it matters: Optimizing distributed deep learning infrastructure is critical for scaling AI model training in resource-constrained environments.

Duet: efficient and scalable hybriD neUral rElation undersTanding

arXiv ·

The paper introduces Duet, a hybrid neural relation understanding method for cardinality estimation. Duet addresses limitations of existing learned methods, such as high costs and scalability issues, by incorporating predicate information into an autoregressive model. Experiments demonstrate Duet's efficiency, accuracy, and scalability, even outperforming GPU-based methods on CPU.

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

arXiv ·

The paper introduces Sparse-Quantized Representation (SpQR), a new compression format and quantization technique for large language models (LLMs). SpQR identifies outlier weights and stores them in higher precision while compressing the remaining weights to 3-4 bits. The method achieves less than 1% accuracy loss in perplexity for LLaMA and Falcon LLMs and enables a 33B parameter LLM to run on a single 24GB consumer GPU. Why it matters: This enables near-lossless compression of LLMs, making powerful models accessible on resource-constrained devices and accelerating inference without significant accuracy degradation.