Results for "compute-in-memory"

Uncertainty Modeling of Emerging Device-based Computing-in-Memory Neural Accelerators with Application to Neural Architecture Search

arXiv · Jul 6

This paper analyzes how device uncertainties affect deep neural networks (DNNs) deployed on emerging device-based computing-in-memory (CiM) systems. The authors propose UAE, an uncertainty-aware Neural Architecture Search scheme, to identify DNN models that are robust to these uncertainties. The goal is to mitigate the accuracy drop that occurs when trained models are deployed on real-world platforms.
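To make the idea of an accuracy drop under device uncertainty concrete, here is a minimal Monte Carlo sketch. It is not the paper's uncertainty model or the UAE scheme: it assumes, purely for illustration, that each stored weight is perturbed by independent zero-mean Gaussian noise, and the function name `noisy_accuracy` and its parameters are hypothetical.

```python
import random


def noisy_accuracy(weights, inputs, labels, sigma, trials=200, seed=0):
    """Estimate the accuracy of a linear classifier under weight noise.

    weights: one weight vector per class (list of lists).
    sigma:   standard deviation of the ASSUMED additive Gaussian noise
             applied independently to every weight in every trial.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # Draw one noisy copy of the weights, as if reading a programmed device.
        noisy = [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]
        for x, y in zip(inputs, labels):
            # Class score = dot product of the noisy class vector with the input.
            scores = [sum(w * xi for w, xi in zip(row, x)) for row in noisy]
            if scores.index(max(scores)) == y:
                correct += 1
    return correct / (trials * len(inputs))


# Toy 3-class problem: each input matches exactly one class vector.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_labels = [0, 1, 2]
clean = noisy_accuracy(identity, identity, toy_labels, sigma=0.0)
noisy = noisy_accuracy(identity, identity, toy_labels, sigma=5.0)
```

Comparing `clean` with `noisy` gives a rough picture of how much accuracy a fixed model loses as the assumed noise level grows. An uncertainty-aware search would, in principle, prefer architectures for which that gap stays small.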

Optimizing AI Systems through Cross-Layer Design: A Data-Centric Approach

MBZUAI

A Duke University professor presented a data-centric approach to optimizing AI systems by addressing the memory capacity and bandwidth bottleneck. The presentation covered collaborative optimization across algorithms, systems, architecture, and circuit layers. It also explored compute-in-memory as a solution for integrating computation and memory. Why it matters: Optimizing AI systems through a data-centric approach can improve efficiency and performance, critical for advancing AI applications in the region.

Computing in the Post-Moore Era

MBZUAI

A professor from EPFL (Lausanne) gave a talk at MBZUAI on computing in the post-Moore era, highlighting the slowing of Moore's Law due to physical limits in transistor miniaturization. He discussed research challenges and opportunities for future computing technologies. He presented examples of post-Moore technologies he helped develop in the datacenter space. Why it matters: As Moore's Law slows, research into alternative computing paradigms becomes critical for the continued advancement of AI and digital services in the UAE and globally.

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

arXiv · Jun 5

The paper introduces Sparse-Quantized Representation (SpQR), a new compression format and quantization technique for large language models (LLMs). SpQR identifies outlier weights and stores them in higher precision while compressing the remaining weights to 3-4 bits. The method keeps relative accuracy loss, measured by perplexity, below 1% for LLaMA and Falcon LLMs and enables a 33B parameter LLM to run on a single 24GB consumer GPU. Why it matters: This enables near-lossless compression of LLMs, making powerful models accessible on resource-constrained devices and accelerating inference without significant accuracy degradation.
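The split between high-precision outliers and low-bit base weights can be sketched as follows. This is a simplified illustration, not SpQR itself: it assumes outliers are chosen by magnitude (a stand-in for the paper's own selection criterion), uses a single uniform quantization grid for the whole vector, and the names `quantize_with_outliers` and `weight_gigabytes` are hypothetical.

```python
def quantize_with_outliers(weights, bits=3, outlier_fraction=0.01):
    """Quantize a list of floats, keeping the largest-magnitude ones exact.

    Returns (restored, outliers, step): the dequantized weights, the set of
    indices kept in full precision, and the quantization step size.
    """
    n = len(weights)
    k = max(1, round(outlier_fraction * n))
    # ASSUMPTION: treat the k largest-magnitude weights as outliers.
    order = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    outliers = set(order[:k])

    # Fit a uniform grid with 2**bits levels to the non-outlier weights only,
    # so a few extreme values do not stretch the grid.
    base = [w for i, w in enumerate(weights) if i not in outliers]
    lo, hi = min(base), max(base)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0

    restored = []
    for i, w in enumerate(weights):
        if i in outliers:
            restored.append(w)                  # stored in full precision
        else:
            code = round((w - lo) / step)       # integer code in [0, levels]
            restored.append(lo + code * step)   # dequantized value
    return restored, outliers, step


def weight_gigabytes(n_params, bits):
    """Storage for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits / 8 / 1e9


# Deterministic toy weights in [-0.5, 0.5] plus one extreme value at index 0.
toy = [((i * 37) % 101 - 50) / 100 for i in range(1000)]
toy[0] = 50.0
restored, outliers, step = quantize_with_outliers(toy, bits=3)
```

Because the extreme value at index 0 is pulled out before the grid is fitted, the remaining weights share a step size set by their own narrow range, and each is off by at most half a step. As a rough size check, `weight_gigabytes(33e9, 4)` gives 16.5 GB for the weights alone, which ignores outlier storage, quantization metadata, and activations but is in line with the summary's claim of fitting on a 24GB GPU.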