Search

Results for "quantized neural networks"

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

arXiv · Jun 5

The paper introduces Sparse-Quantized Representation (SpQR), a new compression format and quantization technique for large language models (LLMs). SpQR identifies outlier weights and stores them in higher precision while compressing the remaining weights to 3-4 bits. The method achieves less than 1% accuracy loss in perplexity for LLaMA and Falcon LLMs and enables a 33B parameter LLM to run on a single 24GB consumer GPU. Why it matters: This enables near-lossless compression of LLMs, making powerful models accessible on resource-constrained devices and accelerating inference without significant accuracy degradation.

Low-Complexity NN Technology: Model and Precision Search, Acceleration Circuit, and Applications

MBZUAI · Invalid Date

Researchers at National Taiwan University are developing low-complexity neural network technologies using quantization to reduce model size while maintaining accuracy. Their work includes binary-weighted CNNs and transformers, along with a neural architecture search scheme (TPC-NAS) applied to image recognition, object detection, and NLP tasks. They have also built a PE-based CNN/transformer hardware accelerator in Xilinx FPGA SoC with a PyTorch-based software framework. Why it matters: This research provides practical methods for deploying efficient deep learning models on resource-constrained hardware, potentially enabling broader adoption of AI in embedded systems and edge devices.

Uncertainty Modeling of Emerging Device-based Computing-in-Memory Neural Accelerators with Application to Neural Architecture Search

arXiv · Jul 6

This paper analyzes the impact of device uncertainties on deep neural networks (DNNs) in emerging device-based Computing-in-memory (CiM) systems. The authors propose UAE, an uncertainty-aware Neural Architecture Search scheme, to identify DNN models robust to these uncertainties. The goal is to mitigate accuracy drops when deploying trained models on real-world platforms.

Reaping the full benefits of AI-driven applications

MBZUAI · Invalid Date

MBZUAI Assistant Professors Bin Gu and Huan Xiong are advancing spiking neural networks (SNNs) to improve computational power and energy efficiency. They will present their latest research on SNNs at the 38th Annual AAAI Conference on Artificial Intelligence in Vancouver. SNNs process information in discrete events, mimicking biological neurons and offering improved energy efficiency compared to traditional neural networks. Why it matters: This research could enable running advanced AI applications like GPTs on mobile devices, unlocking their full potential due to the energy efficiency of SNNs.

Can AI Learn Like Us? Unveiling the Secrets of Spiking Neural Networks

MBZUAI · Invalid Date

MBZUAI Ph.D. graduate Hilal Mohammad Hilal AlQuabeh researched methods to improve the efficiency of machine learning algorithms, specifically focusing on pairwise learning and multi-instance learning. Pairwise learning teaches AI to make decisions by comparing options in pairs, useful for ranking and anomaly detection. Multi-instance learning involves learning from sets of data points, applicable in areas like drug discovery. Why it matters: Optimizing AI for low-resource environments expands its accessibility and applicability in critical sectors like healthcare and remote area operations.

Principled Scaling of Neural Networks

MBZUAI · Invalid Date

Soufiane Hayou of the National University of Singapore presented a talk at MBZUAI on principled scaling of neural networks. The talk covered leveraging mathematical results to efficiently scale neural networks. He obtained his PhD in statistics in 2021 from Oxford. Why it matters: Understanding neural network scaling is crucial for developing more efficient and powerful AI models in the region.

Training Deep Neural Networks in Tiny Subspaces

MBZUAI · Invalid Date

Xiaolin Huang from Shanghai Jiao Tong University presented a talk at MBZUAI on training deep neural networks in tiny subspaces. The talk covered the low-dimension hypothesis in neural networks and methods to find subspaces for efficient training. It suggests that training in smaller subspaces can improve training efficiency, generalization, and robustness. Why it matters: Investigating efficient training methods is crucial for resource-constrained environments and can enable broader access to advanced AI.

Emulating the energy efficiency of the brain

MBZUAI · Invalid Date

MBZUAI researchers are developing spiking neural networks (SNNs) to emulate the energy efficiency of the human brain. Traditional deep learning models like those powering ChatGPT consume significant energy, with a single query using 3.96 watts. SNNs aim to mimic biological neurons more closely to reduce energy consumption, as the human brain uses only a fraction of the energy compared to these models. Why it matters: This research could lead to more sustainable and energy-efficient AI technologies, addressing a major challenge in deploying large-scale AI systems.