Programmable Networks for Distributed Deep Learning: Advances and Perspectives

MBZUAI · Notable

Infrastructure · Research · Networking · LLM · Distributed Systems

Summary

A presentation discussed using programmable network devices to reduce communication bottlenecks in distributed deep learning. It explored in-network aggregation and data processing as ways to lower memory requirements and improve bandwidth utilization. The talk also covered gradient compression and the potential role of programmable NICs. Why it matters: Optimizing distributed deep learning infrastructure is critical for scaling AI model training in resource-constrained environments.
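To make the two ideas named in the summary concrete, here is a minimal sketch of gradient compression followed by aggregation. It is not taken from the talk: top-k sparsification is just one common compression scheme, the function names `top_k_sparsify` and `aggregate` are hypothetical, and the aggregation step is simulated in host-side Python. Real programmable switches and NICs operate under tight memory and arithmetic constraints that this sketch ignores.

```python
from typing import Dict, List


def top_k_sparsify(grad: List[float], k: int) -> Dict[int, float]:
    """Keep the k largest-magnitude entries of grad as {index: value}."""
    if k <= 0:
        return {}
    # Sort indices by absolute value, largest first (ties keep the lower index).
    order = sorted(range(len(grad)), key=lambda i: -abs(grad[i]))
    return {i: grad[i] for i in order[:k]}


def aggregate(sparse_grads: List[Dict[int, float]], size: int) -> List[float]:
    """Sum sparse gradients element-wise, standing in for an aggregation point."""
    total = [0.0] * size
    for sparse in sparse_grads:
        for index, value in sparse.items():
            total[index] += value
    return total


# Two hypothetical workers, each holding a 4-element gradient.
worker_a = [0.5, -0.01, 0.02, -2.0]
worker_b = [0.03, 1.5, -0.04, 1.0]

# Each worker sends only its 2 largest entries instead of all 4.
compressed = [top_k_sparsify(g, k=2) for g in (worker_a, worker_b)]

# The aggregation point sums what it receives and returns one result.
summed = aggregate(compressed, size=4)
print(summed)  # [0.5, 1.5, 0.0, -1.0]
```

In this toy run each worker transmits half of its gradient, and the workers receive a single summed vector rather than one vector per peer, which is the traffic reduction that aggregation inside the network aims for.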

Keywords

programmable networks · distributed deep learning · MBZUAI · in-network aggregation · gradient compression

Related

Enabling Fast, Robust, and Personalized Federated Learning

MBZUAI

A talk at MBZUAI discussed federated learning, a distributed machine learning approach that trains models across many devices while keeping data local. The presentation covered a straggler-resilient federated learning scheme that uses adaptive node participation to tackle system heterogeneity. It also presented a robust optimization formulation for addressing data heterogeneity and a new algorithm for personalizing learned models. Why it matters: Federated learning is crucial for AI applications involving decentralized data sources, and research on improving its robustness and personalization is essential for real-world deployment in the region.

KAUST advances scalable AI through global collaboration

KAUST · Nov 12

KAUST is hosting a workshop on distributed training in November 2025, led by Professors Peter Richtarik and Marco Canini, focusing on scaling large models like LLMs and ViTs. Richtarik's team recently solved a 75-year-old problem in asynchronous optimization, developing time-optimal stochastic gradient descent algorithms. This research improves the speed and reliability of large model training and supports applications in distributed and federated learning. Why it matters: KAUST's focus on scalable AI and federated learning contributes to Saudi Arabia's Vision 2030 goals and addresses critical challenges in AI deployment and data privacy.

Optimizing AI Systems through Cross-Layer Design: A Data-Centric Approach

MBZUAI

A Duke University professor presented a data-centric approach to optimizing AI systems by addressing the memory capacity and bandwidth bottleneck. The presentation covered collaborative optimization across algorithms, systems, architecture, and circuit layers. It also explored compute-in-memory as a solution for integrating computation and memory. Why it matters: Optimizing AI systems through a data-centric approach can improve efficiency and performance, critical for advancing AI applications in the region.

New approaches for machine learning optimization presented at ICML

MBZUAI

MBZUAI and KAUST researchers jointly presented new optimization methods at ICML 2024 for composite and distributed machine learning settings. The study addresses the challenges that dataset size and limited computational power pose for training large models. Their work focuses on minimizing the loss function by adjusting a model's internal trainable parameters, using techniques such as gradient clipping. Why it matters: This research contributes to the ongoing advancement of machine learning optimization, crucial for improving the performance and efficiency of AI models in the region and globally.
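For readers unfamiliar with gradient clipping, the following is a minimal sketch of the generic technique, not the method from the ICML paper, whose details the summary does not give. It assumes clipping by L2 norm inside a plain gradient-descent step; the names `clip_by_norm` and `sgd_step` and the numbers are illustrative only.

```python
import math
from typing import List


def clip_by_norm(grad: List[float], max_norm: float) -> List[float]:
    """Rescale grad so its L2 norm is at most max_norm; direction is unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def sgd_step(params: List[float], grad: List[float],
             lr: float, max_norm: float) -> List[float]:
    """One gradient-descent update that uses the clipped gradient."""
    clipped = clip_by_norm(grad, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]


# A gradient of norm 5 is scaled down to norm 1 (roughly [0.6, 0.8]).
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)

# The update then moves the parameters by at most lr * max_norm in total.
new_params = sgd_step([1.0, 1.0], [3.0, 4.0], lr=0.5, max_norm=1.0)
```

Clipping bounds the size of each update, which keeps a single unusually large gradient from destabilizing training.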
