A presentation discusses using programmable network devices to reduce communication bottlenecks in distributed deep learning. It explores in-network aggregation and data processing to lower memory needs and increase bandwidth usage. The talk also covers gradient compression and the potential role of programmable NICs. Why it matters: Optimizing distributed deep learning infrastructure is critical for scaling AI model training in resource-constrained environments.
KAUST is hosting a workshop on distributed training in November 2025, led by Professors Peter Richtarik and Marco Canini, focusing on scaling large models like LLMs and ViTs. Richtarik's team recently solved a 75-year-old problem in asynchronous optimization, developing time-optimal stochastic gradient descent algorithms. This research improves the speed and reliability of large model training and supports applications in distributed and federated learning. Why it matters: KAUST's focus on scalable AI and federated learning contributes to Saudi Arabia's Vision 2030 goals and addresses critical challenges in AI deployment and data privacy.
Sai Praneeth Karimireddy from UC Berkeley presented a talk on building planetary-scale collaborative intelligence, highlighting the challenges of using distributed data in machine learning due to data silos and ethical-legal restrictions. He proposed collaborative systems like federated learning as a solution to bring together distributed data while respecting privacy. The talk addressed the need for efficiency, reliability, and management of divergent goals in these systems, suggesting the use of tools from optimization, statistics, and economics. Why it matters: Collaborative AI systems can unlock valuable distributed data in the region, especially in sensitive sectors like healthcare, while ensuring privacy and addressing ethical concerns.
A CMU researcher, Dr. Hongyi Wang, presented an evaluation of gradient compression methods in distributed training, finding limited speedup in most realistic setups. The research identifies the root causes and proposes desirable properties for gradient compression methods to provide significant speedup. The talk was promoted by MBZUAI. Why it matters: Understanding the limitations of gradient compression techniques can help optimize distributed training strategies for AI models in the region.
A talk at MBZUAI discussed federated learning, a distributed machine learning approach training models over devices while keeping data localized. The presentation covered a straggler-resilient federated learning scheme using adaptive node participation to tackle system heterogeneity. It also presented a robust optimization formulation for addressing data heterogeneity and a new algorithm for personalizing learned models. Why it matters: Federated learning is crucial for AI applications involving decentralized data sources, and research on improving its robustness and personalization is essential for real-world deployment in the region.
This paper introduces DaringFed, a novel dynamic Bayesian persuasion pricing mechanism for online federated learning (OFL) that addresses the challenge of two-sided incomplete information (TII) regarding resources. It formulates the interaction between the server and clients as a dynamic signaling and pricing allocation problem within a Bayesian persuasion game, demonstrating the existence of a unique Bayesian persuasion Nash equilibrium. Evaluations on real and synthetic datasets demonstrate that DaringFed optimizes accuracy and convergence speed and improves the server's utility.
MBZUAI and KAUST researchers collaborated to present new optimization methods at ICML 2024 for composite and distributed machine learning settings. The study addresses challenges in training large models due to data size and computational power. Their work focuses on minimizing the "loss function" by adjusting internal trainable parameters, using techniques like gradient clipping. Why it matters: This research contributes to the ongoing advancement of machine learning optimization, crucial for improving the performance and efficiency of AI models in the region and globally.
This paper presents a reinforcement learning framework for optimizing energy pricing in peer-to-peer (P2P) energy systems. The framework aims to maximize the profit of all components in a microgrid, including consumers, prosumers, the service provider, and a community battery. Experimental results on the Pymgrid dataset demonstrate the approach's effectiveness in price optimization, considering the interests of different components and the impact of community battery capacity.