Soufiane Hayou of the National University of Singapore presented a talk at MBZUAI on principled scaling of neural networks. The talk covered leveraging mathematical results to efficiently scale neural networks. He obtained his PhD in statistics in 2021 from Oxford. Why it matters: Understanding neural network scaling is crucial for developing more efficient and powerful AI models in the region.
Axel Sauer from the University of Tübingen presented research on scaling Generative Adversarial Networks (GANs) using pretrained representations. The work explores shaping GANs into causal structures, training them up to 40 times faster, and achieving state-of-the-art image synthesis. The presentation mentions "Counterfactual Generative Networks", "Projected GANs", "StyleGAN-XL”, and “StyleGAN-T". Why it matters: Scaling GANs and improving their training efficiency is crucial for advancing image and video synthesis, with implications for various applications in computer vision, graphics, and robotics.
KAUST is hosting a workshop on distributed training in November 2025, led by Professors Peter Richtarik and Marco Canini, focusing on scaling large models like LLMs and ViTs. Richtarik's team recently solved a 75-year-old problem in asynchronous optimization, developing time-optimal stochastic gradient descent algorithms. This research improves the speed and reliability of large model training and supports applications in distributed and federated learning. Why it matters: KAUST's focus on scalable AI and federated learning contributes to Saudi Arabia's Vision 2030 goals and addresses critical challenges in AI deployment and data privacy.
Xiaolin Huang from Shanghai Jiao Tong University presented a talk at MBZUAI on training deep neural networks in tiny subspaces. The talk covered the low-dimension hypothesis in neural networks and methods to find subspaces for efficient training. It suggests that training in smaller subspaces can improve training efficiency, generalization, and robustness. Why it matters: Investigating efficient training methods is crucial for resource-constrained environments and can enable broader access to advanced AI.
A presentation discusses using programmable network devices to reduce communication bottlenecks in distributed deep learning. It explores in-network aggregation and data processing to lower memory needs and increase bandwidth usage. The talk also covers gradient compression and the potential role of programmable NICs. Why it matters: Optimizing distributed deep learning infrastructure is critical for scaling AI model training in resource-constrained environments.