GCC AI Research

Distributed Systems

Programmable Networks for Distributed Deep Learning: Advances and Perspectives

MBZUAI · Infrastructure Research

A presentation discusses how programmable network devices can reduce communication bottlenecks in distributed deep learning. It explores in-network aggregation and in-network data processing as ways to lower memory requirements and make better use of available bandwidth. The talk also covers gradient compression and the potential role of programmable NICs. Why it matters: Optimizing distributed deep learning infrastructure is critical for scaling AI model training in resource-constrained environments.
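To make the two ideas in the summary concrete, here is a minimal sketch of gradient compression followed by aggregation. It is an illustration under stated assumptions, not the method from the talk: the summary does not say which compression scheme is used, so top-k sparsification is swapped in as one common example, and the function names `top_k_sparsify` and `aggregate` are hypothetical. The aggregation step runs in ordinary Python here; in an in-network design, a programmable switch or NIC would perform the equivalent summation on packets in flight.

```python
from typing import Dict, List


def top_k_sparsify(grad: List[float], k: int) -> Dict[int, float]:
    """Keep only the k largest-magnitude entries, stored as {index: value}.

    Assumption: top-k is used purely as an example of gradient compression.
    """
    largest = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in largest}


def aggregate(sparse_grads: List[Dict[int, float]], size: int) -> List[float]:
    """Sum sparse gradients element-wise into one dense vector.

    This stands in for the summation an aggregating network device might do.
    """
    total = [0.0] * size
    for sparse in sparse_grads:
        for index, value in sparse.items():
            total[index] += value
    return total


# Two hypothetical workers, each holding a 4-element gradient.
workers = [
    [0.5, -0.1, 0.0, 2.0],
    [-1.5, 0.2, 0.1, 1.0],
]

# Each worker sends only its 2 largest-magnitude entries.
compressed = [top_k_sparsify(g, k=2) for g in workers]

# A single aggregation point sums what it receives.
summed = aggregate(compressed, size=4)
print(summed)
```

In this toy setup each worker transmits two values instead of four, and the aggregation point returns one combined vector rather than forwarding every worker's gradient, which is the kind of traffic reduction the summary refers to.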