The increasing scale and complexity of large AI models have made efficient communication across multi-GPU and multi-node infrastructures essential. Existing methods that overlap computation and communication only at the inter-operator level fail when data dependencies exist between the two, leaving hardware underutilized. DistFuse addresses this by overlapping computation and communication at a fine granularity, triggering communication as soon as the data it needs is ready rather than waiting for an entire operator to finish, thereby reducing exposed communication latency. Initial experiments show up to a 44.3% reduction in the communication latency of Llama3-70B inference on a single node, demonstrating DistFuse's potential to accelerate diverse AI workloads.
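To make the underlying idea concrete, the sketch below illustrates the fine-grained overlap pattern in plain PyTorch; it is not DistFuse's implementation, and the function name, chunk count, and shapes are illustrative assumptions. A GEMM is split into chunks, and each chunk's all-reduce is launched asynchronously as soon as that chunk's output is ready, so communication for earlier chunks overlaps with computation of later ones instead of waiting for the whole operator to finish.

```python
# Illustrative sketch of fine-grained compute/communication overlap
# (not the DistFuse implementation). Assumes torch.distributed has been
# initialized with the NCCL backend and that x, w live on the local GPU.
import torch
import torch.distributed as dist

def chunked_matmul_allreduce(x: torch.Tensor, w: torch.Tensor, num_chunks: int = 4) -> torch.Tensor:
    """Row-chunked GEMM whose partial outputs are all-reduced as soon as
    they are produced, overlapping their communication with the compute
    of the remaining chunks."""
    outputs, handles = [], []
    for x_chunk in x.chunk(num_chunks, dim=0):
        y_chunk = x_chunk @ w                                     # compute one output chunk
        handles.append(dist.all_reduce(y_chunk, async_op=True))   # start its reduction immediately
        outputs.append(y_chunk)
    for h in handles:                                             # drain in-flight reductions
        h.wait()
    return torch.cat(outputs, dim=0)
```

Because PyTorch's NCCL backend runs collectives on a separate internal stream, each asynchronous all-reduce proceeds while the next chunk's matrix multiply executes, which is the coarse analogue of the per-tile triggering that DistFuse performs inside fused kernels.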