Global QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees

Abstract

The growing size of deep learning models and datasets makes training time-consuming, with heavy communication, primarily due to gradient synchronization, emerging as a key bottleneck. Sapio et al. [7] show that communication can take up to 90% of a training iteration. A popular remedy is to reduce the volume of data communicated between nodes by applying gradient compression methods [1, 3, 6, 8, 9, 11]. Unfortunately, the majority of proposed compressors are not natively compatible with the AllReduce collective communication primitive because they change the data format and require custom reduction operations. To the best of our knowledge, the only compressors compatible with AllReduce are PowerSGD [10] and IntSGD [5, 7]. However, practical implementations of these methods are heuristic-based and do not come with rigorous theoretical guarantees. Concurrently, C-Coll [4] proposes error-bounded lossy compression with MPI collectives. We address the question: can we provide theoretical guarantees for gradient compression while retaining AllReduce compatibility for an efficient implementation?
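
The sketch below is only an illustration of why a shared (global) quantization scale keeps compression AllReduce-friendly: the reduction stays a plain element-wise integer sum, so no custom reduction operator is needed. It is not the paper's quantizer; the max-absolute-value scale, the 255-level grid, and the function names `quantize`/`dequantize` are assumptions made for this example.

```python
import numpy as np

# Illustrative sketch (not the Global QSGD algorithm): integer quantization
# with one scale shared by all workers, so summing quantized gradients is the
# same element-wise sum that a standard AllReduce already performs.

def quantize(grad, scale, levels=255):
    # Map floats to small integers using the shared global scale.
    return np.rint(grad / scale * levels).astype(np.int32)

def dequantize(q_sum, scale, levels=255, num_workers=1):
    # Undo the scaling after the integer sum and average over workers.
    return q_sum.astype(np.float32) * scale / levels / num_workers

# Simulated gradients from 4 workers (stand-in for a distributed job).
rng = np.random.default_rng(0)
grads = [rng.standard_normal(8).astype(np.float32) for _ in range(4)]

# A global scale shared by all workers, here the max absolute value.
scale = max(np.abs(g).max() for g in grads)

# "AllReduce" over integers: an element-wise sum, the default reduction.
q_sum = sum(quantize(g, scale) for g in grads)
avg = dequantize(q_sum, scale, num_workers=len(grads))

# The dequantized average is close to the true float average.
print(np.allclose(avg, np.mean(grads, axis=0), atol=scale / 255))
```

In a real deployment the integer sum would be carried out by the collective itself (e.g. `torch.distributed.all_reduce` on the int tensor), which is exactly what format-changing compressors with custom reductions cannot use directly.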

Type
Publication
EuroSys 2023 Poster
辛继灏
Ph.D. Student in Machine Learning Systems