Distributed Training Frameworks Reading List
- Horovod: Fast and Easy Distributed Deep Learning in TensorFlow (ArXiv’18) [PDF] [Code]
- Further reading: Horovod - Distributed TensorFlow Made Easy [Original Link]
- Further reading: Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow [Original Link]
- A Generic Communication Scheduler for Distributed DNN Training Acceleration (SOSP’19) [PDF] [Reading Notes]
- Framework: ByteScheduler
- An efficient communication scheduling method designed by ByteDance