(This issue is a task issue for the PaddlePaddle Hackathon; see PaddlePaddle Hackathon for details.)
【Task Description】
Task title: Implement DBTree-topology AllReduce in Paddle
Technical tags: deep learning framework, C++, communication topology
Difficulty: hard
Description: The core idea of DBTree (double binary tree) is to exploit the fact that roughly half the nodes of a binary tree are leaves. By turning those leaf nodes into non-leaf nodes, we obtain two binary trees in which every node is a leaf in one tree and a non-leaf node in the other. In theory, this approach achieves lower latency than the Ring algorithm. The goal of this task is to implement a DBTree-structured AllReduce in the Paddle distributed training framework; in addition to GPU training, support for other heterogeneous hardware is preferred.
NCCL reference: https://developer.nvidia.com/blog/massively-scale-deep-learning-training-nccl-2-4/
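The double-tree idea described above can be illustrated with a minimal Python sketch. It assumes n = 2^K − 1 ranks numbered 1..n so the in-order binary tree is complete (NCCL's actual construction handles arbitrary rank counts and splits the buffer across both trees); the function names (`parents_inorder`, `shifted`, `tree_allreduce`) are illustrative, not Paddle or NCCL APIs. The first tree is the in-order binary tree whose odd ranks are leaves; the second is obtained by a cyclic shift of one, so every rank is an interior (sending) node in at most one of the two trees:

```python
def parents_inorder(n):
    """Parent map of an in-order binary tree over ranks 1..n.
    This sketch assumes n = 2^K - 1, so the tree is complete and
    exactly the odd ranks are leaves (NCCL handles arbitrary n)."""
    assert n > 0 and (n + 1) & n == 0, "sketch assumes n = 2^K - 1"
    par = {}
    for i in range(1, n + 1):
        k = (i & -i).bit_length() - 1        # height of rank i: 2^k divides i
        m = i >> k                           # odd part of i
        if i == (n + 1) // 2:
            par[i] = None                    # root of the tree
        elif m % 4 == 1:
            par[i] = i + (1 << k)            # parent is above, to the right
        else:
            par[i] = i - (1 << k)            # parent is above, to the left
    return par

def shifted(par, n):
    """Second tree: relabel ranks by a cyclic shift of one. Ranks that
    are interior in the first tree become leaves here, so no rank is a
    sender in both trees at once."""
    nxt = lambda i: i % n + 1                # shift a rank forward by one
    prv = lambda j: n if j == 1 else j - 1   # inverse shift
    return {i: (None if par[nxt(i)] is None else prv(par[nxt(i)]))
            for i in range(1, n + 1)}

def tree_allreduce(values, par):
    """Sum-allreduce over one tree: reduce partial sums up to the root,
    then broadcast the total back down to every rank."""
    children = {i: [] for i in par}
    for i, p in par.items():
        if p is not None:
            children[p].append(i)
    root = next(i for i, p in par.items() if p is None)
    acc = dict(values)
    def reduce_up(i):                        # post-order walk toward the root
        for c in children[i]:
            acc[i] += reduce_up(c)
        return acc[i]
    total = reduce_up(root)
    return {i: total for i in par}           # broadcast phase
```

In the real DBTree AllReduce, each tree carries half of the message concurrently, which is what recovers full bandwidth while keeping the logarithmic latency of a tree; the sketch only demonstrates the complementary leaf/interior structure on a single tree at a time.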
【Deliverables】
Task proposal
Task PR to Paddle
Related technical documentation
Unit test files for the task
【Technical Requirements】
Familiarity with the Paddle distributed training framework
Proficiency in C++ and Python
Familiarity with model training, collective communication implementations, and the DBTree communication algorithm