Hi! We've received your issue; please be patient while we wait for a response. We will arrange for engineers to answer your questions as soon as possible. Please make sure you have posted enough information to describe your request. You may also check the API docs, FAQ, GitHub Issues, and the AI community for an answer. Have a nice day!
Since you haven't replied for more than a year, we have closed this issue/PR.
If the problem is not solved or you have a follow-up question, please reopen it at any time and we will continue to follow up.
(This issue is a task issue for Season 2 of the PaddlePaddle Hackathon; for more details, see the PaddlePaddle Hackathon Season 2 task overview.)
【Task Description】
Task title: Implement AllReduce based on a DBTree topology in Paddle
Technical tags: deep learning framework, C++, communication topology
Task difficulty: hard
Detailed description: The key idea of DBTree (double binary tree) is to exploit the fact that roughly half of the nodes in a binary tree are leaves. By turning the leaves of one tree into interior nodes of a second tree, we obtain two binary trees in which every node is a leaf in one tree and an interior node in the other. In theory, this approach offers lower latency than the Ring algorithm. The goal of this task is to implement a DBTree-structured AllReduce in the Paddle distributed training framework; in addition to GPU training, support for other heterogeneous hardware is preferred.
NCCL reference: https://developer.nvidia.com/blog/massively-scale-deep-learning-training-nccl-2-4/
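The complementary-leaf property described above can be sketched with a small topology calculation. This is a minimal illustration, assuming a power-of-two world size and an in-order binary tree layout with the second tree obtained by shifting every rank by one; the function names are illustrative and are not part of Paddle's or NCCL's actual API.

```python
def children(label: int, n: int) -> list[int]:
    """Children of `label` in an in-order binary tree over labels 0..n-1.

    Assumes n is a power of two. Label 0 is the root with a single child
    n/2; a nonzero label whose lowest set bit is 2^k has children at
    label -/+ 2^(k-1); odd labels are leaves.
    """
    if label == 0:
        return [n // 2]
    low = label & -label          # isolate the lowest set bit
    if low == 1:                  # odd label -> leaf node
        return []
    return [label - low // 2, label + low // 2]

def label_in_tree(rank: int, tree: int, n: int) -> int:
    """Tree 0 uses the rank directly; tree 1 shifts every rank by one,
    so the leaves of tree 0 become interior nodes of tree 1."""
    return rank if tree == 0 else (rank + 1) % n

def is_leaf(rank: int, tree: int, n: int) -> bool:
    return len(children(label_in_tree(rank, tree, n), n)) == 0

n = 8
for rank in range(n):
    # every rank is a leaf in exactly one of the two trees
    assert is_leaf(rank, 0, n) != is_leaf(rank, 1, n)
```

In a DBTree AllReduce, each of the two trees carries half of the data: gradients are reduced up one tree and broadcast back down it, so a rank that is a leaf in one tree (and therefore mostly idle there) is an interior node doing reduction work in the other, keeping every link busy.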
【Submission Content】
【Reference Material】
【Technical Requirements】
【Q&A and Discussion】