【Hackathon No.112】 PR.md #4463
【Team name】: xd_no-bad 【No.】: 112 【Status】: PR submitted
Thanks for your contribution!
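## (3) Comparing the ease of use and differences of the two
On the Sugon platform, setting up the PyTorch distributed environment requires compiling torchvision by hand, which makes PyTorch the more cumbersome of the two in this respect. However, the PyTorch environment is fairly stable on Sugon, whereas the Paddle environment is often unstable: sometimes it runs and sometimes it does not.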
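"The Paddle environment is often unstable on the Sugon platform: sometimes it runs and sometimes it does not." To help Paddle improve its usability, please add a more detailed description of the unstable behavior and the problems you ran into.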
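Hello, I have added some error screenshots and collected the errors in the final section.
For the statement "The Paddle environment is often unstable on the Sugon platform: sometimes it runs and sometimes it does not", the added screenshots show the errors that occur when it fails.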
Some error screenshots have been added.
Hello, I have revised the PR and added some error screenshots along with written descriptions.
https://github.com/PaddlePaddle/docs/pull/4463/files?short_path=a9cfb5f#diff-a9cfb5fe61e19af4915ff5149098a3f3558f8d055ed588969af5d6ee116a0a57
缪孔苗
On 4/20/2022 ***@***.***> wrote:
@xymyeah requested changes on this pull request.
In docs/eval/【Hackathon No.112】 PR.md:
+- 13. The whl file therefore needs to be downloaded in advance. Download link:
+```python
+https://www.paddlepaddle.org.cn/whl/rocm/stable.whl
+```
+- 14. paddlepaddle_rocm-2.2.2-cp37-cp37m-linux_x86_64.whl has been tested and installs successfully. Install command:
+```python
+pip install paddlepaddle_rocm-2.2.2-cp37-cp37m-linux_x86_64.whl -i https://pypi.tuna.tsinghua.edu.cn/simple/
+```
+- 15. After completing the steps above, two more libraries, opencv-python and scipy, have to be installed manually:
+```python
+pip install scipy -i https://pypi.tuna.tsinghua.edu.cn/simple/
+pip install opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple/
+```
+
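+## (3) Comparing the ease of use and differences of the two
+On the Sugon platform, setting up the PyTorch distributed environment requires compiling torchvision by hand, which makes PyTorch the more cumbersome of the two in this respect. However, the PyTorch environment is fairly stable on Sugon, whereas the Paddle environment is often unstable: sometimes it runs and sometimes it does not.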
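"The Paddle environment is often unstable on the Sugon platform: sometimes it runs and sometimes it does not." To help Paddle improve its usability, please add a more detailed description of the unstable behavior and the problems you ran into.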
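One way to make reports of the instability easier to reproduce is to confirm first that the packages from steps 13 to 15 of the quoted diff are importable. The following is a minimal sketch, not part of the PR: the helper names `find_missing` and `verify_paddle` are hypothetical, and it assumes the ROCm wheel is imported as `paddle` and opencv-python as `cv2`.

```python
import importlib.util

# Import names for the packages installed in steps 13 to 15 of the quoted diff.
# Assumption: the ROCm wheel is imported as "paddle", opencv-python as "cv2".
REQUIRED_MODULES = ("paddle", "scipy", "cv2")


def find_missing(modules=REQUIRED_MODULES):
    """Return the top-level module names from `modules` that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def verify_paddle():
    """Run Paddle's built-in installation self-check (needs paddle installed)."""
    import paddle

    paddle.utils.run_check()


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        verify_paddle()
```

Running the script inside the image before each training attempt, and keeping its output next to the error screenshots, would show whether a failure comes from the installation itself or from something later in the run.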
![image](https://user-images.githubusercontent.com/102226413/164143166-cde2793b-eb06-43a3-92d1-bfa68c2f1558.png)
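In addition, we run Paddle on Sugon by launching a container image, but the Sugon platform's support for Docker images is poor: an image can be kept running for at most 72 hours, and once it has been shut down the original image cannot be reopened. For ease of use, we hope the Paddle distributed framework can also be run through job submission, which would make multi-node runs easier to manage as well.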
5. Unresolved problems (problems that prevent using Paddle on Sugon)
![image](https://user-images.githubusercontent.com/102226413/164143125-70d0e4ff-46d7-4461-8cb0-72c14e98b8e0.png)
![image](https://user-images.githubusercontent.com/102226413/164143166-cde2793b-eb06-43a3-92d1-bfa68c2f1558.png)
For the second problem, you could try setting
export NCCL_IB_HCA=mlx5_0
export NCCL_SOCKET_IFNAME=eno1
export NCCL_IB_DISABLE=0
or disable the related configuration with export NCCL_IB_DISABLE=1.
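The two alternatives suggested above can also be applied from inside a Python launcher script, which makes it easy to switch between them when retrying a failed run. This is only a sketch: the helper names `nccl_env` and `apply_nccl_env` are hypothetical, and the device and interface values `mlx5_0` and `eno1` are taken from the comment and may differ on a given node.

```python
import os


def nccl_env(use_infiniband, ib_device="mlx5_0", socket_ifname="eno1"):
    """Build the NCCL variables for one of the two suggested configurations."""
    if not use_infiniband:
        # Second suggestion: disable the InfiniBand transport.
        return {"NCCL_IB_DISABLE": "1"}
    # First suggestion: keep InfiniBand enabled and pin the device and interface.
    return {
        "NCCL_IB_HCA": ib_device,
        "NCCL_SOCKET_IFNAME": socket_ifname,
        "NCCL_IB_DISABLE": "0",
    }


def apply_nccl_env(use_infiniband, environ=os.environ):
    """Write the chosen settings into `environ` and return what was set."""
    settings = nccl_env(use_infiniband)
    environ.update(settings)
    return settings
```

For the settings to take effect, `apply_nccl_env` would need to be called before the training script initializes distributed communication, for example at the very top of the entry script.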
OK.