Which version of Transformer Engine should I use to enable ub_tp_comm_overlap? #11683

Closed
sallyjunjun opened this issue Dec 20, 2024 · 2 comments

@sallyjunjun

I am using NeMo v2.0.0rc0. When I set ub_tp_comm_overlap to true with tp and sp set to 2, I get the following error:
Image

My Transformer Engine (TE) version is 1.6.0+c81733f. Should I update to a newer TE version?

When I update Transformer Engine to 1.13.0+e5edd6c, another error occurs in NeMo:
Image

NeMo is supposed to be used with CUDA 11.8, but Transformer Engine 1.13.0+e5edd6c requires a CUDA version newer than 12.0.

I'm stuck on these version issues.
Could you please tell me which versions of NeMo, TE, and CUDA I should use to enable the tp_comm_overlap feature?
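
For anyone hitting the same mismatch, here is a minimal diagnostic sketch for collecting the versions mentioned above and flagging the conflict described in this post. It is not an official compatibility check. The distribution name `transformer_engine`, the helper names, and the cutoff values are assumptions: the only data point from this thread is that TE 1.13.0 refused CUDA 11.8, so the exact TE release where the CUDA 12 requirement starts is unknown.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return (major, minor, patch) from strings like '1.13.0+e5edd6c' or '11.8'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component (e.g. '11.8') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def torch_cuda_version():
    """Return the CUDA version PyTorch was built against, or None."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None on CPU-only builds


def cuda_conflict(te_version, cuda_version,
                  te_cutoff=(1, 13, 0), min_cuda=(12, 0, 0)):
    """True when TE is at or above te_cutoff and CUDA is below min_cuda.

    ASSUMPTION: both cutoffs are placeholders taken from this thread
    (TE 1.13.0 rejected CUDA 11.8); adjust them to the real requirement.
    """
    te = parse_version(te_version)
    cuda = parse_version(cuda_version)
    return te >= te_cutoff and cuda < min_cuda


if __name__ == "__main__":
    # ASSUMPTION: 'transformer_engine' is the installed distribution name.
    te = installed_version("transformer_engine")
    cuda = torch_cuda_version()
    print("transformer_engine:", te)
    print("CUDA (PyTorch build):", cuda)
    if te and cuda:
        print("conflict per this thread:", cuda_conflict(te, cuda))
```

Including this output in a report makes it easier for maintainers to see which NeMo, TE, and CUDA combination is actually installed.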


This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Jan 20, 2025

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Jan 27, 2025