Force use of torch.compile on deterministic roi_align implementation #8436
Conversation
Signed-off-by: Edward Z. Yang <[email protected]>
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8436
Note: Links to docs will display an error until the docs builds have been completed.
❌ 12 New Failures, 1 Unrelated Failure
As of commit ee25749 with merge base 775dd2d:
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc @qqaatw, I removed the MPS knob because of how memory hungry the eager implementation is; I doubt torch.compile works on MPS.
def lazy_compile(**compile_kwargs):
    """Lazily wrap a function with torch.compile on the first call

    This avoids eagerly importing dynamo.
Am I understanding this correctly?
Suggested change:
-    This avoids eagerly importing dynamo.
+    This avoids eagerly compiling a function at import time.
Nope. Even with torch.compile at top level, it isn't compiled until you call it the first time. But importing dynamo has undesirable side effects for eager-mode-only users, so it's best not to do it.
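For illustration, here is a minimal sketch of what such a lazy wrapper could look like, inferred from the docstring above; the PR's actual implementation and call site may differ.

# Minimal sketch of a lazy torch.compile wrapper (an assumption based on the
# docstring above, not necessarily the PR's exact implementation).
import functools

import torch


def lazy_compile(**compile_kwargs):
    """Lazily wrap a function with torch.compile on the first call.

    This avoids eagerly importing dynamo.
    """

    def decorator(fn):
        compiled_fn = None

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal compiled_fn
            if compiled_fn is None:
                # torch.compile (and the dynamo import it triggers) only
                # happens the first time the wrapped function is called.
                compiled_fn = torch.compile(fn, **compile_kwargs)
            return compiled_fn(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical application (illustrative name, not the PR's actual call site):
@lazy_compile(dynamic=True)
def _roi_align_fallback(input, rois, spatial_scale, output_size, sampling_ratio, aligned):
    ...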
@@ -232,7 +250,9 @@ def roi_align(
     if not isinstance(rois, torch.Tensor):
         rois = convert_boxes_to_roi_format(rois)
     if not torch.jit.is_scripting():
-        if not _has_ops() or (torch.are_deterministic_algorithms_enabled() and (input.is_cuda or input.is_mps)):
+        if (
+            not _has_ops() or (torch.are_deterministic_algorithms_enabled() and (input.is_cuda or input.is_mps))
Should we just remove the mps part here since you mentioned MPS doesn't even work with torch.compile?
Suggested change:
-            not _has_ops() or (torch.are_deterministic_algorithms_enabled() and (input.is_cuda or input.is_mps))
+            not _has_ops() or (torch.are_deterministic_algorithms_enabled() and input.is_cuda)
I opted to keep it around because it was explicitly added by @qqaatw, but I don't really mind either way.
Sorry for the late reply! I'm OK with whichever way is best for development. From the mentioned issue it seems only relevant to CUDA; is MPS similarly memory hungry with the deterministic algorithm?
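For reference, a minimal usage sketch of the branch discussed above (assuming a CUDA device is available): enabling deterministic algorithms makes roi_align on a CUDA tensor take the pure-PyTorch fallback, which this PR routes through torch.compile. The shapes below are illustrative only.

# Minimal usage sketch (assumes a CUDA build/device; shapes are illustrative).
# With deterministic algorithms enabled, roi_align on a CUDA tensor takes the
# pure-PyTorch fallback that this PR forces through torch.compile.
import torch
from torchvision.ops import roi_align

torch.use_deterministic_algorithms(True)

if torch.cuda.is_available():
    features = torch.rand(1, 256, 64, 64, device="cuda")
    # Each box row is (batch_index, x1, y1, x2, y2).
    boxes = torch.tensor([[0.0, 10.0, 10.0, 50.0, 50.0]], device="cuda")
    pooled = roi_align(features, boxes, output_size=(7, 7), spatial_scale=1.0)
    print(pooled.shape)  # torch.Size([1, 256, 7, 7])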
…entation (#8436)
Summary: Signed-off-by: Edward Z. Yang <[email protected]>
Reviewed By: vmoens
Differential Revision: D58283855
fbshipit-source-id: 914a91877c193b38f29af450a5935dd1ab5b20d7
Co-authored-by: Nicolas Hug <[email protected]>
Fixes #8168
Signed-off-by: Edward Z. Yang <[email protected]>