Fix work around #906 #96

Merged
merged 20 commits into main from FIX_work_around_#906 on Feb 28, 2023

Conversation

@fcharras (Collaborator) commented Feb 27, 2023

We were hitting more and more occurrences of IntelPython/numba-dpex#906, to the point that it felt counter-productive to keep developing with numba_dpex until it is fixed.

After torturing the reproducers, it seems I've found a way to consistently work around the bug without having to change the logic of the kernels.

All kernels that contain barriers seem to be affected. The workaround basically consists in moving the instructions that contain array.__setitem__ calls (or calls to atomic functions) out of those kernels and into dpex.func device functions (sketched below).

The unit test added in this PR demonstrates how it works on a small example. I will also expand on it in IntelPython/numba-dpex#906.

The minimal reproducer shows that not all such instructions need to be moved to device functions: replacing a select few seems to be enough to make the bugs disappear. But since the trigger condition for this bug remains unclear, I went the safe way and replaced them all.

The downside is of course readability, but I don't think we should try to improve it now (the constraint of incorporating those unnatural dpex.func factorizations makes it very challenging anyway). Let's rather move on and hope the bug is fixed upstream in the coming months so we can revert all the hacks.
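
To make the workaround pattern concrete, here is a minimal sketch (illustrative names only, not the actual kernels of this PR; it assumes the numba_dpex 0.20-era kernel API with dpex.kernel, dpex.func, dpex.get_global_id and dpex.barrier, and the exact name of the memory-fence constant may differ between versions):

```python
import numba_dpex as dpex


def _make_setitem_if_kernel_func():
    # Factory returning a `dpex.func` device function that wraps the
    # `array.__setitem__` call, so that the assignment is not emitted
    # directly in the body of the barrier-containing kernel.
    @dpex.func
    def setitem_if(condition, value, array, idx):
        if condition:
            array[idx] = value

    return setitem_if


setitem_if = _make_setitem_if_kernel_func()


@dpex.kernel
def _example_kernel(data, result):
    item_idx = dpex.get_global_id(0)
    local_idx = dpex.get_local_id(0)

    value = data[item_idx] * 2

    # Kernels that contain barriers are the ones affected by the bug.
    dpex.barrier(dpex.LOCAL_MEM_FENCE)

    # Before the workaround this would read:
    #     if local_idx == 0:
    #         result[item_idx] = value
    # i.e. a direct `__setitem__` in the kernel body, which can trigger
    # IntelPython/numba-dpex#906. Routing it through the device function
    # makes the crash disappear.
    setitem_if(local_idx == 0, value, result, item_idx)
```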

Three more unrelated fixes have infiltrated the PR:

  • A dtype mismatch that made some scikit-learn tests fail when the GPU does not support float64 compute
  • I noticed that our tolerance threshold was severely off for large datasets (we forgot to divide it by n_samples...); not sure how we let this slip through
  • On my machine, the scikit-learn test test_kmeans_verbose[0-lloyd] suddenly decided that from now on it should fail half of the time. The cause is a mechanism in scikit-learn I'm skeptical about, called "strict convergence", which is detected by comparing the labels computed by two successive iterations of lloyd. I had to implement the exact same behavior in our lloyd to consistently pass the test, but I think it should only be done when tol == 0 (which is enough to pass the tests); see the arguments in an inline comment in the PR, and the sketch right after this list.
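
For reference, here is a minimal sketch of the behavior described in the last point, in plain NumPy pseudocode rather than the actual dpex implementation (all names are illustrative, and empty clusters are not handled):

```python
import numpy as np


def lloyd(X, centroids, max_iter=300, tol=0.0):
    n_samples = X.shape[0]
    n_clusters = centroids.shape[0]
    labels = np.full(n_samples, -1, dtype=np.int32)

    for _ in range(max_iter):
        previous_labels = labels

        # Assignment step: label each sample with its nearest centroid.
        sq_distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_distances.argmin(axis=1)

        # Update step: recompute the centroids and the total centroid shift.
        new_centroids = np.stack(
            [X[labels == k].mean(axis=0) for k in range(n_clusters)]
        )
        center_shift_tot = ((new_centroids - centroids) ** 2).sum()
        centroids = new_centroids

        # "Strict convergence": the labels of two successive iterations are
        # identical. As argued above, only check it when `tol == 0`, where the
        # tolerance-based criterion below cannot be relied on to stop the loop.
        if tol == 0 and np.array_equal(labels, previous_labels):
            break

        # Usual tolerance-based stopping criterion.
        if center_shift_tot <= tol:
            break

    return labels, centroids
```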

@fcharras fcharras changed the base branch from main to minor_changes_enh February 27, 2023 14:09
@fcharras fcharras force-pushed the FIX_work_around_#906 branch from 563d59e to a82b10d on February 27, 2023 15:57
@fcharras fcharras marked this pull request as ready for review February 27, 2023 15:57
@fcharras fcharras requested review from jjerphan and ogrisel February 27, 2023 15:59

@ogrisel (Collaborator) left a comment:

LGTM. Just some nits:

Comment on lines 338 to 339
# TODO: open an issue at `scikit-learn` and propose to adopt this behavior
# instead ?

Suggested change
# TODO: open an issue at `scikit-learn` and propose to adopt this behavior
# instead ?
# See: https://github.com/scikit-learn/scikit-learn/issues/25716

# behavior if and only if `tol == 0`, because in this case, it is easy to see
# that lloyd can indeed fail to stop at the right time due to numerical errors.
# Moreover, this is enough to pass scikit-learn unit tests. When `tol > 0`, we
# rely on the user setting an appropriate tolerance threshold.

Collaborator:

I would rather track the exact behavior of scikit-learn w.r.t. convergence checks.

@@ -86,3 +87,82 @@ def test_spirv_fix():
        kmeans.fit(X_array)
    finally:
        _load_numba_dpex_with_patches()


def test_hack_906():

Collaborator:

Let's use an explicit function name:

Suggested change
def test_hack_906():
def test_need_to_workaround_numba_dpex_906():

sample_idx, # PARAM
first_centroid_idx, # PARAM
euclidean_distances_t, # IN
sq_distances # OUT

Collaborator:

Please swap the last two args as sq_distances is the input and euclidean_distances_t is the output.

Collaborator (author):

Good catch, TY.

@jjerphan (Member) left a comment:

LGTM.

Thank you, @fcharras.

Comment on lines +703 to +704
local_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
global_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()

Member:

Why do we need two of those and not just one? Is it for this reason?

Suggested change
local_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
global_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
# HACK: must define twice to work around the bug highlighted in
# test_regression_fix
local_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
global_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()

Collaborator (author):

No need to worry too much about this: the hack will be removed everywhere after the bump to 0.20.0dev3. The PR has already been opened (#93) and hopefully can be merged soon after this one.

Comment on lines +467 to +468
local_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
global_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()

Member:

Why do we need two of those and not just one? Is it for this reason?

Suggested change
local_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
global_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
# HACK: must define twice to work around the bug highlighted in
# test_regression_fix
local_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
global_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()

assert dpt.asnumpy(result)[0] == 10


# HACK 906

Member:

Suggested change
# HACK 906
# HACK 906: see sklearn_numba_dpex.patches.tests.test_patches.test_need_to_workaround_numba_dpex_906 # noqa

    argmin_indices[group_id] = local_argmin[zero_idx]
else:
    argmin_indices[group_id] = local_argmin[one_idx]
# HACK 906

Member:

Suggested change
# HACK 906
# HACK 906: see sklearn_numba_dpex.patches.tests.test_patches.test_need_to_workaround_numba_dpex_906 # noqa

Comment on lines +333 to +335
# Thus, shouldnt strict convergence checking be enabled only if `tol == 0` ?
# (which is, moreover, the only case where strict convergence really is tested
# in scikit learn)

Member:

I think it must in this case for consistency, yes.

Comment on lines +320 to +321
# ???: if two successive assignations have been computed equal, it's called
# "strict convergence" and means that the algorithm has converged and can't get

@jjerphan (Member) commented Feb 28, 2023:

Naive question for a better understanding: What is the meaning of the ??? here? Is it a question to be resolved? 🙂

Collaborator (author):

It opens the discussion about diverging from the scikit-learn behavior, or considering changing it upstream (scikit-learn/scikit-learn#25716).

Comment on lines +326 to +327
# Providing the user chooses a sensible value for `tol`, wouldn't the cost of
# this check be in general greater than what the benefits ?

Member:

I think both checks (i.e. the one using tol and the one checking for equality) should have the same computational cost. What do you think?

@fcharras (Collaborator, author) commented Feb 28, 2023:

Strict convergence checking has an additional cost that scales linearly with the number of points in the data (see the sketch below).
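
To make the difference concrete, an illustrative comparison (plain NumPy, not the actual kernel code; the sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_clusters = 1_000_000, 8

labels = rng.integers(0, n_clusters, size=n_samples)
previous_labels = labels.copy()
center_shift_tot, tol = 1e-7, 1e-4

# Tolerance-based check: a single scalar comparison (computing the shift
# itself only costs O(n_clusters * n_features)).
converged_tol = center_shift_tot <= tol

# Strict convergence check: compares the full label vectors of two successive
# iterations, i.e. an extra O(n_samples) pass, plus the memory needed to keep
# the previous labels around.
converged_strict = np.array_equal(labels, previous_labels)
```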

Comment on lines +130 to +134
rationale = """If this test fails, it means that the bug reported at
https://github.com/IntelPython/numba-dpex/issues/906 has been fixed, and all the
hacks tagged with `# HACK 906` that were used to work around it can now be removed.
This test can also be removed.
"""

Member:

Thanks. 👍

@@ -539,7 +582,7 @@ def prepare_data_for_lloyd(X_t, init, tol, sample_weight, copy_x):

variance = variance_kernel(dpt.reshape(X_t, -1))
# Use numpy type to work around https://github.com/IntelPython/dpnp/issues/1238
- tol = (dpt.asnumpy(variance)[0] / n_features) * tol
+ tol = (dpt.asnumpy(variance)[0] / (n_features * n_samples)) * tol

@jjerphan (Member) commented Feb 28, 2023:

Side-note: Is this change related to the scope of this PR?

I am fine with it being integrated as part of this PR.

Collaborator (author):

Not related (but it's a one-line fix).

@fcharras (Collaborator, author):

TY for the reviews @jjerphan @ogrisel, I've applied your suggestions and answered the question.

I've added an additional commit that fuses strict convergence checking with the main lloyd kernel.

I'll go ahead and merge if it's green after launch.

@fcharras (Collaborator, author):

So I noticed that using 2D work group sizes harms performance (kernels are about 30% slower). The last commit reverts the 2D work group size diff from #88. I will afterward open a revert PR for this revert commit; if things are fixed in numba_dpex regarding 2D group sizes, maybe we can merge it again later (theoretically, it should be better).

@fcharras fcharras changed the base branch from minor_changes_enh to main February 28, 2023 16:40
@fcharras fcharras merged commit 7b647cb into main Feb 28, 2023
@fcharras fcharras deleted the FIX_work_around_#906 branch February 28, 2023 16:41