Fix work around #906 (#96)
Conversation
LGTM. Just some nits:
sklearn_numba_dpex/kmeans/drivers.py (outdated)
# TODO: open an issue at `scikit-learn` and propose to adopt this behavior
# instead ?
Suggested change:
  # TODO: open an issue at `scikit-learn` and propose to adopt this behavior
  # instead ?
+ # See: https://github.com/scikit-learn/scikit-learn/issues/25716
sklearn_numba_dpex/kmeans/drivers.py (outdated)
# behavior if and only if `tol == 0`, because in this case, it is easy to see
# that lloyd can indeed fail to stop at the right time due to numerical errors.
# Moreover, this is enough to pass scikit-learn unit tests. When `tol > 0`, we
# rely on the user setting an appropriate tolerance threshold.
I would rather track the exact behavior of scikit-learn w.r.t. convergence checks.
@@ -86,3 +87,82 @@ def test_spirv_fix():
        kmeans.fit(X_array)
    finally:
        _load_numba_dpex_with_patches()


def test_hack_906():
Let's use an explicit function name:
Suggested change:
- def test_hack_906():
+ def test_need_to_workaround_numba_dpex_906():
sample_idx,  # PARAM
first_centroid_idx,  # PARAM
euclidean_distances_t,  # IN
sq_distances  # OUT
Please swap the last two args, as `sq_distances` is the input and `euclidean_distances_t` is the output.
good catch, TY
LGTM.
Thank you, @fcharras.
local_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
global_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
Why do we need two of those and not just one? Is it for this reason?
Suggested change:
+ # HACK: must define twice to work around the bug highlighted in
+ # test_regression_fix
  local_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
  global_sum_and_set_items_if = _make_sum_and_set_items_if_kernel_func()
No need to worry too much about this: the hack will be removed everywhere after the bump to 0.20.0dev3. The PR has already been opened (#93) and can hopefully be merged soon after this one.
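For readers outside this thread, here is a minimal, hypothetical sketch of the pattern being discussed: a factory that returns a freshly decorated `dpex.func` closure on every call, so that each kernel gets its own compiled device function instead of sharing one. The factory name and body below are invented for illustration; only the "call the factory twice" workaround mirrors the snippet under review.

```python
import numba_dpex as dpex


def _make_add_if_device_func():
    # Each call returns a *new* dpex.func object, so kernels relying on it do
    # not share a single compiled device function.
    @dpex.func
    def _add_if(condition, value, target, target_idx):
        if condition:
            target[target_idx] += value

    return _add_if


# HACK 906: instantiate twice so that the "local" and "global" reduction kernels
# each get their own copy (works around IntelPython/numba-dpex#906).
local_add_if = _make_add_if_device_func()
global_add_if = _make_add_if_device_func()
```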
assert dpt.asnumpy(result)[0] == 10


# HACK 906
Suggested change:
- # HACK 906
+ # HACK 906: see sklearn_numba_dpex.patches.tests.test_patches.test_need_to_workaround_numba_dpex_906  # noqa
sklearn_numba_dpex/common/kernels.py (outdated)
    argmin_indices[group_id] = local_argmin[zero_idx]
else:
    argmin_indices[group_id] = local_argmin[one_idx]
# HACK 906
Suggested change:
- # HACK 906
+ # HACK 906: see sklearn_numba_dpex.patches.tests.test_patches.test_need_to_workaround_numba_dpex_906  # noqa
# Thus, shouldnt strict convergence checking be enabled only if `tol == 0` ?
# (which is, moreover, the only case where strict convergence really is tested
# in scikit learn)
I think it must be enabled in this case for consistency, yes.
# ???: if two successive assignations have been computed equal, it's called
# "strict convergence" and means that the algorithm has converged and can't get
Naive question for a better understanding: what is the meaning of the `???` here? Is it a question to be resolved? 🙂
It opens the discussion about diverging from scikit-learn's behavior, or considering changing it upstream (scikit-learn/scikit-learn#25716).
# Providing the user chooses a sensible value for `tol`, wouldn't the cost of
# this check be in general greater than what the benefits ?
I think both checks (i.e. the one using `tol` and the one checking for equality) should have the same computational cost. What do you think?
Strict convergence checking has an additional cost that scales linearly with the number of points in the data.
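To make the cost comparison concrete, here is a small, purely illustrative NumPy sketch of the two checks (the actual driver operates on device arrays and differs in detail; all names below are invented):

```python
import numpy as np


def has_converged(centroids, new_centroids, labels, new_labels, tol):
    # tol-based check: compares the total squared shift of the centers against
    # the (scaled) tolerance; cost is O(n_clusters * n_features), independent
    # of the number of samples.
    center_shift_total = np.sum((new_centroids - centroids) ** 2)
    converged_by_tol = center_shift_total <= tol

    # "strict convergence" check: compares the full label assignments of two
    # successive iterations; cost is O(n_samples).
    converged_strictly = np.array_equal(new_labels, labels)

    return converged_by_tol or converged_strictly
```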
rationale = """If this test fails, it means that the bug reported at
https://github.com/IntelPython/numba-dpex/issues/906 has been fixed, and all the
hacks tags with `# HACK 906` that were used to work around it can now be removed.
This test can also be removed.
"""
Thanks. 👍
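As an aside for readers skimming the thread, the "canary test" pattern this rationale describes can be sketched with a small, purely hypothetical pytest-style example (the real test runs a numba_dpex kernel; the stand-in function below is invented for illustration):

```python
RATIONALE = (
    "If this test fails, the upstream bug has been fixed and the `# HACK 906` "
    "workarounds (and this test) can now be removed."
)


def _behavior_currently_affected_by_the_bug():
    # Stand-in for running a kernel that returns a wrong result while the
    # upstream bug exists.
    return "buggy result"


def test_need_to_workaround_numba_dpex_906():
    # The test asserts the *buggy* outcome on purpose: it passes while the bug
    # is present and starts failing (printing the rationale) once upstream
    # fixes it, signalling that the workarounds can be reverted.
    assert _behavior_currently_affected_by_the_bug() == "buggy result", RATIONALE
```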
@@ -539,7 +582,7 @@ def prepare_data_for_lloyd(X_t, init, tol, sample_weight, copy_x):

      variance = variance_kernel(dpt.reshape(X_t, -1))
      # Use numpy type to work around https://github.com/IntelPython/dpnp/issues/1238
-     tol = (dpt.asnumpy(variance)[0] / n_features) * tol
+     tol = (dpt.asnumpy(variance)[0] / (n_features * n_samples)) * tol
Side-note: Is this change related to the scope of this PR?
I am fine with it being integrated as part of this PR.
Not related (but a one-line fix).
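For context on what the one-line change aligns with: scikit-learn's KMeans turns the user-supplied relative `tol` into an absolute threshold by scaling it with the mean per-feature variance of the data. A minimal NumPy sketch of that convention (not the dpex driver code):

```python
import numpy as np


def sklearn_style_tolerance(X, tol):
    # scikit-learn scales the relative tol by the mean per-feature variance of X
    # to obtain an absolute threshold on the total squared center shift.
    return np.mean(np.var(X, axis=0)) * tol
```

Dividing the pooled variance statistic by `n_features * n_samples` (rather than by `n_features` alone) matches this convention, assuming `variance_kernel` returns a sum of squared deviations over the flattened, centered data.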
So I noticed that using a 2D work group size harms performance (kernels are about 30% slower). The last commit reverts the 2D work group size diff from #88. I will open a revert PR for this revert commit afterward, maybe if things are fixed in …
We were having more and more occurrences of IntelPython/numba-dpex#906, to the point that it felt counter-productive to continue developing with `numba_dpex` until it's fixed. After torturing the reproducers, it seems I've found a way to consistently work around the bug without having to change the logic of the kernels.

All kernels that contain barriers seem to be affected. The workaround basically consists in moving the instructions that contain `array.__setitem__` calls (or atomic functions) in those kernels to `dpex.func` device functions. The unit test added in this PR demonstrates how it works on a small example. I will also expand on it in IntelPython/numba-dpex#906.

The minimal reproducer shows that not all such instructions need to be moved to device functions; replacing a select few seems to be enough to make the bugs disappear. But since the trigger condition for this bug remains unclear, I went the safe way and replaced them all.
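To make the workaround concrete, here is a minimal, hypothetical sketch of the pattern. The toy kernel, function names, buffer size and dtype are invented for illustration, and it assumes the numba_dpex 0.20-era kernel API (`dpex.kernel`, `dpex.func`, `dpex.get_local_id`, `dpex.local.array`, `dpex.barrier`): the write that would normally be an inline `array.__setitem__` after a barrier is moved into a `dpex.func` device function.

```python
from numba import float32
import numba_dpex as dpex


# HACK 906: writing `result[...] = ...` inline after a barrier can be miscompiled
# (IntelPython/numba-dpex#906); moving the write into a dpex.func device function
# works around it.
@dpex.func
def _setitem(value, array, idx):
    # device function that carries the array.__setitem__ call
    array[idx] = value


@dpex.kernel
def toy_kernel(data, result):
    local_idx = dpex.get_local_id(0)
    global_idx = dpex.get_global_id(0)

    local_buffer = dpex.local.array(64, dtype=float32)
    local_buffer[local_idx] = data[global_idx]

    dpex.barrier(dpex.LOCAL_MEM_FENCE)

    # instead of the inline `result[global_idx] = 2 * local_buffer[local_idx]`:
    _setitem(2 * local_buffer[local_idx], result, global_idx)
```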
The downside is of course readability, but I don't think we should try to improve it now (the constraint of incorporating those unnatural `dpex.func` factorizations makes it very challenging anyway); let's rather move on and hope the bug is fixed upstream in the coming months so we can revert all the hacks.

Three more unrelated fixes have infiltrated the PR:

- a fix of the `tol` normalization (it was missing `n_samples` …), not sure how we let this slip through
- the scikit-learn test `test_kmeans_verbose[0-lloyd]` suddenly decided that from now on it should fail half of the time. The reason is a mechanism in scikit-learn that I'm skeptical about, called "strict convergence", which is met by comparing the labels computed by two successive iterations of Lloyd. I had to implement the exact same behavior in our `lloyd` to consistently pass the test, but I think it should only be done when `tol == 0` (which is enough to pass the tests); see the arguments in an inline comment in the PR.