add MAPE to regression metrics (fixes #691) #822

jameslamb · 2021-04-06T19:59:37Z

This PR proposes adding mean_absolute_percentage_error() ("MAPE"), as originally suggested in #691.

It follows the implementation from scikit-learn (https://github.com/scikit-learn/scikit-learn/blob/9cfacf1540a991461b91617c779c69753a1ee4c0/sklearn/metrics/_regression.py#L280), including the use of np.finfo(np.float64).eps in the denominator to prevent divide-by-0 errors.

Notes for reviewers

This PR adds a bit of test coverage by adding mean_absolute_percentage_error() to the metric_pairs fixture in tests. It would automatically get more specific coverage (like for combinations of multioutput and compute) if #820 is accepted.

Thanks for your time and consideration.

hristog · 2021-04-08T11:15:20Z

tests/metrics/test_regression.py

@@ -7,12 +7,13 @@
 import dask_ml.metrics


-@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "r2_score"])
+@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "mean_absolute_percentage_error", "r2_score"])


Suggested change

@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "mean_absolute_percentage_error", "r2_score"])

@pytest.fixture(

params=[

"mean_squared_error",

"mean_absolute_error",

"mean_absolute_percentage_error",

"r2_score",

]

)

Looks like black==19.10b0 isn't happy about the line length here.

Also, perhaps, it'd be nice if the correctness of the method was sanity-checked against its sklearn counterpart, just as it's done for some of the other metrics a bit further down in the same test file.

Looks like black==19.10b0 isn't happy about the line length here.

Ah ok. If dask-ml has chosen to pin to older versions of linters, then I think the non-conda option documented at https://ml.dask.org/contributing.html#style will be unreliable, since

dask-ml/setup.py

Line 29 in f5e5bb4

"black",

doesn't have a pin for things like black.

Once i switched to the conda instructions there, I got the expected diff. Updated in 1142fcc.

Also, perhaps, it'd be nice if the correctness of the method was sanity-checked against its sklearn counterpart, just as it's done for some of the other metrics a bit further down in the same test file.

Can you clarify what you want me to change? As far as I can tell, that is exactly what happens by adding mean_squared_percentage_error to the metric_pairs fixture. Every metric in that fixture is tested against its scikit-learn equivalent by

dask-ml/tests/metrics/test_regression.py

Line 37 in f5e5bb4

assert abs(result - expected) < 1e-5

.

Ah ok. If dask-ml has chosen to pin to older versions of linters, then I think the non-conda option documented at https://ml.dask.org/contributing.html#style will be unreliable

You're absolutely right! I've got a PR over at #813 waiting to be reviewed (for a couple of weeks now), and subsequently merged. It should improve the static-checking situation.

Every metric in that fixture is tested against its scikit-learn equivalent by

Indeed - ignore me about this one, please! I got confused that we should probably further introduce extra tests like the test_mean_squared_log_error one.

I'll bring up the question of whether the setup.py versions of the linters should be pinned, too, in #813.

jameslamb · 2021-04-09T16:50:49Z

I'm grateful for the linux earliest CI job, because it caught an issue with this PR. The original changes I proposed would have made dask-ml incompatible with scikit-learn < 0.24.

I just pushed b05213b to attempt to address it.

Basically, the use of @derived_from(sklearn.metrics) for the version of dask used in the "earliest" environment is incompatible with scikit-learn v0.23.x, and results in this error:

ImportError while loading conftest '/home/jlamb/repos/open-source/dask-ml/tests/conftest.py'.
tests/conftest.py:9: in <module>
    from dask_ml.datasets import (
dask_ml/__init__.py:4: in <module>
    from dask_ml.model_selection import _normalize
dask_ml/model_selection/__init__.py:6: in <module>
    from ._hyperband import HyperbandSearchCV
dask_ml/model_selection/_hyperband.py:12: in <module>
    from ._incremental import BaseIncrementalSearchCV
dask_ml/model_selection/_incremental.py:32: in <module>
    from ..wrappers import ParallelPostFit
dask_ml/wrappers.py:16: in <module>
    from .metrics import check_scoring, get_scorer
dask_ml/metrics/__init__.py:7: in <module>
    from .regression import (  # noqa
dask_ml/metrics/regression.py:91: in <module>
    ) -> ArrayLike:
../../../miniconda3/envs/dask-ml-earliest/lib/python3.6/site-packages/dask/utils.py:656: in wrapper
    module_name = original_klass.__module__.split(".")[0]
E   AttributeError: module 'sklearn.metrics' has no attribute '__module__'

@derived_from() is only used to inherit documentation, not any functionality, so I copied over the relevant scikit-learn docstring (https://github.com/scikit-learn/scikit-learn/blob/da3c2d2a19ade5ca69adb6952ecace811ed122ff/sklearn/metrics/_regression.py#L283). I'm assuming this is ok since scikit-learn uses BSD 3-clause and since this project bundles the scikit-learn license: https://github.com/dask/dask-ml/blob/main/licenses/scikit-learn/COPYING.

TomAugspurger · 2021-04-10T16:27:56Z

Looks good, thanks!

add MAPE to regression metrics (fixes #691)

49b9eb5

hristog suggested changes Apr 8, 2021

View reviewed changes

linting

1142fcc

hristog mentioned this pull request Apr 9, 2021

Pin static-checking versions and add pre-commit Action #813

Open

hristog approved these changes Apr 9, 2021

View reviewed changes

fix compatibility with older scikit-learn

b05213b

linting

99415c7

TomAugspurger merged commit 27d8d37 into dask:main Apr 10, 2021

jameslamb deleted the feat/mape branch April 10, 2021 16:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add MAPE to regression metrics (fixes #691) #822

add MAPE to regression metrics (fixes #691) #822

jameslamb commented Apr 6, 2021

hristog Apr 8, 2021 •

edited

Loading

hristog Apr 8, 2021 •

edited

Loading

jameslamb Apr 9, 2021

hristog Apr 9, 2021 •

edited

Loading

hristog Apr 9, 2021

jameslamb commented Apr 9, 2021

TomAugspurger commented Apr 10, 2021

add MAPE to regression metrics (fixes #691) #822

add MAPE to regression metrics (fixes #691) #822

Conversation

jameslamb commented Apr 6, 2021

Notes for reviewers

hristog Apr 8, 2021 • edited Loading

Choose a reason for hiding this comment

hristog Apr 8, 2021 • edited Loading

Choose a reason for hiding this comment

jameslamb Apr 9, 2021

Choose a reason for hiding this comment

hristog Apr 9, 2021 • edited Loading

Choose a reason for hiding this comment

hristog Apr 9, 2021

Choose a reason for hiding this comment

jameslamb commented Apr 9, 2021

TomAugspurger commented Apr 10, 2021

hristog Apr 8, 2021 •

edited

Loading

hristog Apr 8, 2021 •

edited

Loading

hristog Apr 9, 2021 •

edited

Loading