Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add MAPE to regression metrics (fixes #691) #822

Merged
merged 4 commits into from
Apr 10, 2021
Merged

add MAPE to regression metrics (fixes #691) #822

merged 4 commits into from
Apr 10, 2021

Conversation

jameslamb
Copy link
Member

This PR proposes adding mean_absolute_percentage_error() ("MAPE"), as originally suggested in #691.

It follows the implementation from scikit-learn (https://github.com/scikit-learn/scikit-learn/blob/9cfacf1540a991461b91617c779c69753a1ee4c0/sklearn/metrics/_regression.py#L280), including the use of np.finfo(np.float64).eps in the denominator to prevent divide-by-0 errors.

Notes for reviewers

This PR adds a bit of test coverage by adding mean_absolute_percentage_error() to the metric_pairs fixture in tests. It would automatically get more specific coverage (like for combinations of multioutput and compute) if #820 is accepted.

Thanks for your time and consideration.

@@ -7,12 +7,13 @@
import dask_ml.metrics


@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "r2_score"])
@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "mean_absolute_percentage_error", "r2_score"])
Copy link
Contributor

@hristog hristog Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "mean_absolute_percentage_error", "r2_score"])
@pytest.fixture(
params=[
"mean_squared_error",
"mean_absolute_error",
"mean_absolute_percentage_error",
"r2_score",
]
)

Looks like black==19.10b0 isn't happy about the line length here.

Copy link
Contributor

@hristog hristog Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, perhaps, it'd be nice if the correctness of the method was sanity-checked against its sklearn counterpart, just as it's done for some of the other metrics a bit further down in the same test file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like black==19.10b0 isn't happy about the line length here.

Ah ok. If dask-ml has chosen to pin to older versions of linters, then I think the non-conda option documented at https://ml.dask.org/contributing.html#style will be unreliable, since

"black",
doesn't have a pin for things like black.

Once i switched to the conda instructions there, I got the expected diff. Updated in 1142fcc.

Also, perhaps, it'd be nice if the correctness of the method was sanity-checked against its sklearn counterpart, just as it's done for some of the other metrics a bit further down in the same test file.

Can you clarify what you want me to change? As far as I can tell, that is exactly what happens by adding mean_squared_percentage_error to the metric_pairs fixture. Every metric in that fixture is tested against its scikit-learn equivalent by

assert abs(result - expected) < 1e-5
.

Copy link
Contributor

@hristog hristog Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. If dask-ml has chosen to pin to older versions of linters, then I think the non-conda option documented at https://ml.dask.org/contributing.html#style will be unreliable

You're absolutely right! I've got a PR over at #813 waiting to be reviewed (for a couple of weeks now), and subsequently merged. It should improve the static-checking situation.

Every metric in that fixture is tested against its scikit-learn equivalent by

Indeed - ignore me about this one, please! I got confused that we should probably further introduce extra tests like the test_mean_squared_log_error one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll bring up the question of whether the setup.py versions of the linters should be pinned, too, in #813.

@jameslamb
Copy link
Member Author

I'm grateful for the linux earliest CI job, because it caught an issue with this PR. The original changes I proposed would have made dask-ml incompatible with scikit-learn < 0.24.

I just pushed b05213b to attempt to address it.

Basically, the use of @derived_from(sklearn.metrics) for the version of dask used in the "earliest" environment is incompatible with scikit-learn v0.23.x, and results in this error:

ImportError while loading conftest '/home/jlamb/repos/open-source/dask-ml/tests/conftest.py'.
tests/conftest.py:9: in <module>
    from dask_ml.datasets import (
dask_ml/__init__.py:4: in <module>
    from dask_ml.model_selection import _normalize
dask_ml/model_selection/__init__.py:6: in <module>
    from ._hyperband import HyperbandSearchCV
dask_ml/model_selection/_hyperband.py:12: in <module>
    from ._incremental import BaseIncrementalSearchCV
dask_ml/model_selection/_incremental.py:32: in <module>
    from ..wrappers import ParallelPostFit
dask_ml/wrappers.py:16: in <module>
    from .metrics import check_scoring, get_scorer
dask_ml/metrics/__init__.py:7: in <module>
    from .regression import (  # noqa
dask_ml/metrics/regression.py:91: in <module>
    ) -> ArrayLike:
../../../miniconda3/envs/dask-ml-earliest/lib/python3.6/site-packages/dask/utils.py:656: in wrapper
    module_name = original_klass.__module__.split(".")[0]
E   AttributeError: module 'sklearn.metrics' has no attribute '__module__'

@derived_from() is only used to inherit documentation, not any functionality, so I copied over the relevant scikit-learn docstring (https://github.com/scikit-learn/scikit-learn/blob/da3c2d2a19ade5ca69adb6952ecace811ed122ff/sklearn/metrics/_regression.py#L283). I'm assuming this is ok since scikit-learn uses BSD 3-clause and since this project bundles the scikit-learn license: https://github.com/dask/dask-ml/blob/main/licenses/scikit-learn/COPYING.

@TomAugspurger
Copy link
Member

Looks good, thanks!

@TomAugspurger TomAugspurger merged commit 27d8d37 into dask:main Apr 10, 2021
@jameslamb jameslamb deleted the feat/mape branch April 10, 2021 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants