-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] EHN add a model to fit and predict heart-rate from power #6
base: master
Are you sure you want to change the base?
Changes from 3 commits
5f4f2d4
9d5708f
a60fc36
87eee9c
aa6cf04
7a90197
bad31be
7856abe
f99d2ea
4ae2268
0a3a7fa
446707a
96c4e07
af0c989
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,228 @@ | ||
"""Module to estimate heart rate from data.""" | ||
|
||
# Authors: Aart Goossens | ||
# Guillaume Lemaitre | ||
# License: MIT | ||
|
||
from __future__ import division | ||
|
||
import numpy as np | ||
import pandas as pd | ||
from scipy.optimize import leastsq, minimize | ||
|
||
from sklearn.base import BaseEstimator, RegressorMixin | ||
from sklearn.utils import check_X_y, check_array | ||
from sklearn.utils.validation import check_is_fitted | ||
|
||
|
||
def exp_heart_rate_model(power, hr_start, hr_max, hr_slope, hr_drift, | ||
rate_growth, rate_decay): | ||
"""Model heart-rate from power date based on exponential | ||
physiological-based model. | ||
|
||
The model used is based on [1]_. | ||
|
||
Parameters | ||
---------- | ||
power : ndarray, shape (n_samples,) | ||
The power data in watts. | ||
|
||
hr_start : float | ||
Initial heart-rate frequency. | ||
|
||
hr_max : float | ||
Heart-rate frequency of the athlete at maximum. | ||
|
||
hr_slope : float | ||
Slope considered if the model is linear. FIXME | ||
|
||
hr_drift : float | ||
Attenuation of the heart-rate other time. FIXME | ||
|
||
rate_growth : float | ||
Growth rate of the exponential when the power increases. | ||
|
||
rate_decay : float | ||
Decay rate of the exponential when the power decreases. | ||
|
||
Returns | ||
------- | ||
hr_pred : ndarray, shape (n_samples,) | ||
Prediction of the heart-rate frequencies associated to the power data. | ||
|
||
References | ||
---------- | ||
.. [1] de Smet, Dimitri, et al. "Heart rate modelling as a potential | ||
physical fitness assessment for runners and cyclists." Workshop at ECML | ||
& PKDD. 2016. | ||
|
||
Examples | ||
-------- | ||
>>> import numpy as np | ||
>>> power = np.array([100] * 25 + [240] * 25 + [160] * 25) | ||
>>> from sksports.model import exp_heart_rate_model | ||
>>> hr_pred = exp_heart_rate_model(power, 75, 200, 0.30, 3e-5, 24, 30) | ||
>>> print(hr_pred) # doctest: +ELLIPSIS | ||
[...] | ||
|
||
""" | ||
power = check_array(power, ensure_2d=False) | ||
power_corrected = power + power.cumsum() * hr_drift | ||
hr_lin = power_corrected * hr_slope + hr_start | ||
hr_ss = np.minimum(hr_max, hr_lin) | ||
|
||
diff_hr = np.pad(np.diff(power_corrected), pad_width=1, mode='constant') | ||
rate = np.where((diff_hr <= 0), rate_decay, rate_growth) | ||
|
||
hr_pred = np.ones(power.size) * hr_start | ||
for curr_idx in range(hr_pred.size): | ||
hr_prev = hr_pred[curr_idx - 1] if curr_idx > 0 else hr_start | ||
hr_pred[curr_idx] = (hr_prev + | ||
(hr_ss[curr_idx] - hr_prev) / rate[curr_idx]) | ||
|
||
return hr_pred | ||
|
||
|
||
def _model_residual(params, heart_rate, power): | ||
"""Residuals between prediction and ground-truth.""" | ||
return heart_rate - exp_heart_rate_model(power, *params) | ||
|
||
|
||
def _model_residual_least_square(params, heart_rate, power): | ||
"""Least-square error of the residuals.""" | ||
return np.sum(_model_residual(params, heart_rate, power) ** 2) | ||
|
||
|
||
class HeartRateRegressor(BaseEstimator, RegressorMixin): | ||
"""Estimate the heart-rate from power data. | ||
|
||
The model used is based on the formulation in [1]_. | ||
|
||
Parameters | ||
---------- | ||
hr_start : float, default=75 | ||
Initial heart-rate frequency. | ||
|
||
hr_max : float, default=200 | ||
Heart-rate frequency of the athlete at maximum. | ||
|
||
hr_slope : float, default=0.30 | ||
Slope considered if the model is linear. FIXME | ||
|
||
hr_drift : float, default=3e-5 | ||
Attenuation of the heart-rate other time. FIXME | ||
|
||
rate_growth : float, default=24 | ||
Growth rate of the exponential when the power increases. | ||
|
||
rate_decay : float, default=30 | ||
Decay rate of the exponential when the power decreases. | ||
|
||
Attributes | ||
---------- | ||
hr_start_ : float | ||
Fitted initial heart-rate frequency. | ||
|
||
hr_max_ : float | ||
Fitted heart-rate frequency of the athlete at maximum. | ||
|
||
hr_slope_ : float | ||
Fitted slope considered if the model is linear. FIXME | ||
|
||
hr_drift_ : float | ||
Fitted attenuation of the heart-rate other time. FIXME | ||
|
||
rate_growth_ : float | ||
Fitted growth rate of the exponential when the power increases. | ||
|
||
rate_decay_ : float | ||
Fitted decay rate of the exponential when the power decreases. | ||
|
||
References | ||
---------- | ||
.. [1] de Smet, Dimitri, et al. "Heart rate modelling as a potential | ||
physical fitness assessment for runners and cyclists." Workshop at ECML | ||
& PKDD. 2016. | ||
|
||
""" | ||
|
||
def __init__(self, hr_start=75, hr_max=200, hr_slope=0.30, hr_drift=3e-5, | ||
rate_growth=24, rate_decay=30): | ||
self.hr_start = hr_start | ||
self.hr_max = hr_max | ||
self.hr_slope = hr_slope | ||
self.hr_drift = hr_drift | ||
self.rate_growth = rate_growth | ||
self.rate_decay = rate_decay | ||
|
||
def _check_inputs(self, X, y=None): | ||
"""Validate the inputs. | ||
|
||
Check if the inputs are Pandas inputs. Resample with a sampling-rate of | ||
1 Hertz. | ||
|
||
""" | ||
if y is None: | ||
is_df = True if hasattr(X, 'loc') else False | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Although this is valid in most cases, why not just go for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was to ducktype. It usually allows to not import the library (which we do anyway) and accept data structure which are inheriting from it. |
||
X_ = X.resample('1s').mean() if is_df else X | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this raises a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, dataframe is not enough, we should check the index. |
||
X_ = check_array(X_) | ||
return X_.ravel(), is_df | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You still need to return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since it is confusing, I should always return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah now I get it. That could work indeed but I think it's less confusing if a method always returns the same number of arguments. |
||
else: | ||
is_df = True if hasattr(X, 'loc') and hasattr(y, 'loc') else False | ||
X_ = X.resample('1s').mean() if is_df else X | ||
y_ = y.resample('1s').mean() if is_df else y | ||
X_, y_ = check_X_y(X_, y_) | ||
return X_.ravel(), y_, is_df | ||
|
||
def fit(self, X, y): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Side note: It still itches when I see Python variables starting with a capital... 😅 (but I recognize that it's accepted for these cases) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, this is kinda of imposed by scikit-learn estimators. |
||
"""Compute the parameters model. | ||
|
||
Parameters | ||
---------- | ||
X : array-like, shape (n_samples,) | ||
The power data. | ||
|
||
y : array-like, shape (n_samples,) | ||
The heart-rate data. | ||
|
||
Returns | ||
------- | ||
self | ||
|
||
""" | ||
power, heart_rate, _ = self._check_inputs(X, y) | ||
params_init = [self.hr_start, self.hr_max, self.hr_slope, | ||
self.hr_drift, self.rate_growth, self.rate_decay] | ||
|
||
res_opt = minimize(_model_residual_least_square, params_init, | ||
args=(heart_rate, power), | ||
method='Nelder-Mead')['x'] | ||
|
||
self.hr_start_, self.hr_max_, self.hr_slope_, self. hr_drift_, \ | ||
self.rate_growth_, self.rate_decay_ = res_opt | ||
|
||
return self | ||
|
||
def predict(self, X): | ||
"""Predict the heart-rate frequency from the power data. | ||
|
||
Parameters | ||
---------- | ||
X : array-like, shape (n_samples,) | ||
The power data. | ||
|
||
Returns | ||
------- | ||
y : ndarray, shape (n_samples,) | ||
The heart-rate predictions. | ||
|
||
""" | ||
check_is_fitted(self, ['hr_start_', 'hr_max_', 'hr_slope_', | ||
'hr_drift_', 'rate_growth_', 'rate_decay_']) | ||
power, is_df = self._check_inputs(X) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. only 2 at predict |
||
|
||
hr_pred = exp_heart_rate_model(power, self.hr_start_, self.hr_max_, | ||
self.hr_slope_, self. hr_drift_, | ||
self.rate_growth_, self.rate_decay_) | ||
|
||
return pd.Series(hr_pred, index=X.index) if is_df else hr_pred |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
"""Tests the module used to estimate the heart-rate.""" | ||
|
||
# Authors: Aart Goossens | ||
# Guillaume Lemaitre | ||
# License: MIT | ||
|
||
import pytest | ||
import numpy as np | ||
import pandas as pd | ||
|
||
from sksports.model import exp_heart_rate_model | ||
from sksports.model import HeartRateRegressor | ||
|
||
POWER = np.array([100] * 10 + [240] * 100 + [160] * 100) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can consider moving these constants to a pytest fixture. It's a matter of preference but I think these fixtures are really nice to work with and also well reusable in other tests. But I'm fine with doing improvements like this later. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. I was playing with the tests for the moment :) |
||
HEART_RATE = exp_heart_rate_model(POWER, 100, 180, 0.40, 3e-5, 20, 35) | ||
HEART_RATE = HEART_RATE + np.random.random(HEART_RATE.size) | ||
|
||
|
||
def test_exp_heart_rate_model(): | ||
interval = [10, 1010, 2010] | ||
hr_pred = exp_heart_rate_model(POWER, 75, 200, 0.30, 3e-5, 24, 30) | ||
|
||
diff_hr_pred = np.diff(hr_pred) | ||
assert np.all(diff_hr_pred[interval[1]:interval[2]] > 0) | ||
assert np.all(diff_hr_pred[interval[2]:] < 0) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"power, heart_rate, output_type", | ||
[(POWER.reshape(-1, 1), HEART_RATE, np.ndarray), | ||
(pd.DataFrame( | ||
POWER, | ||
index=pd.date_range('1/1/2011', periods=POWER.size, freq='s')), | ||
pd.Series( | ||
HEART_RATE, | ||
index=pd.date_range('1/1/2011', periods=POWER.size, freq='s')), | ||
pd.Series), | ||
(POWER.reshape(-1, 1), | ||
pd.Series( | ||
HEART_RATE, | ||
index=pd.date_range('1/1/2011', periods=POWER.size, freq='s')), | ||
np.ndarray)] | ||
) | ||
def test_heart_rate_regressor(power, heart_rate, output_type): | ||
reg = HeartRateRegressor() | ||
reg.fit(power, heart_rate) | ||
|
||
assert reg.hr_start_ == pytest.approx(100, 1.0) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. More of a question than a comment: what's best practice for models like this for deciding how and which model parameters to test? Do you have any examples from scikit-learn? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Usually we run a classifier or a regressor on a known dataset and check the score. It is non-trivial to actually test parameters. Here I think that we can make it on those 2 parameters with a large enough approximation to be sure that the optimization (or any change in the future) does not lead to a complete non-sense. Basically, I think that we could make something like: assert reg.score(X) > 0.9 to be sure that we are fitting the data properly. |
||
assert reg.hr_max_ == pytest.approx(180, 1.0) | ||
assert reg.hr_slope_ == pytest.approx(0.40, 0.01) | ||
|
||
hr_pred = reg.predict(power) | ||
assert isinstance(hr_pred, output_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the benefit of
BaseEstimator
andRegressorMixin
? I don't see it used anywhere in thefit
andpredict
methods. Is there other functionality of these classes that can be useful?Either way, I like implementing this model like a regressor class, it makes the api much more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically, it allows to use the tool as pipeline, grid-search, and cross-validation from scikit-learn.
The mixin just define a
score
function which is the R^2 score while theBaseEstimator
providesget_params
andset_params
to deal with the parameters.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also the model public to the user. We just need a good name.