Creating PMMLs from lifelines #833
I spent some time on this today. Code:
import pandas as pd
from sklearn2pmml.pipeline import PMMLPipeline
from lifelines.utils.sklearn_adapter import sklearn_adapter
from lifelines import CoxPHFitter

# Toy data: T = duration, E = event observed (1) or censored (0)
df = pd.DataFrame({
    'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
    'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
})

# sklearn_adapter builds the wrapper class dynamically
CoxSklearnRegression = sklearn_adapter(CoxPHFitter, event_col='E')
pipeline = PMMLPipeline([
    ("regressor", CoxSklearnRegression())
])

# The event column travels with the features; the duration is the target
pipeline.fit(df[['var', 'age', 'E']], df[['T']])
print(pd.DataFrame({
    'pred': pipeline.predict(df[['var', 'age', 'E']]),
    'actual': df['T'].tolist(),
}))
Creating the PMML pipeline worked great. After this, when I try:
It errors:
I think I know why it happens. As long as sklearn_adapter creates dynamic classes, this won't work. @CamDavidsonPilon I'm curious why dynamic classes were chosen for this. What are your thoughts on moving to static classes? While more verbose, I think they would be a bit more robust.
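For readers following along, here is a minimal sketch of the dynamic-versus-static point using only the standard library. It is not the lifelines implementation: `make_adapter`, `Model`, and `StaticAdapter` are hypothetical stand-ins, and the sketch assumes the failure comes from pickle being unable to look up a class that was created inside a function.

```python
import pickle


class Model:
    """Stand-in for a fitter class (hypothetical, not a lifelines class)."""

    def __init__(self, penalizer=0.0):
        self.penalizer = penalizer


def make_adapter(base_cls):
    """Hypothetical factory that builds a wrapper class at call time."""

    class Adapter(base_cls):
        pass

    return Adapter


class StaticAdapter(Model):
    """Module-level wrapper: pickle can locate it by its qualified name."""


def try_pickle(obj):
    """Return (True, None) if obj survives a pickle round trip, else (False, error)."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True, None
    except (pickle.PicklingError, AttributeError) as exc:
        return False, exc


DynamicAdapter = make_adapter(Model)
dynamic_ok, dynamic_err = try_pickle(DynamicAdapter(penalizer=0.1))
static_ok, static_err = try_pickle(StaticAdapter(penalizer=0.1))

print("dynamic class pickles:", dynamic_ok, "|", dynamic_err)
print("static class pickles:", static_ok)
```

The class returned by the factory lives in a function's local scope, so pickle has no importable name to record for it. The module-level class has one, which is the robustness argument for static classes.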
I'm open to this. As you know, current support for sklearn is limited because of these dynamic classes. Can you describe what this might look like?
Could you point me to docs/info on what the sklearn adapter is intended for and what limitations are known? That way we may be able to identify a better architecture. I do have some thoughts, but wanted to check whether they solve other needs. Note: with regard to PMML, I think that even if the classes are made static, some more things may be needed. I'll continue my exploration.
Its intention is to create an API that a) resembles and b) is compatible with scikit-learn. That is, have classes that behave like, for example, sklearn.linear_model.LinearRegression but contain a lifelines model. That way, these classes can plug into tools like GridSearchCV to find the best group of parameters for a model. Known limitations are the ones you've bumped into, namely serialization: i) because autograd/jax creates anonymous functions, they don't work well with most serialization libraries; ii) creating dynamic classes almost always fails with serialization libraries too. Docs are here: https://lifelines.readthedocs.io/en/latest/Compatibility%20with%20scikit-learn.html
@CamDavidsonPilon Now that the pickling issues are resolved, I wanted to take a look at PMML too. I was wondering if you're aware of any library (in python/R/java etc.) which supports PMMLs for similar estimators to the ones in lifelines? The part that gets me a bit confused is that lifelines returns a prob value for every time T in the timeline - while most PMMLs I see for Regression etc. have a single score value and not an array of scores. |
This is true if you are predicting the survival function. Choosing |
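To illustrate the difference between a per-time array and a single score, here is a minimal sketch in plain Python. It is not the lifelines API: `median_survival_time` is a hypothetical helper and the survival probabilities are made-up values. It assumes that a scalar summary of the curve, here the median survival time, is an acceptable single-valued prediction.

```python
def median_survival_time(timeline, survival_probs):
    """Return the first time at which S(t) falls to 0.5 or below.

    Returns infinity if the curve never reaches 0.5 on the given timeline.
    """
    for t, s in zip(timeline, survival_probs):
        if s <= 0.5:
            return t
    return float("inf")


# A survival function is one probability per time point (made-up values).
timeline = [1, 2, 3, 4, 5]
survival_probs = [0.95, 0.80, 0.60, 0.45, 0.30]

# Collapsing the curve to one number gives the single-score shape
# that a typical regression output expects.
median = median_survival_time(timeline, survival_probs)
print(median)  # 4
```

The array `survival_probs` is what a survival-function prediction looks like for one subject, while `median` is a single value per subject.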
Following the conversation on #188
Thought I'd create this issue so it can be tracked and a solution can be found.
It would be awesome if we could figure out a way to create PMML/PFAs from lifelines models, as that is the standard.
Currently, PMML creation using sklearn2pmml does not work on lifelines because of a pickling error.