#313: Enable fixing differing versions of dependency scikit-learn #316
Changes from 15 commits
d159c7d
c42614d
080c226
98cc20b
8ae04e4
df1ee06
8966b63
744024a
162408e
61a91ea
8bac5f5
8cd8877
61f8b69
1ec7e86
35da5e8
7ef7036
c546107
@@ -0,0 +1,130 @@
{
Can you split this code cell into two parts and add a markdown cell for each part explaining the code?
In the first part we create a UDF that reads the version of the scikit-learn library. The UDF runs inside the language container, so the library version it detects is the version installed in that container.
In the second part we compare the version returned by the UDF with the version in the AI-Lab environment. If they differ, we install the UDF's version in the AI-Lab environment.
BTW, do we need to run pip with the --upgrade option?

Sure, see next push.

That I cannot answer. My tests have been successful without this option.

Looks good to me

"cells": [
{
"cell_type": "markdown",
"id": "289d2a8c-953d-46e5-8c73-ad810c29b20f",
"metadata": {},
"source": [
"# Fix the Version of the Python Library scikit-learn\n",
"\n",
"This notebook ensures that the AI-Lab uses the same version of the Python library `scikit-learn` as the <a href=\"https://docs.exasol.com/db/latest/database_concepts/udf_scripts/adding_new_packages_script_languages.htm\" target=\"_blank\" rel=\"noopener\">Script Language Container (SLC)</a> inside the Exasol database.\n",
"\n",
"## Rationale\n",
"\n",
"Identical versions are required when transferring an sklearn model from the AI-Lab to the database SLC.\n",
"\n",
"The AI-Lab serializes the sklearn model and uploads it to the database SLC. The SLC can only _deserialize_ the model if it uses the same version of the `scikit-learn` library. The specific version of the library used by the default SLC depends on the release version of the database and cannot be controlled by the AI-Lab. Running the following script updates the version of the library used in the AI-Lab, if required.\n",
"\n",
"## Open Secure Configuration Storage"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d86ca808-044e-4fbd-be30-5ba8324f501e",
"metadata": {},
"outputs": [],
"source": [
"%run ../utils/access_store_ui.ipynb\n",
"display(get_access_store_ui('../'))"
]
},
{
"cell_type": "markdown",
"id": "055ed302-69aa-426c-b5ec-861c63b82d33",
"metadata": {},
"source": [
"## Detect the Version of sklearn Used in the SLC\n",
"\n",
"The following cell creates a User Defined Function (UDF) called `detect_scikit_learn_version()` to be run in the SLC and then executes the UDF via an SQL statement.\n",
"\n",
"The UDF queries and returns the version of sklearn installed in the SLC, which is then stored in the variable `slc_version`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa6c628f-853e-4850-8bab-46f7f645856e",
"metadata": {},
"outputs": [],
"source": [
"import textwrap\n",
"import sklearn\n",
"from importlib import reload\n",
"from exasol.nb_connector.connections import open_pyexasol_connection\n",
"\n",
"# Create a UDF that returns the version of scikit-learn installed in the SLC\n",
"sql = textwrap.dedent(\"\"\"\n",
"CREATE OR REPLACE PYTHON3 SCALAR SCRIPT {schema!q}.detect_scikit_learn_version() RETURNS VARCHAR(100) AS\n",
"import sklearn\n",
"def run(ctx):\n",
"    return sklearn.__version__\n",
"/\n",
"\"\"\")\n",
"\n",
"# Deploy the UDF, then call it once to read the SLC's scikit-learn version\n",
"with open_pyexasol_connection(ai_lab_config, compression=True) as conn:\n",
"    query_params = {'schema': ai_lab_config.db_schema}\n",
"    conn.execute(sql, query_params)\n",
"    result = conn.execute(\"SELECT {schema!q}.detect_scikit_learn_version()\", query_params).fetchone()\n",
"    slc_version = result[0]"
]
},
{
"cell_type": "markdown",
"id": "e4b0dc24-6e02-4305-8fa1-15f68afac360",
"metadata": {},
"source": [
"## Compare the Versions and Update the AI-Lab if Required\n",
"\n",
"The next cell compares the version returned by the UDF with the version in the AI-Lab environment. If they differ, the cell installs the UDF's version in the AI-Lab environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50b88871-4c37-4cc1-ac85-841a22e98153",
"metadata": {},
"outputs": [],
"source": [
"my_version = sklearn.__version__\n",
"\n",
"if slc_version == my_version:\n",
"    print(f\"AI-Lab scikit-learn version {my_version} is identical to that of the SLC.\\nNothing to do.\")\n",
"else:\n",
"    print(f\"AI-Lab scikit-learn version {my_version} differs from that of the SLC.\\nInstalling version {slc_version} ...\")\n",
"    %pip install \"scikit-learn=={slc_version}\"\n",
"    sklearn = reload(sklearn)\n",
"    print(f\"Updated AI-Lab scikit-learn to version {sklearn.__version__}. Restart the kernel to make sure all sklearn modules use the new version.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0ea6891-d171-4841-a2d8-edf8ac252d86",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
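The version check in the notebook compares the two version strings and, on a mismatch, pins the SLC's version with pip. On the `--upgrade` question from the review thread: an exact `==` pin already makes pip replace a differing installed version, whether that is an upgrade or a downgrade, so `--upgrade` should not be needed for this step. The following plain-Python sketch isolates that decision so it can be tested without a database connection. The helpers `local_version()` and `pip_command_to_align()` are hypothetical, and the SLC version passed in is assumed to come from the UDF in the notebook.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional


def local_version(distribution: str = "scikit-learn") -> Optional[str]:
    """Return the installed version of `distribution`, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def pip_command_to_align(
    slc_version: str,
    ai_lab_version: Optional[str],
    distribution: str = "scikit-learn",
) -> Optional[List[str]]:
    """Return the pip command that pins the AI-Lab to the SLC's version,
    or None if both versions are already identical (exact string comparison,
    as in the notebook)."""
    if ai_lab_version == slc_version:
        return None
    # An exact "==" pin makes pip replace any differing installed version,
    # so no --upgrade flag is required for either an upgrade or a downgrade.
    return [sys.executable, "-m", "pip", "install", f"{distribution}=={slc_version}"]


if __name__ == "__main__":
    # "1.3.2" is a hypothetical SLC version; in the notebook it comes from the UDF.
    command = pip_command_to_align("1.3.2", local_version())
    if command is None:
        print("Versions are identical. Nothing to do.")
    else:
        print("Would run:", " ".join(command))
```

Passing the returned list to `subprocess.run(command, check=True)` would perform the installation. As with `%pip`, restarting the kernel is the safest way to make an already running notebook pick up the new version.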
@@ -91,7 +91,7 @@
"\n",

Line #12. model.fit(X_train.values, y_train) Why? The model is happy to take a DataFrame as an input.

We observed an error message as specified in comment-2293085752.

Maybe earlier versions of scikit-learn didn't care about feature names. It seems that when we read input in UDF like

"# Create and train the model.\n",
"model = tree.DecisionTreeClassifier()\n",
-"model.fit(X_train, y_train)\n",
+"model.fit(X_train.values, y_train)\n",
"\n",
"print(f\"Training took: {stopwatch}\")"
]
The specific version of the library used by the default SLC depends ...
"AI-Lab adapts its own version of the library if required." I don't think this is a good explanation of what is happening here. The AI-Lab itself is not doing anything. The user should run a script like the one in this notebook to make sure the versions are aligned.

I updated the paragraph in the next push.
Please review again.

@ahsimb thanks, that is also what I meant needed to be formulated differently, and thanks @ckunki for changing it.
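The paragraph discussed in this thread explains why the versions must match: the model is serialized in the AI-Lab and deserialized in the SLC. scikit-learn itself stores its version in pickled estimators but only emits a warning when unpickling under a different version. The following stdlib-only sketch makes that check strict by wrapping the payload in an envelope that records the producing library version. `dump_with_version()` and `load_checked()` are hypothetical helpers, not part of scikit-learn or the Exasol notebook connector.

```python
import pickle
from typing import Any


def dump_with_version(obj: Any, library_version: str) -> bytes:
    """Serialize obj together with the library version that produced it."""
    # The payload is pickled separately so that it is only deserialized
    # after the version check has passed.
    return pickle.dumps({"library_version": library_version,
                         "payload": pickle.dumps(obj)})


def load_checked(blob: bytes, runtime_version: str) -> Any:
    """Deserialize blob, refusing to unpickle the payload on a version mismatch."""
    envelope = pickle.loads(blob)  # contains only a str and a bytes object
    recorded = envelope["library_version"]
    if recorded != runtime_version:
        raise RuntimeError(
            f"object was serialized with version {recorded}, "
            f"but the runtime uses version {runtime_version}"
        )
    return pickle.loads(envelope["payload"])


if __name__ == "__main__":
    # Hypothetical versions and payload; a real payload would be an sklearn estimator.
    blob = dump_with_version({"weights": [0.1, 0.9]}, "1.3.2")
    print(load_checked(blob, "1.3.2"))
    try:
        load_checked(blob, "1.2.0")
    except RuntimeError as exc:
        print("Refused:", exc)
```

With scikit-learn, `library_version` would be `sklearn.__version__` in the AI-Lab and in the SLC respectively.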