From c63fb2d0632b254dbf88570eaa42e121050afc87 Mon Sep 17 00:00:00 2001 From: peplaul0 Date: Fri, 28 Jun 2024 18:22:23 +0200 Subject: [PATCH 01/19] finished everything from the issue and improved text --- docs/tutorials/classification.ipynb | 432 +++++++++++++++++++++------- 1 file changed, 327 insertions(+), 105 deletions(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 0bba08deb..9f34c05d5 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -2,195 +2,417 @@ "cells": [ { "cell_type": "markdown", - "source": [ - "# Classification\n", - "\n", - "This tutorial uses safeds on **titanic passenger data** to predict who will survive and who will not, using sex as a feature for the prediction.\n" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "In this tutorial, we use `safeds` on **Titanic passenger data** to predict who will survive and who will not, using sex as a feature for the prediction." + ] }, { "cell_type": "markdown", - "source": [ - "1. Load your data into a `Table`, the data is available under `docs/tutorials/data/titanic.csv`:\n" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "### Load your data into a `Table`\n", + "- The data is available under `docs/tutorials/data/titanic.csv`:\n" + ] }, { "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "shape: (5, 12)
idnamesexagesiblings_spousesparents_childrentickettravel_classfarecabinport_embarkedsurvived
i64strstrf64i64i64stri64f64strstri64
0"Abbing, Mr. Anthony""male"42.000"C.A. 5547"37.55null"Southampton"0
1"Abbott, Master. Eugene Joseph""male"13.002"C.A. 2673"320.25null"Southampton"0
2"Abbott, Mr. Rossmore Edward""male"16.011"C.A. 2673"320.25null"Southampton"0
3"Abbott, Mrs. Stanton (Rosa Hun…"female"35.011"C.A. 2673"320.25null"Southampton"1
4"Abelseth, Miss. Karen Marie""female"16.000"348125"37.65null"Southampton"1
" + ], + "text/plain": [ + "+-----+----------------------+--------+----------+---+----------+-------+---------------+----------+\n", + "| id | name | sex | age | … | fare | cabin | port_embarked | survived |\n", + "| --- | --- | --- | --- | | --- | --- | --- | --- |\n", + "| i64 | str | str | f64 | | f64 | str | str | i64 |\n", + "+==================================================================================================+\n", + "| 0 | Abbing, Mr. Anthony | male | 42.00000 | … | 7.55000 | null | Southampton | 0 |\n", + "| 1 | Abbott, Master. | male | 13.00000 | … | 20.25000 | null | Southampton | 0 |\n", + "| | Eugene Joseph | | | | | | | |\n", + "| 2 | Abbott, Mr. Rossmore | male | 16.00000 | … | 20.25000 | null | Southampton | 0 |\n", + "| | Edward | | | | | | | |\n", + "| 3 | Abbott, Mrs. Stanton | female | 35.00000 | … | 20.25000 | null | Southampton | 1 |\n", + "| | (Rosa Hun… | | | | | | | |\n", + "| 4 | Abelseth, Miss. | female | 16.00000 | … | 7.65000 | null | Southampton | 1 |\n", + "| | Karen Marie | | | | | | | |\n", + "+-----+----------------------+--------+----------+---+----------+-------+---------------+----------+" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "from safeds.data.tabular.containers import Table\n", "\n", - "titanic = Table.from_csv_file(\"data/titanic.csv\")\n", + "raw_data = Table.from_csv_file(\"data/titanic.csv\")\n", "#For visualisation purposes we only print out the first 15 rows.\n", - "titanic.slice_rows(0, 15)" - ], - "metadata": { - "collapsed": false - }, - "execution_count": null, - "outputs": [] + "raw_data.slice_rows(0, 5)" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ - "2. Split the titanic dataset into two tables. 
A training set, that we will use later to implement a training model to predict the survival of passengers, containing 60% of the data, and a testing set containing the rest of the data.\n", - "Delete the column `survived` from the test set, to be able to predict it later:" + "### Removing olumns with high idness / stability / missing value ratio" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "shape: (9, 13)
metricidnamesexagesiblings_spousesparents_childrentickettravel_classfarecabinport_embarkedsurvived
strf64strstrf64f64f64strf64f64strstrf64
"min"0.0"Abbing, Mr. Anthony""female"0.16670.00.0"110152"1.00.0"A10""Cherbourg"0.0
"max"1308.0"van Melkebeke, Mr. Philemon""male"80.08.09.0"WE/P 5735"3.0512.3292"T""Southampton"1.0
"mean"654.0"-""-"29.8811350.4988540.385027"-"2.29488233.295479"-""-"0.381971
"median"654.0"-""-"28.00.00.0"-"3.014.4542"-""-"0.0
"standard deviation"378.020061"-""-"14.41351.0416580.86556"-"0.83783651.758668"-""-"0.486055
"distinct value count"1309.0"1307""2"98.07.08.0"929"3.0281.0"186""3"2.0
"idness"1.0"0.998472116119175""0.0015278838808250573"0.075630.0053480.006112"0.7097020626432391"0.0022920.215432"0.14285714285714285""0.0030557677616501145"0.001528
"missing value ratio"0.0"0.0""0.0"0.2009170.00.0"0.0"0.00.000764"0.774637127578304""0.0015278838808250573"0.0
"stability"0.000764"0.0015278838808250573""0.6440030557677616"0.0449330.6806720.76547"0.008403361344537815"0.5416350.045872"0.020338983050847456""0.6993114001530222"0.618029
" + ], + "text/plain": [ + "+-----------+-----------+-----------+-----------+---+-----------+-----------+-----------+----------+\n", + "| metric | id | name | sex | … | fare | cabin | port_emba | survived |\n", + "| --- | --- | --- | --- | | --- | --- | rked | --- |\n", + "| str | f64 | str | str | | f64 | str | --- | f64 |\n", + "| | | | | | | | str | |\n", + "+==================================================================================================+\n", + "| min | 0.00000 | Abbing, | female | … | 0.00000 | A10 | Cherbourg | 0.00000 |\n", + "| | | Mr. | | | | | | |\n", + "| | | Anthony | | | | | | |\n", + "| max | 1308.0000 | van Melke | male | … | 512.32920 | T | Southampt | 1.00000 |\n", + "| | 0 | beke, Mr. | | | | | on | |\n", + "| | | Philemon | | | | | | |\n", + "| mean | 654.00000 | - | - | … | 33.29548 | - | - | 0.38197 |\n", + "| median | 654.00000 | - | - | … | 14.45420 | - | - | 0.00000 |\n", + "| standard | 378.02006 | - | - | … | 51.75867 | - | - | 0.48606 |\n", + "| deviation | | | | | | | | |\n", + "| distinct | 1309.0000 | 1307 | 2 | … | 281.00000 | 186 | 3 | 2.00000 |\n", + "| value | 0 | | | | | | | |\n", + "| count | | | | | | | | |\n", + "| idness | 1.00000 | 0.9984721 | 0.0015278 | … | 0.21543 | 0.1428571 | 0.0030557 | 0.00153 |\n", + "| | | 16119175 | 838808250 | | | 428571428 | 677616501 | |\n", + "| | | | 573 | | | 5 | 145 | |\n", + "| missing | 0.00000 | 0.0 | 0.0 | … | 0.00076 | 0.7746371 | 0.0015278 | 0.00000 |\n", + "| value | | | | | | 27578304 | 838808250 | |\n", + "| ratio | | | | | | | 573 | |\n", + "| stability | 0.00076 | 0.0015278 | 0.6440030 | … | 0.04587 | 0.0203389 | 0.6993114 | 0.61803 |\n", + "| | | 838808250 | 557677616 | | | 830508474 | 001530222 | |\n", + "| | | 573 | | | | 56 | | |\n", + "+-----------+-----------+-----------+-----------+---+-----------+-----------+-----------+----------+" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } ], - "metadata": { - "collapsed": 
false - } + "source": [ + "raw_data.summarize_statistics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We remove certain columns for the following reasons:\n", + "1. **high idness**: `name`, `id` , `ticket`\n", + "2. **high stability**: `parents_children` \n", + "3. **high missing value ratio**: `cabin`" + ] }, { "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], "source": [ - "train_table, testing_table = titanic.split_rows(0.6)\n", - "\n", - "test_table = testing_table.remove_columns([\"survived\"]).shuffle_rows()" - ], - "metadata": { - "collapsed": false - }, - "execution_count": null, - "outputs": [] + "raw_data = raw_data.remove_columns([\"name\",\"id\",\"ticket\", \"parents_children\", \"cabin\"])" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ - "3. Use `OneHotEncoder` to create an encoder, that will be used later to transform the training table.\n", - "* We use `OneHotEncoder` to transform non-numerical categorical values into numerical representations with values of zero or one. 
In this example we will transform the values of the sex column, hence they will be used in the model for predicting the surviving of passengers.\n", - "* Use the `fit` function of the `OneHotEncoder` to pass the table and the column names, that will be used as features to predict who will survive to the encoder.\n", - "* The names of the column before transformation need to be saved, because `OneHotEncoder` changes the names of the fitted `Column`s:\n" - ], - "metadata": { - "collapsed": false - } + "### Imputing columns `age` and `fare`\n", + "We fill in missing values in the `age` and `fare` columns with the mean of each column\n" + ] }, { "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], "source": [ - "from safeds.data.tabular.transformation import OneHotEncoder\n", + "from safeds.data.tabular.transformation import SimpleImputer\n", "\n", - "encoder = OneHotEncoder(column_names=\"sex\").fit(train_table)" - ], - "metadata": { - "collapsed": false - }, - "execution_count": null, - "outputs": [] + "simple_transformer = SimpleImputer(column_names=[\"age\",\"fare\"],strategy=SimpleImputer.Strategy.mean())\n", + "_, transformed_raw_data = simple_transformer.fit_and_transform(raw_data)" + ] }, { "cell_type": "markdown", - "source": [ - "4. Transform the training table using the fitted encoder, and create a set with the new names of the fitted `Column`s:\n" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "### Using `OneHotEncoder` to create an encoder and fit and transform the table\n", + "We use `OneHotEncoder` to transform categorical, non-numerical values into numerical representations with values of zero or one. In this example, we will transform the values of the `sex` column, so they can be used in the model to predict passenger survival.\n", + "- Use the `fit_and_transform` function of the `OneHotEncoder` to pass the table and the column names to be used as features for the prediction." 
+ ] }, { "cell_type": "code", - "source": "transformed_table = encoder.transform(train_table)", + "execution_count": 5, "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "from safeds.data.tabular.transformation import OneHotEncoder\n", + "\n", + "one_hot_encoder = OneHotEncoder(column_names=[\"sex\", \"port_embarked\"])\n", + "_, transformed_raw_data = one_hot_encoder.fit_and_transform(transformed_raw_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Satistics after imputing / removing / encoding\n", + "Check the data after cleaning and transformation to ensure the changes were made correctly.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "shape: (9, 11)
metricagesiblings_spousestravel_classfaresurvivedsex__malesex__femaleport_embarked__Southamptonport_embarked__Cherbourgport_embarked__Queenstown
strf64f64f64f64f64f64f64f64f64f64
"min"0.16670.01.00.00.00.00.00.00.00.0
"max"80.08.03.0512.32921.01.01.01.01.01.0
"mean"29.8811350.4988542.29488233.2954790.3819710.6440030.3559970.6982430.2062640.093965
"median"29.8811350.03.014.45420.01.00.01.00.00.0
"standard deviation"12.8831991.0416580.83783651.7388790.4860550.4789970.4789970.4591960.4047770.291891
"distinct value count"99.07.03.0282.02.02.02.02.02.02.0
"idness"0.075630.0053480.0022920.2154320.0015280.0015280.0015280.0015280.0015280.001528
"missing value ratio"0.00.00.00.00.00.00.00.00.00.0
"stability"0.2009170.6806720.5416350.0458370.6180290.6440030.6440030.6982430.7937360.906035
" + ], + "text/plain": [ + "+-----------+----------+-----------+-----------+---+-----------+-----------+-----------+-----------+\n", + "| metric | age | siblings_ | travel_cl | … | sex__fema | port_emba | port_emba | port_emba |\n", + "| --- | --- | spouses | ass | | le | rked__Sou | rked__Che | rked__Que |\n", + "| str | f64 | --- | --- | | --- | thampton | rbourg | enstown |\n", + "| | | f64 | f64 | | f64 | --- | --- | --- |\n", + "| | | | | | | f64 | f64 | f64 |\n", + "+==================================================================================================+\n", + "| min | 0.16670 | 0.00000 | 1.00000 | … | 0.00000 | 0.00000 | 0.00000 | 0.00000 |\n", + "| max | 80.00000 | 8.00000 | 3.00000 | … | 1.00000 | 1.00000 | 1.00000 | 1.00000 |\n", + "| mean | 29.88113 | 0.49885 | 2.29488 | … | 0.35600 | 0.69824 | 0.20626 | 0.09396 |\n", + "| median | 29.88113 | 0.00000 | 3.00000 | … | 0.00000 | 1.00000 | 0.00000 | 0.00000 |\n", + "| standard | 12.88320 | 1.04166 | 0.83784 | … | 0.47900 | 0.45920 | 0.40478 | 0.29189 |\n", + "| deviation | | | | | | | | |\n", + "| distinct | 99.00000 | 7.00000 | 3.00000 | … | 2.00000 | 2.00000 | 2.00000 | 2.00000 |\n", + "| value | | | | | | | | |\n", + "| count | | | | | | | | |\n", + "| idness | 0.07563 | 0.00535 | 0.00229 | … | 0.00153 | 0.00153 | 0.00153 | 0.00153 |\n", + "| missing | 0.00000 | 0.00000 | 0.00000 | … | 0.00000 | 0.00000 | 0.00000 | 0.00000 |\n", + "| value | | | | | | | | |\n", + "| ratio | | | | | | | | |\n", + "| stability | 0.20092 | 0.68067 | 0.54163 | … | 0.64400 | 0.69824 | 0.79374 | 0.90604 |\n", + "+-----------+----------+-----------+-----------+---+-----------+-----------+-----------+-----------+" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "transformed_raw_data.summarize_statistics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Spliting the `raw_data` into train and test sets\n", + "- **Training set**: 
Contains 60% of the data and will be used to train the model.\n", + "- **Testing set**: Contains 40% of the data and will be used to test the model's accuracy." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "train_table, test_table = transformed_raw_data.shuffle_rows().split_rows(0.6)" + ] }, { "cell_type": "markdown", - "source": "5. Mark the `survived` `Column` as the target variable to be predicted. Include some columns only as extra columns, which are completely ignored by the model:", "metadata": { "collapsed": false - } + }, + "source": [ + "### Mark the `survived` column as the target variable to be predicted" + ] }, { "cell_type": "code", - "source": [ - "extra_names = [\"id\", \"name\", \"ticket\", \"cabin\", \"port_embarked\", \"age\", \"fare\"]\n", - "\n", - "train_tabular_dataset = transformed_table.to_tabular_dataset(\"survived\", extra_names=extra_names)" - ], + "execution_count": 8, "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "tagged_train_table = train_table.to_tabular_dataset(\"survived\")" + ] }, { "cell_type": "markdown", - "source": "6. Use `RandomForest` classifier as a model for the classification. Pass the \"train_tabular_dataset\" table to the fit function of the model:", "metadata": { "collapsed": false - } + }, + "source": [ + "## Using `RandomForest` classifier as a model for classification\n", + "We use the `RandomForest` classifier as our model and pass the training dataset to the model's `fit` function to train it." 
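Marking the target column, as `to_tabular_dataset("survived")` does above, conceptually separates each row into its features and its label before the classifier is fitted. A minimal plain-Python sketch of that idea (not the safeds internals; the helper name `to_features_and_target` is hypothetical):

```python
# Split rows into feature dicts and a target vector for a classifier:
# every column except the target becomes a feature, the target column
# becomes the label vector.
def to_features_and_target(rows, target_name):
    features = [{k: v for k, v in row.items() if k != target_name} for row in rows]
    target = [row[target_name] for row in rows]
    return features, target

rows = [
    {"age": 42.0, "sex__male": 1, "survived": 0},
    {"age": 35.0, "sex__male": 0, "survived": 1},
]
X, y = to_features_and_target(rows, "survived")
print(y)     # [0, 1]
print(X[0])  # {'age': 42.0, 'sex__male': 1}
```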
+ ] }, { "cell_type": "code", - "source": [ - "from safeds.ml.classical.classification import RandomForestClassifier\n", - "\n", - "model = RandomForestClassifier()\n", - "fitted_model= model.fit(train_tabular_dataset)" - ], + "execution_count": 9, "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "from safeds.ml.classical.classification import RandomForestClassifier\n", + "\n", + "classifier = RandomForestClassifier()\n", + "fitted_classifier= classifier.fit(tagged_train_table)" + ] }, { "cell_type": "markdown", - "source": [ - "7. Use the fitted random forest model, that we trained on the training dataset to predict the survival rate of passengers in the test dataset.\n", - "Transform the test data with `OneHotEncoder` first, to be able to pass it to the predict function, that uses our fitted random forest model for prediction:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "### Using the trained random forest model to predict survival\n", + "Use the trained `RandomForest` model to predict the survival rate of passengers in the test dataset.\n", + "Pass the `test_table` into the `predict` function, which uses our trained model for prediction." + ] }, { "cell_type": "code", - "source": [ - "transformed_test_table = encoder.transform(test_table)\n", - "\n", - "prediction = fitted_model.predict(\n", - " transformed_test_table\n", - ")\n", - "#For visualisation purposes we only print out the first 15 rows.\n", - "prediction.to_table().slice_rows(start=0, length=15)" - ], + "execution_count": 10, "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "shape: (15, 10)
agesiblings_spousestravel_classfaresex__malesex__femaleport_embarked__Southamptonport_embarked__Cherbourgport_embarked__Queenstownsurvived
f64i64i64f64u8u8u8u8u8i64
45.00230.0011001
29.881135037.8958100100
29.881135038.05101000
29.881135037.75100010
4.00322.025011000
29.0037.875101000
44.0038.05101000
48.0037.8542101000
21.00177.2875101000
56.00130.6958100100
" + ], + "text/plain": [ + "+----------+------------+------------+----------+---+-----------+-----------+-----------+----------+\n", + "| age | siblings_s | travel_cla | fare | … | port_emba | port_emba | port_emba | survived |\n", + "| --- | pouses | ss | --- | | rked__Sou | rked__Che | rked__Que | --- |\n", + "| f64 | --- | --- | f64 | | thampton | rbourg | enstown | i64 |\n", + "| | i64 | i64 | | | --- | --- | --- | |\n", + "| | | | | | u8 | u8 | u8 | |\n", + "+==================================================================================================+\n", + "| 45.00000 | 0 | 2 | 30.00000 | … | 1 | 0 | 0 | 1 |\n", + "| 29.88113 | 0 | 3 | 7.89580 | … | 0 | 1 | 0 | 0 |\n", + "| 29.88113 | 0 | 3 | 8.05000 | … | 1 | 0 | 0 | 0 |\n", + "| 29.88113 | 0 | 3 | 7.75000 | … | 0 | 0 | 1 | 0 |\n", + "| 4.00000 | 0 | 3 | 22.02500 | … | 1 | 0 | 0 | 0 |\n", + "| … | … | … | … | … | … | … | … | … |\n", + "| 29.00000 | 0 | 3 | 7.87500 | … | 1 | 0 | 0 | 0 |\n", + "| 44.00000 | 0 | 3 | 8.05000 | … | 1 | 0 | 0 | 0 |\n", + "| 48.00000 | 0 | 3 | 7.85420 | … | 1 | 0 | 0 | 0 |\n", + "| 21.00000 | 0 | 1 | 77.28750 | … | 1 | 0 | 0 | 0 |\n", + "| 56.00000 | 0 | 1 | 30.69580 | … | 0 | 1 | 0 | 0 |\n", + "+----------+------------+------------+----------+---+-----------+-----------+-----------+----------+" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prediction = fitted_classifier.predict(test_table)\n", + "#For visualisation purposes we only print out the first 15 rows.\n", + "prediction.to_table().slice_rows(start=0, length=15)" + ] }, { "cell_type": "markdown", - "source": [ - "8. 
You can test the accuracy of that model with the initial testing_table as follows:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "### Testing the accuracy of the model" + ] }, { "cell_type": "code", - "source": [ - "testing_table = encoder.transform(testing_table)\n", - "\n", - "test_tabular_dataset = testing_table.to_tabular_dataset(\"survived\", extra_names=extra_names)\n", - "fitted_model.accuracy(test_tabular_dataset)\n" - ], + "execution_count": 11, "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "outputs": [ + { + "data": { + "text/plain": [ + "0.7938931297709924" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test_tabular_dataset = test_table.to_tabular_dataset(\"survived\")\n", + "fitted_classifier.accuracy(test_tabular_dataset)" + ] } ], "metadata": { @@ -202,14 +424,14 @@ "language_info": { "codemirror_mode": { "name": "ipython", - "version": 2 + "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.6" + "pygments_lexer": "ipython3", + "version": "3.12.3" } }, "nbformat": 4, From 49b0eec415e03c7547d54698ac300e4cb079c826 Mon Sep 17 00:00:00 2001 From: peplaul0 Date: Fri, 28 Jun 2024 18:53:05 +0200 Subject: [PATCH 02/19] added a reverse transformation to the tutorial --- docs/tutorials/classification.ipynb | 36 +++++++++++++++++++++-------- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 9f34c05d5..60007130f 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -195,7 +195,23 @@ "from safeds.data.tabular.transformation import OneHotEncoder\n", "\n", "one_hot_encoder = OneHotEncoder(column_names=[\"sex\", \"port_embarked\"])\n", - "_, transformed_raw_data = 
one_hot_encoder.fit_and_transform(transformed_raw_data)" + "transformer_one_hot_encoder, transformed_raw_data = one_hot_encoder.fit_and_transform(transformed_raw_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Reverse transforming `transformed_raw_data`" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "reverse_transfromed_raw_data = transformed_raw_data.inverse_transform_table(transformer_one_hot_encoder)" ] }, { @@ -208,7 +224,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -248,7 +264,7 @@ "+-----------+----------+-----------+-----------+---+-----------+-----------+-----------+-----------+" ] }, - "execution_count": 6, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -268,7 +284,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ @@ -286,7 +302,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "metadata": { "collapsed": false }, @@ -307,7 +323,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 10, "metadata": { "collapsed": false }, @@ -332,7 +348,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 11, "metadata": { "collapsed": false }, @@ -371,7 +387,7 @@ "+----------+------------+------------+----------+---+-----------+-----------+-----------+----------+" ] }, - "execution_count": 10, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -393,7 +409,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 12, "metadata": { "collapsed": false }, @@ -404,7 +420,7 @@ "0.7938931297709924" ] }, - "execution_count": 11, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } From 1272a57a62aef30fe56792a18ee0ba43aba62593 Mon Sep 17 00:00:00 2001 From: Leon Peplau 
<115023385+LIEeOoNn@users.noreply.github.com> Date: Sat, 29 Jun 2024 16:40:40 +0200 Subject: [PATCH 03/19] Update docs/tutorials/classification.ipynb Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 60007130f..6711d7b60 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -6,7 +6,7 @@ "collapsed": false }, "source": [ - "In this tutorial, we use `safeds` on **Titanic passenger data** to predict who will survive and who will not, using sex as a feature for the prediction." + "In this tutorial, we use `safeds` on **Titanic passenger data** to predict who will survive and who will not." ] }, { From 55c85aaa7e7dec06ad8380f873d1b7a2b71a2bc5 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Sat, 29 Jun 2024 16:41:40 +0200 Subject: [PATCH 04/19] Update docs/tutorials/classification.ipynb Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 6711d7b60..c9d19228a 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -73,7 +73,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Removing olumns with high idness / stability / missing value ratio" + "### Removing Low-Quality Columns ] }, { From 7fd0ae5107c7fb3fb019889542bb2729a87095be Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Sat, 29 Jun 2024 16:42:02 +0200 Subject: [PATCH 05/19] Update docs/tutorials/classification.ipynb Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb 
b/docs/tutorials/classification.ipynb index c9d19228a..e5d738ef9 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -66,7 +66,7 @@ "\n", "raw_data = Table.from_csv_file(\"data/titanic.csv\")\n", "#For visualisation purposes we only print out the first 15 rows.\n", - "raw_data.slice_rows(0, 5)" + "raw_data.slice_rows(lenght=5)" ] }, { From 7da51ae3bfff348038baa67cb5e8d1fae0e45070 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Sat, 29 Jun 2024 16:42:59 +0200 Subject: [PATCH 06/19] Update docs/tutorials/classification.ipynb Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index e5d738ef9..c6390dc67 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -195,7 +195,7 @@ "from safeds.data.tabular.transformation import OneHotEncoder\n", "\n", "one_hot_encoder = OneHotEncoder(column_names=[\"sex\", \"port_embarked\"])\n", - "transformer_one_hot_encoder, transformed_raw_data = one_hot_encoder.fit_and_transform(transformed_raw_data)" + "fitted_one_hot_encoder, transformed_raw_data = one_hot_encoder.fit_and_transform(transformed_raw_data)" ] }, { From 155939d0083be19f9a08318e05dc87eacd84e1e9 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Sat, 29 Jun 2024 16:43:18 +0200 Subject: [PATCH 07/19] Update docs/tutorials/classification.ipynb Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index c6390dc67..0e627409a 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -218,7 +218,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Satistics 
after imputing / removing / encoding\n", + "### Statistics after Imputing / Removing / Encoding\n", "Check the data after cleaning and transformation to ensure the changes were made correctly.\n" ] }, From 80797738b9a8b0ab3a8814db32e6bb2dc28f5d06 Mon Sep 17 00:00:00 2001 From: peplaul0 Date: Sun, 30 Jun 2024 19:38:27 +0200 Subject: [PATCH 08/19] imporoved headings and overall code --- docs/tutorials/classification.ipynb | 278 ++++++++++++++++------------ 1 file changed, 155 insertions(+), 123 deletions(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 0e627409a..da89056e6 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -15,8 +15,8 @@ "collapsed": false }, "source": [ - "### Load your data into a `Table`\n", - "- The data is available under `docs/tutorials/data/titanic.csv`:\n" + "### Load Your Data into a `Table`\n", + "The data is available under [Titanic - Machine Learning from Disaster](https://www.kaggle.com/c/titanic/data):\n" ] }, { @@ -36,7 +36,7 @@ " white-space: pre-wrap;\n", "}\n", "\n", - "shape: (5, 12)
| id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | f64 | i64 | i64 | str | i64 | f64 | str | str | i64 |
| 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.55 | null | Southampton | 0 |
| 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.25 | null | Southampton | 0 |
| 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.25 | null | Southampton | 0 |
| 3 | Abbott, Mrs. Stanton (Rosa Hun… | female | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.25 | null | Southampton | 1 |
| 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 0 | 0 | 348125 | 3 | 7.65 | null | Southampton | 1 |
" + "shape: (15, 12)
| id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | f64 | i64 | i64 | str | i64 | f64 | str | str | i64 |
| 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.55 | null | Southampton | 0 |
| 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.25 | null | Southampton | 0 |
| 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.25 | null | Southampton | 0 |
| 3 | Abbott, Mrs. Stanton (Rosa Hun… | female | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.25 | null | Southampton | 1 |
| 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 0 | 0 | 348125 | 3 | 7.65 | null | Southampton | 1 |
| … | … | … | … | … | … | … | … | … | … | … | … |
| 10 | Adahl, Mr. Mauritz Nils Martin | male | 30.0 | 0 | 0 | C 7076 | 3 | 7.25 | null | Southampton | 0 |
| 11 | Adams, Mr. John | male | 26.0 | 0 | 0 | 341826 | 3 | 8.05 | null | Southampton | 0 |
| 12 | Ahlin, Mrs. Johan (Johanna Per… | female | 40.0 | 1 | 0 | 7546 | 3 | 9.475 | null | Southampton | 0 |
| 13 | Aks, Master. Philip Frank | male | 0.8333 | 0 | 1 | 392091 | 3 | 9.35 | null | Southampton | 1 |
| 14 | Aks, Mrs. Sam (Leah Rosen) | female | 18.0 | 0 | 1 | 392091 | 3 | 9.35 | null | Southampton | 1 |
" ], "text/plain": [ "+-----+----------------------+--------+----------+---+----------+-------+---------------+----------+\n", @@ -53,6 +53,16 @@ "| | (Rosa Hun… | | | | | | | |\n", "| 4 | Abelseth, Miss. | female | 16.00000 | … | 7.65000 | null | Southampton | 1 |\n", "| | Karen Marie | | | | | | | |\n", + "| … | … | … | … | … | … | … | … | … |\n", + "| 10 | Adahl, Mr. Mauritz | male | 30.00000 | … | 7.25000 | null | Southampton | 0 |\n", + "| | Nils Martin | | | | | | | |\n", + "| 11 | Adams, Mr. John | male | 26.00000 | … | 8.05000 | null | Southampton | 0 |\n", + "| 12 | Ahlin, Mrs. Johan | female | 40.00000 | … | 9.47500 | null | Southampton | 0 |\n", + "| | (Johanna Per… | | | | | | | |\n", + "| 13 | Aks, Master. Philip | male | 0.83330 | … | 9.35000 | null | Southampton | 1 |\n", + "| | Frank | | | | | | | |\n", + "| 14 | Aks, Mrs. Sam (Leah | female | 18.00000 | … | 9.35000 | null | Southampton | 1 |\n", + "| | Rosen) | | | | | | | |\n", "+-----+----------------------+--------+----------+---+----------+-------+---------------+----------+" ] }, @@ -66,20 +76,38 @@ "\n", "raw_data = Table.from_csv_file(\"data/titanic.csv\")\n", "#For visualisation purposes we only print out the first 15 rows.\n", - "raw_data.slice_rows(lenght=5)" + "raw_data.slice_rows(length=15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Removing Low-Quality Columns + "### Spliting the `raw_data` into Train and Test Sets\n", + "- **Training set**: Contains 60% of the data and will be used to train the model.\n", + "- **Testing set**: Contains 40% of the data and will be used to test the model's accuracy." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, + "outputs": [], + "source": [ + "train_table, test_table = raw_data.shuffle_rows().split_rows(0.6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Removing Low-Quality Columns" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, "outputs": [ { "data": { @@ -91,7 +119,7 @@ " white-space: pre-wrap;\n", "}\n", "\n", - "shape: (9, 13)
metricidnamesexagesiblings_spousesparents_childrentickettravel_classfarecabinport_embarkedsurvived
strf64strstrf64f64f64strf64f64strstrf64
"min"0.0"Abbing, Mr. Anthony""female"0.16670.00.0"110152"1.00.0"A10""Cherbourg"0.0
"max"1308.0"van Melkebeke, Mr. Philemon""male"80.08.09.0"WE/P 5735"3.0512.3292"T""Southampton"1.0
"mean"654.0"-""-"29.8811350.4988540.385027"-"2.29488233.295479"-""-"0.381971
"median"654.0"-""-"28.00.00.0"-"3.014.4542"-""-"0.0
"standard deviation"378.020061"-""-"14.41351.0416580.86556"-"0.83783651.758668"-""-"0.486055
"distinct value count"1309.0"1307""2"98.07.08.0"929"3.0281.0"186""3"2.0
"idness"1.0"0.998472116119175""0.0015278838808250573"0.075630.0053480.006112"0.7097020626432391"0.0022920.215432"0.14285714285714285""0.0030557677616501145"0.001528
"missing value ratio"0.0"0.0""0.0"0.2009170.00.0"0.0"0.00.000764"0.774637127578304""0.0015278838808250573"0.0
"stability"0.000764"0.0015278838808250573""0.6440030557677616"0.0449330.6806720.76547"0.008403361344537815"0.5416350.045872"0.020338983050847456""0.6993114001530222"0.618029
" + "shape: (9, 13)
metricidnamesexagesiblings_spousesparents_childrentickettravel_classfarecabinport_embarkedsurvived
strf64strstrf64f64f64strf64f64strstrf64
"min"1.0"Abbott, Master. Eugene Joseph""female"0.16670.00.0"110152"1.00.0"A11""Cherbourg"0.0
"max"1307.0"van Melkebeke, Mr. Philemon""male"76.08.06.0"WE/P 5735"3.0512.3292"T""Southampton"1.0
"mean"654.408917"-""-"29.5421910.5184710.396178"-"2.29808933.849861"-""-"0.37707
"median"658.0"-""-"28.00.00.0"-"3.014.5"-""-"0.0
"standard deviation"376.780514"-""-"14.1643251.0678410.818931"-"0.83471255.721765"-""-"0.484962
"distinct value count"785.0"784""2"89.07.07.0"618"3.0239.0"134""3"2.0
"idness"1.0"0.9987261146496815""0.0025477707006369425"0.114650.0089170.008917"0.7872611464968153"0.0038220.305732"0.17197452229299362""0.003821656050955414"0.002548
"missing value ratio"0.0"0.0""0.0"0.1898090.00.0"0.0"0.00.001274"0.7745222929936306""0.0"0.0
"stability"0.001274"0.0025477707006369425""0.6522292993630573"0.0487420.6700640.75414"0.007643312101910828"0.5414010.043367"0.02824858757062147""0.7019108280254777"0.62293
" ], "text/plain": [ "+-----------+-----------+-----------+-----------+---+-----------+-----------+-----------+----------+\n", @@ -100,38 +128,39 @@ "| str | f64 | str | str | | f64 | str | --- | f64 |\n", "| | | | | | | | str | |\n", "+==================================================================================================+\n", - "| min | 0.00000 | Abbing, | female | … | 0.00000 | A10 | Cherbourg | 0.00000 |\n", - "| | | Mr. | | | | | | |\n", - "| | | Anthony | | | | | | |\n", - "| max | 1308.0000 | van Melke | male | … | 512.32920 | T | Southampt | 1.00000 |\n", + "| min | 1.00000 | Abbott, | female | … | 0.00000 | A11 | Cherbourg | 0.00000 |\n", + "| | | Master. | | | | | | |\n", + "| | | Eugene | | | | | | |\n", + "| | | Joseph | | | | | | |\n", + "| max | 1307.0000 | van Melke | male | … | 512.32920 | T | Southampt | 1.00000 |\n", "| | 0 | beke, Mr. | | | | | on | |\n", "| | | Philemon | | | | | | |\n", - "| mean | 654.00000 | - | - | … | 33.29548 | - | - | 0.38197 |\n", - "| median | 654.00000 | - | - | … | 14.45420 | - | - | 0.00000 |\n", - "| standard | 378.02006 | - | - | … | 51.75867 | - | - | 0.48606 |\n", + "| mean | 654.40892 | - | - | … | 33.84986 | - | - | 0.37707 |\n", + "| median | 658.00000 | - | - | … | 14.50000 | - | - | 0.00000 |\n", + "| standard | 376.78051 | - | - | … | 55.72177 | - | - | 0.48496 |\n", "| deviation | | | | | | | | |\n", - "| distinct | 1309.0000 | 1307 | 2 | … | 281.00000 | 186 | 3 | 2.00000 |\n", - "| value | 0 | | | | | | | |\n", + "| distinct | 785.00000 | 784 | 2 | … | 239.00000 | 134 | 3 | 2.00000 |\n", + "| value | | | | | | | | |\n", "| count | | | | | | | | |\n", - "| idness | 1.00000 | 0.9984721 | 0.0015278 | … | 0.21543 | 0.1428571 | 0.0030557 | 0.00153 |\n", - "| | | 16119175 | 838808250 | | | 428571428 | 677616501 | |\n", - "| | | | 573 | | | 5 | 145 | |\n", - "| missing | 0.00000 | 0.0 | 0.0 | … | 0.00076 | 0.7746371 | 0.0015278 | 0.00000 |\n", - "| value | | | | | | 27578304 | 838808250 | |\n", - "| 
ratio | | | | | | | 573 | |\n", - "| stability | 0.00076 | 0.0015278 | 0.6440030 | … | 0.04587 | 0.0203389 | 0.6993114 | 0.61803 |\n", - "| | | 838808250 | 557677616 | | | 830508474 | 001530222 | |\n", - "| | | 573 | | | | 56 | | |\n", + "| idness | 1.00000 | 0.9987261 | 0.0025477 | … | 0.30573 | 0.1719745 | 0.0038216 | 0.00255 |\n", + "| | | 146496815 | 707006369 | | | 222929936 | 560509554 | |\n", + "| | | | 425 | | | 2 | 14 | |\n", + "| missing | 0.00000 | 0.0 | 0.0 | … | 0.00127 | 0.7745222 | 0.0 | 0.00000 |\n", + "| value | | | | | | 929936306 | | |\n", + "| ratio | | | | | | | | |\n", + "| stability | 0.00127 | 0.0025477 | 0.6522292 | … | 0.04337 | 0.0282485 | 0.7019108 | 0.62293 |\n", + "| | | 707006369 | 993630573 | | | 875706214 | 280254777 | |\n", + "| | | 425 | | | | 7 | | |\n", "+-----------+-----------+-----------+-----------+---+-----------+-----------+-----------+----------+" ] }, - "execution_count": 2, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "raw_data.summarize_statistics()" + "train_table.summarize_statistics()" ] }, { @@ -139,38 +168,40 @@ "metadata": {}, "source": [ "We remove certain columns for the following reasons:\n", - "1. **high idness**: `name`, `id` , `ticket`\n", + "1. **high idness**: `id` , `ticket`\n", "2. **high stability**: `parents_children` \n", "3. 
**high missing value ratio**: `cabin`" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ - "raw_data = raw_data.remove_columns([\"name\",\"id\",\"ticket\", \"parents_children\", \"cabin\"])" + "train_table = train_table.remove_columns([\"id\",\"ticket\", \"parents_children\", \"cabin\"])\n", + "test_table = test_table.remove_columns([\"id\",\"ticket\", \"parents_children\", \"cabin\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Imputing columns `age` and `fare`\n", + "### Imputing Columns `age` and `fare`\n", "We fill in missing values in the `age` and `fare` columns with the mean of each column\n" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from safeds.data.tabular.transformation import SimpleImputer\n", "\n", - "simple_transformer = SimpleImputer(column_names=[\"age\",\"fare\"],strategy=SimpleImputer.Strategy.mean())\n", - "_, transformed_raw_data = simple_transformer.fit_and_transform(raw_data)" + "simple_imputer = SimpleImputer(column_names=[\"age\",\"fare\"],strategy=SimpleImputer.Strategy.mean())\n", + "fitted_simple_imputer_train, transformed_train_data = simple_imputer.fit_and_transform(train_table)\n", + "transformed_test_data = fitted_simple_imputer_train.transform(test_table)" ] }, { @@ -179,14 +210,14 @@ "collapsed": false }, "source": [ - "### Using `OneHotEncoder` to create an encoder and fit and transform the table\n", + "### Using `OneHotEncoder` to `fit_and_transform` the Table\n", "We use `OneHotEncoder` to transform categorical, non-numerical values into numerical representations with values of zero or one. 
In this example, we will transform the values of the `sex` column, so they can be used in the model to predict passenger survival.\n", "- Use the `fit_and_transform` function of the `OneHotEncoder` to pass the table and the column names to be used as features for the prediction." ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": { "collapsed": false }, @@ -194,24 +225,8 @@ "source": [ "from safeds.data.tabular.transformation import OneHotEncoder\n", "\n", - "one_hot_encoder = OneHotEncoder(column_names=[\"sex\", \"port_embarked\"])\n", - "fitted_one_hot_encoder, transformed_raw_data = one_hot_encoder.fit_and_transform(transformed_raw_data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Reverse transforming `transformed_raw_data`" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "reverse_transfromed_raw_data = transformed_raw_data.inverse_transform_table(transformer_one_hot_encoder)" + "fitted_one_hot_encoder_train, transformed_train_data = OneHotEncoder(column_names=[\"sex\", \"port_embarked\"]).fit_and_transform(transformed_train_data)\n", + "transformed_test_data = fitted_one_hot_encoder_train.transform(transformed_test_data)" ] }, { @@ -237,31 +252,39 @@ " white-space: pre-wrap;\n", "}\n", "\n", - "shape: (9, 11)
metricagesiblings_spousestravel_classfaresurvivedsex__malesex__femaleport_embarked__Southamptonport_embarked__Cherbourgport_embarked__Queenstown
strf64f64f64f64f64f64f64f64f64f64
"min"0.16670.01.00.00.00.00.00.00.00.0
"max"80.08.03.0512.32921.01.01.01.01.01.0
"mean"29.8811350.4988542.29488233.2954790.3819710.6440030.3559970.6982430.2062640.093965
"median"29.8811350.03.014.45420.01.00.01.00.00.0
"standard deviation"12.8831991.0416580.83783651.7388790.4860550.4789970.4789970.4591960.4047770.291891
"distinct value count"99.07.03.0282.02.02.02.02.02.02.0
"idness"0.075630.0053480.0022920.2154320.0015280.0015280.0015280.0015280.0015280.001528
"missing value ratio"0.00.00.00.00.00.00.00.00.00.0
"stability"0.2009170.6806720.5416350.0458370.6180290.6440030.6440030.6982430.7937360.906035
" + "shape: (9, 12)
metricnameagesiblings_spousestravel_classfaresurvivedsex__malesex__femaleport_embarked__Southamptonport_embarked__Cherbourgport_embarked__Queenstown
strstrf64f64f64f64f64f64f64f64f64f64
"min""Abbott, Master. Eugene Joseph"0.16670.01.00.00.00.00.00.00.00.0
"max""van Melkebeke, Mr. Philemon"76.08.03.0512.32921.01.01.01.01.01.0
"mean""-"29.5421910.5184712.29808933.8498610.377070.6522290.3477710.7019110.2089170.089172
"median""-"29.5421910.03.014.50.01.00.01.00.00.0
"standard deviation""-"12.7474911.0678410.83471255.6862170.4849620.4765660.4765660.457710.4067940.285174
"distinct value count""784"90.07.03.0240.02.02.02.02.02.02.0
"idness""0.9987261146496815"0.114650.0089170.0038220.3057320.0025480.0025480.0025480.0025480.0025480.002548
"missing value ratio""0.0"0.00.00.00.00.00.00.00.00.00.0
"stability""0.0025477707006369425"0.1898090.6700640.5414010.0433120.622930.6522290.6522290.7019110.7910830.910828
" ], "text/plain": [ - "+-----------+----------+-----------+-----------+---+-----------+-----------+-----------+-----------+\n", - "| metric | age | siblings_ | travel_cl | … | sex__fema | port_emba | port_emba | port_emba |\n", - "| --- | --- | spouses | ass | | le | rked__Sou | rked__Che | rked__Que |\n", - "| str | f64 | --- | --- | | --- | thampton | rbourg | enstown |\n", - "| | | f64 | f64 | | f64 | --- | --- | --- |\n", - "| | | | | | | f64 | f64 | f64 |\n", + "+-----------+-----------+----------+-----------+---+-----------+-----------+-----------+-----------+\n", + "| metric | name | age | siblings_ | … | sex__fema | port_emba | port_emba | port_emba |\n", + "| --- | --- | --- | spouses | | le | rked__Sou | rked__Che | rked__Que |\n", + "| str | str | f64 | --- | | --- | thampton | rbourg | enstown |\n", + "| | | | f64 | | f64 | --- | --- | --- |\n", + "| | | | | | | f64 | f64 | f64 |\n", "+==================================================================================================+\n", - "| min | 0.16670 | 0.00000 | 1.00000 | … | 0.00000 | 0.00000 | 0.00000 | 0.00000 |\n", - "| max | 80.00000 | 8.00000 | 3.00000 | … | 1.00000 | 1.00000 | 1.00000 | 1.00000 |\n", - "| mean | 29.88113 | 0.49885 | 2.29488 | … | 0.35600 | 0.69824 | 0.20626 | 0.09396 |\n", - "| median | 29.88113 | 0.00000 | 3.00000 | … | 0.00000 | 1.00000 | 0.00000 | 0.00000 |\n", - "| standard | 12.88320 | 1.04166 | 0.83784 | … | 0.47900 | 0.45920 | 0.40478 | 0.29189 |\n", - "| deviation | | | | | | | | |\n", - "| distinct | 99.00000 | 7.00000 | 3.00000 | … | 2.00000 | 2.00000 | 2.00000 | 2.00000 |\n", - "| value | | | | | | | | |\n", - "| count | | | | | | | | |\n", - "| idness | 0.07563 | 0.00535 | 0.00229 | … | 0.00153 | 0.00153 | 0.00153 | 0.00153 |\n", - "| missing | 0.00000 | 0.00000 | 0.00000 | … | 0.00000 | 0.00000 | 0.00000 | 0.00000 |\n", - "| value | | | | | | | | |\n", - "| ratio | | | | | | | | |\n", - "| stability | 0.20092 | 0.68067 | 0.54163 | … | 0.64400 | 0.69824 | 
0.79374 | 0.90604 |\n", - "+-----------+----------+-----------+-----------+---+-----------+-----------+-----------+-----------+" + "| min | Abbott, | 0.16670 | 0.00000 | … | 0.00000 | 0.00000 | 0.00000 | 0.00000 |\n", + "| | Master. | | | | | | | |\n", + "| | Eugene | | | | | | | |\n", + "| | Joseph | | | | | | | |\n", + "| max | van Melke | 76.00000 | 8.00000 | … | 1.00000 | 1.00000 | 1.00000 | 1.00000 |\n", + "| | beke, Mr. | | | | | | | |\n", + "| | Philemon | | | | | | | |\n", + "| mean | - | 29.54219 | 0.51847 | … | 0.34777 | 0.70191 | 0.20892 | 0.08917 |\n", + "| median | - | 29.54219 | 0.00000 | … | 0.00000 | 1.00000 | 0.00000 | 0.00000 |\n", + "| standard | - | 12.74749 | 1.06784 | … | 0.47657 | 0.45771 | 0.40679 | 0.28517 |\n", + "| deviation | | | | | | | | |\n", + "| distinct | 784 | 90.00000 | 7.00000 | … | 2.00000 | 2.00000 | 2.00000 | 2.00000 |\n", + "| value | | | | | | | | |\n", + "| count | | | | | | | | |\n", + "| idness | 0.9987261 | 0.11465 | 0.00892 | … | 0.00255 | 0.00255 | 0.00255 | 0.00255 |\n", + "| | 146496815 | | | | | | | |\n", + "| missing | 0.0 | 0.00000 | 0.00000 | … | 0.00000 | 0.00000 | 0.00000 | 0.00000 |\n", + "| value | | | | | | | | |\n", + "| ratio | | | | | | | | |\n", + "| stability | 0.0025477 | 0.18981 | 0.67006 | … | 0.65223 | 0.70191 | 0.79108 | 0.91083 |\n", + "| | 707006369 | | | | | | | |\n", + "| | 425 | | | | | | | |\n", + "+-----------+-----------+----------+-----------+---+-----------+-----------+-----------+-----------+" ] }, "execution_count": 7, @@ -270,25 +293,27 @@ } ], "source": [ - "transformed_raw_data.summarize_statistics()" + "transformed_train_data.summarize_statistics()" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "collapsed": false + }, "source": [ - "### Spliting the `raw_data` into train and test sets\n", - "- **Training set**: Contains 60% of the data and will be used to train the model.\n", - "- **Testing set**: Contains 40% of the data and will be used to test the model's 
accuracy." + "### Mark the `survived` Column as the Target Variable to Be Predicted" ] }, { "cell_type": "code", "execution_count": 8, - "metadata": {}, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ - "train_table, test_table = transformed_raw_data.shuffle_rows().split_rows(0.6)" + "tagged_train_table = transformed_train_data.to_tabular_dataset(\"survived\",extra_names=[\"name\"])" ] }, { @@ -297,7 +322,8 @@ "collapsed": false }, "source": [ - "### Mark the `survived` column as the target variable to be predicted" + "### Using `RandomForest` Classifier as a Model for Classification\n", + "We use the `RandomForest` classifier as our model and pass the training dataset to the model's `fit` function to train it." ] }, { @@ -308,7 +334,10 @@ }, "outputs": [], "source": [ - "tagged_train_table = train_table.to_tabular_dataset(\"survived\")" + "from safeds.ml.classical.classification import RandomForestClassifier\n", + "\n", + "classifier = RandomForestClassifier()\n", + "fitted_classifier = classifier.fit(tagged_train_table)" ] }, { @@ -317,8 +346,9 @@ "collapsed": false }, "source": [ - "## Using `RandomForest` classifier as a model for classification\n", - "We use the `RandomForest` classifier as our model and pass the training dataset to the model's `fit` function to train it." + "### Using the Trained Random Forest Model to Predict Survival\n", + "Use the trained `RandomForest` model to predict the survival rate of passengers in the test dataset.
\n", + "Pass the `test_table` into the `predict` function, which uses our trained model for prediction." ] }, { @@ -329,29 +359,20 @@ }, "outputs": [], "source": [ - "from safeds.ml.classical.classification import RandomForestClassifier\n", - "\n", - "classifier = RandomForestClassifier()\n", - "fitted_classifier= classifier.fit(tagged_train_table)" + "prediction = fitted_classifier.predict(transformed_test_data)" ] }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ - "### Using the trained random forest model to predict survival\n", - "Use the trained `RandomForest` model to predict the survival rate of passengers in the test dataset.\n", - "Pass the `test_table` into the `predict` function, which uses our trained model for prediction." + "### Reverse Transforming the `prediction`" ] }, { "cell_type": "code", "execution_count": 11, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [ { "data": { @@ -363,28 +384,40 @@ " white-space: pre-wrap;\n", "}\n", "\n", - "shape: (15, 10)
agesiblings_spousestravel_classfaresex__malesex__femaleport_embarked__Southamptonport_embarked__Cherbourgport_embarked__Queenstownsurvived
f64i64i64f64u8u8u8u8u8i64
45.00230.0011001
29.881135037.8958100100
29.881135038.05101000
29.881135037.75100010
4.00322.025011000
29.0037.875101000
44.0038.05101000
48.0037.8542101000
21.00177.2875101000
56.00130.6958100100
" + "shape: (15, 8)
nameagesiblings_spousestravel_classfaresurvivedsexport_embarked
strf64i64i64f64i64strstr
"Christy, Mrs. (Alice Frances)"45.00230.01"female""Southampton"
"Gheorgheff, Mr. Stanio"29.542191037.89580"male""Cherbourg"
"Miles, Mr. Frank"29.542191038.050"male""Southampton"
"Foley, Mr. William"29.542191037.750"male""Queenstown"
"Kink-Heilmann, Miss. Luise Gre…4.00322.0250"female""Southampton"
"Zimmerman, Mr. Leo"29.0037.8750"male""Southampton"
"Kelly, Mr. James"44.0038.050"male""Southampton"
"Jensen, Mr. Niels Peder"48.0037.85420"male""Southampton"
"White, Mr. Richard Frasar"21.00177.28750"male""Southampton"
"Smith, Mr. James Clinch"56.00130.69580"male""Cherbourg"
" ], "text/plain": [ - "+----------+------------+------------+----------+---+-----------+-----------+-----------+----------+\n", - "| age | siblings_s | travel_cla | fare | … | port_emba | port_emba | port_emba | survived |\n", - "| --- | pouses | ss | --- | | rked__Sou | rked__Che | rked__Que | --- |\n", - "| f64 | --- | --- | f64 | | thampton | rbourg | enstown | i64 |\n", - "| | i64 | i64 | | | --- | --- | --- | |\n", - "| | | | | | u8 | u8 | u8 | |\n", + "+--------------+----------+-------------+-------------+----------+----------+--------+-------------+\n", + "| name | age | siblings_sp | travel_clas | fare | survived | sex | port_embark |\n", + "| --- | --- | ouses | s | --- | --- | --- | ed |\n", + "| str | f64 | --- | --- | f64 | i64 | str | --- |\n", + "| | | i64 | i64 | | | | str |\n", "+==================================================================================================+\n", - "| 45.00000 | 0 | 2 | 30.00000 | … | 1 | 0 | 0 | 1 |\n", - "| 29.88113 | 0 | 3 | 7.89580 | … | 0 | 1 | 0 | 0 |\n", - "| 29.88113 | 0 | 3 | 8.05000 | … | 1 | 0 | 0 | 0 |\n", - "| 29.88113 | 0 | 3 | 7.75000 | … | 0 | 0 | 1 | 0 |\n", - "| 4.00000 | 0 | 3 | 22.02500 | … | 1 | 0 | 0 | 0 |\n", - "| … | … | … | … | … | … | … | … | … |\n", - "| 29.00000 | 0 | 3 | 7.87500 | … | 1 | 0 | 0 | 0 |\n", - "| 44.00000 | 0 | 3 | 8.05000 | … | 1 | 0 | 0 | 0 |\n", - "| 48.00000 | 0 | 3 | 7.85420 | … | 1 | 0 | 0 | 0 |\n", - "| 21.00000 | 0 | 1 | 77.28750 | … | 1 | 0 | 0 | 0 |\n", - "| 56.00000 | 0 | 1 | 30.69580 | … | 0 | 1 | 0 | 0 |\n", - "+----------+------------+------------+----------+---+-----------+-----------+-----------+----------+" + "| Christy, | 45.00000 | 0 | 2 | 30.00000 | 1 | female | Southampton |\n", + "| Mrs. (Alice | | | | | | | |\n", + "| Frances) | | | | | | | |\n", + "| Gheorgheff, | 29.54219 | 0 | 3 | 7.89580 | 0 | male | Cherbourg |\n", + "| Mr. Stanio | | | | | | | |\n", + "| Miles, Mr. 
| 29.54219 | 0 | 3 | 8.05000 | 0 | male | Southampton |\n", + "| Frank | | | | | | | |\n", + "| Foley, Mr. | 29.54219 | 0 | 3 | 7.75000 | 0 | male | Queenstown |\n", + "| William | | | | | | | |\n", + "| Kink-Heilman | 4.00000 | 0 | 3 | 22.02500 | 0 | female | Southampton |\n", + "| n, Miss. | | | | | | | |\n", + "| Luise Gre… | | | | | | | |\n", + "| … | … | … | … | … | … | … | … |\n", + "| Zimmerman, | 29.00000 | 0 | 3 | 7.87500 | 0 | male | Southampton |\n", + "| Mr. Leo | | | | | | | |\n", + "| Kelly, Mr. | 44.00000 | 0 | 3 | 8.05000 | 0 | male | Southampton |\n", + "| James | | | | | | | |\n", + "| Jensen, Mr. | 48.00000 | 0 | 3 | 7.85420 | 0 | male | Southampton |\n", + "| Niels Peder | | | | | | | |\n", + "| White, Mr. | 21.00000 | 0 | 1 | 77.28750 | 0 | male | Southampton |\n", + "| Richard | | | | | | | |\n", + "| Frasar | | | | | | | |\n", + "| Smith, Mr. | 56.00000 | 0 | 1 | 30.69580 | 0 | male | Cherbourg |\n", + "| James Clinch | | | | | | | |\n", + "+--------------+----------+-------------+-------------+----------+----------+--------+-------------+" ] }, "execution_count": 11, @@ -393,9 +426,9 @@ } ], "source": [ - "prediction = fitted_classifier.predict(test_table)\n", + "reverse_transformed_prediction = prediction.to_table().inverse_transform_table(fitted_one_hot_encoder_train)\n", "#For visualisation purposes we only print out the first 15 rows.\n", - "prediction.to_table().slice_rows(start=0, length=15)" + "reverse_transformed_prediction.slice_rows(length=15)" ] }, { @@ -404,7 +437,7 @@ "collapsed": false }, "source": [ - "### Testing the accuracy of the model" + "### Testing the Accuracy of the Model" ] }, { @@ -426,8 +459,7 @@ } ], "source": [ - "test_tabular_dataset = test_table.to_tabular_dataset(\"survived\")\n", - "fitted_classifier.accuracy(test_tabular_dataset)" + "fitted_classifier.accuracy(transformed_test_data)" ] } ], From 51f226910a7ed9371bb5a2865d05dc7cec107f61 Mon Sep 17 00:00:00 2001 From: Leon Peplau 
<115023385+LIEeOoNn@users.noreply.github.com> Date: Tue, 2 Jul 2024 11:54:00 +0200 Subject: [PATCH 09/19] changed heading Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index da89056e6..c9c7cc7ab 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -15,7 +15,7 @@ "collapsed": false }, "source": [ - "### Load Your Data into a `Table`\n", + "### Loading Data\n", "The data is available under [Titanic - Machine Learning from Disaster](https://www.kaggle.com/c/titanic/data):\n" ] }, From 7e10a9210c49f6db9b4a84455ced65cdddb4dfa4 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Tue, 2 Jul 2024 11:54:24 +0200 Subject: [PATCH 10/19] changed heading Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index c9c7cc7ab..a257f18fa 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -83,7 +83,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Spliting the `raw_data` into Train and Test Sets\n", + "### Splitting Data into Train and Test Sets\n", "- **Training set**: Contains 60% of the data and will be used to train the model.\n", "- **Testing set**: Contains 40% of the data and will be used to test the model's accuracy." 
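The mean-imputation step the tutorial performs with `SimpleImputer` boils down to the following plain-Python sketch (the ages are made-up sample values, with `None` standing in for a missing entry):

```python
def impute_mean(rows, column):
    """Fill missing (None) values of a column with the mean of the present ones."""
    present = [r[column] for r in rows if r[column] is not None]
    mean = sum(present) / len(present)
    filled = [{**r, column: mean} if r[column] is None else dict(r) for r in rows]
    return filled, mean

train_rows = [{"age": 22.0}, {"age": None}, {"age": 38.0}]
imputed, mean_age = impute_mean(train_rows, "age")
print(mean_age, imputed[1]["age"])  # 30.0 30.0
```

Note the fit/transform split the tutorial uses: the mean is learned on the training table only (`fit_and_transform`) and then reused on the test table (`transform`), so no information leaks from the test set.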
] From 9170135173c24abebff62e88b31551de1e737266 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Tue, 2 Jul 2024 11:54:40 +0200 Subject: [PATCH 11/19] changed heading Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index a257f18fa..cf681c3c9 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -187,7 +187,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Imputing Columns `age` and `fare`\n", + "### Handling Missing Values\n", "We fill in missing values in the `age` and `fare` columns with the mean of each column\n" ] }, From d09ffeec98b7a21d3d5a586baaf30c193142a656 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Tue, 2 Jul 2024 11:55:02 +0200 Subject: [PATCH 12/19] changed heading Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index cf681c3c9..1cfa8109b 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -210,7 +210,7 @@ "collapsed": false }, "source": [ - "### Using `OneHotEncoder` to `fit_and_transform` the Table\n", + "### Handling Nominal Categorical Data\n", "We use `OneHotEncoder` to transform categorical, non-numerical values into numerical representations with values of zero or one. In this example, we will transform the values of the `sex` column, so they can be used in the model to predict passenger survival.\n", "- Use the `fit_and_transform` function of the `OneHotEncoder` to pass the table and the column names to be used as features for the prediction." 
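What `OneHotEncoder` does to the `sex` column can be pictured with this plain-Python sketch; the `sex__male` / `sex__female` naming mirrors the statistics tables above, while the sample rows and helper names are illustrative:

```python
def fit_one_hot(rows, column):
    """Learn the distinct categories of a column (the 'fit' step)."""
    return sorted({row[column] for row in rows})

def transform_one_hot(rows, column, categories):
    """Replace the column with one 0/1 indicator column per category."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}__{cat}"] = 1 if row[column] == cat else 0
        out.append(new_row)
    return out

train = [{"sex": "male", "fare": 7.25}, {"sex": "female", "fare": 71.28}]
cats = fit_one_hot(train, "sex")            # fit on the training data only
encoded = transform_one_hot(train, "sex", cats)
print(encoded[0])  # {'fare': 7.25, 'sex__female': 0, 'sex__male': 1}
```

As with the imputer, the categories are learned once from the training table and the same fitted encoder is then applied to the test table.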
] From f8447718c6514c6fb61c3dbb56e44d92e4bec4c1 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Tue, 2 Jul 2024 12:00:20 +0200 Subject: [PATCH 13/19] changed heading Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 1cfa8109b..2f6a01513 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -233,7 +233,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Statistics after Imputing / Removing / Encoding\n", + "### Statistics after Data Processing\n", "Check the data after cleaning and transformation to ensure the changes were made correctly.\n" ] }, From 8bd2a25373747379e077b3cfa5794520ffc40c43 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Tue, 2 Jul 2024 12:01:20 +0200 Subject: [PATCH 14/19] changed heading Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 2f6a01513..06d389af5 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -366,7 +366,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Reverse Transforming the `prediction`" + "### Reverse-Transforming the Prediction" ] }, { From 4a46369a589855ff3120ed294cc83778e1ef3c33 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Tue, 2 Jul 2024 12:01:48 +0200 Subject: [PATCH 15/19] changed heading Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 06d389af5..54f35c5c5 100644 --- 
a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -346,7 +346,7 @@ "collapsed": false }, "source": [ - "### Using the Trained Random Forest Model to Predict Survival\n", + "### Predicting with the Classifier\n", "Use the trained `RandomForest` model to predict the survival rate of passengers in the test dataset.
\n", "Pass the `test_table` into the `predict` function, which uses our trained model for prediction." ] From 7a0a302033a6b8670db5befc624a5706f9f1b3e9 Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Tue, 2 Jul 2024 12:02:07 +0200 Subject: [PATCH 16/19] changed heading Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 54f35c5c5..5938b7f14 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -322,7 +322,7 @@ "collapsed": false }, "source": [ - "### Using `RandomForest` Classifier as a Model for Classification\n", + "### Fitting a Classifier\n", "We use the `RandomForest` classifier as our model and pass the training dataset to the model's `fit` function to train it." ] }, From b5ce86a4249904b1aacd743854ee646ddb50da0e Mon Sep 17 00:00:00 2001 From: Leon Peplau <115023385+LIEeOoNn@users.noreply.github.com> Date: Tue, 2 Jul 2024 12:02:22 +0200 Subject: [PATCH 17/19] changed heading Co-authored-by: Lars Reimann --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 5938b7f14..8027e3c98 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -302,7 +302,7 @@ "collapsed": false }, "source": [ - "### Mark the `survived` Column as the Target Variable to Be Predicted" + "### Marking the Target Column" ] }, { From e3b234df7d6ad68f1a5302fbff80d80003ca71a5 Mon Sep 17 00:00:00 2001 From: peplaul0 Date: Tue, 2 Jul 2024 12:07:21 +0200 Subject: [PATCH 18/19] updated link --- docs/tutorials/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 
8027e3c98..45ea4aba5 100644
--- a/docs/tutorials/classification.ipynb
+++ b/docs/tutorials/classification.ipynb
@@ -16,7 +16,7 @@
   },
   "source": [
    "### Loading Data\n",
-    "The data is available under [Titanic - Machine Learning from Disaster](https://www.kaggle.com/c/titanic/data):\n"
+    "The data is available under [Titanic - Machine Learning from Disaster](https://github.com/Safe-DS/Datasets/blob/main/src/safeds_datasets/tabular/_titanic/data/titanic.csv):\n"
   ]
  },
  {

From 0a72f4890389bd200a596684d94db6eb84930ec1 Mon Sep 17 00:00:00 2001
From: peplaul0
Date: Tue, 2 Jul 2024 17:07:00 +0200
Subject: [PATCH 19/19] added text to the asked for sections

---
 docs/tutorials/classification.ipynb | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb
index 45ea4aba5..f0cecce53 100644
--- a/docs/tutorials/classification.ipynb
+++ b/docs/tutorials/classification.ipynb
@@ -302,7 +302,12 @@
    "collapsed": false
   },
   "source": [
-    "### Marking the Target Column"
+    "### Marking the Target Column\n",
+    "Here, we set the target, extra, and feature columns using `to_tabular_dataset`.\n",
+    "This ensures the model knows which column to predict and which columns to use as features during training.\n",
+    "- target: `survived`\n",
+    "- extra: `name`\n",
+    "- features: all columns except target and extra"
   ]
  },
  {
@@ -366,7 +371,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "### Reverse-Transforming the Prediction"
+    "### Reverse-Transforming the Prediction\n",
+    "After making a prediction, the values will be in a transformed format. To interpret the results using the original values, we need to reverse this transformation. This is done using `inverse_transform_table` with the fitted transformers that support inverse transformation."
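The `inverse_transform_table` call documented here undoes the one-hot step. Conceptually it looks like this plain-Python sketch (illustrative row and helper name, not the safeds internals):

```python
def inverse_one_hot(rows, column, categories):
    """Collapse the 0/1 indicator columns back into one categorical column."""
    out = []
    for row in rows:
        restored = {k: v for k, v in row.items()
                    if not k.startswith(f"{column}__")}
        # the single indicator that is set to 1 names the original category
        restored[column] = next(c for c in categories
                                if row[f"{column}__{c}"] == 1)
        out.append(restored)
    return out

encoded = [{"fare": 7.25, "sex__female": 0, "sex__male": 1}]
decoded = inverse_one_hot(encoded, "sex", ["female", "male"])
print(decoded)  # [{'fare': 7.25, 'sex': 'male'}]
```

This is why the reverse-transformed prediction above shows readable `sex` and `port_embarked` columns again instead of the indicator columns the model was trained on.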
] }, { @@ -437,7 +443,8 @@ "collapsed": false }, "source": [ - "### Testing the Accuracy of the Model" + "### Testing the Accuracy of the Model\n", + "We evaluate the performance of the trained model by testing its accuracy on the transformed test data using `accuracy`." ] }, { @@ -448,18 +455,16 @@ }, "outputs": [ { - "data": { - "text/plain": [ - "0.7938931297709924" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy on test data: 79.3893%\n" + ] } ], "source": [ - "fitted_classifier.accuracy(transformed_test_data)" + "accuracy = fitted_classifier.accuracy(transformed_test_data) * 100\n", + "print(f'Accuracy on test data: {accuracy:.4f}%')" ] } ],
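The `accuracy` metric computed in the final cell is simply the fraction of predicted labels that match the true ones. A stdlib sketch with made-up labels (the formatting matches the patch's `print` statement):

```python
def accuracy(predicted, actual):
    """Fraction of positions where prediction and ground truth agree."""
    assert len(predicted) == len(actual)
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

predicted = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
actual    = [0, 1, 1, 1, 0, 0, 0, 0, 1, 0]
score = accuracy(predicted, actual)
print(f"Accuracy on test data: {score * 100:.4f}%")  # Accuracy on test data: 80.0000%
```

With roughly 62% of passengers in the test split not surviving, always predicting 0 would already score about 0.62, so the model's 0.79 reflects genuine signal from the features rather than class imbalance alone.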