diff --git a/docs/tutorials/classification.ipynb b/docs/tutorials/classification.ipynb index 0bba08deb..f0cecce53 100644 --- a/docs/tutorials/classification.ipynb +++ b/docs/tutorials/classification.ipynb @@ -2,195 +2,470 @@ "cells": [ { "cell_type": "markdown", - "source": [ - "# Classification\n", - "\n", - "This tutorial uses safeds on **titanic passenger data** to predict who will survive and who will not, using sex as a feature for the prediction.\n" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "In this tutorial, we use `safeds` on **Titanic passenger data** to predict who will survive and who will not." + ] }, { "cell_type": "markdown", - "source": [ - "1. Load your data into a `Table`, the data is available under `docs/tutorials/data/titanic.csv`:\n" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "### Loading Data\n", + "The data is available under [Titanic - Machine Learning from Disaster](https://github.com/Safe-DS/Datasets/blob/main/src/safeds_datasets/tabular/_titanic/data/titanic.csv):\n" + ] }, { "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "shape: (15, 12)
idnamesexagesiblings_spousesparents_childrentickettravel_classfarecabinport_embarkedsurvived
i64strstrf64i64i64stri64f64strstri64
0"Abbing, Mr. Anthony""male"42.000"C.A. 5547"37.55null"Southampton"0
1"Abbott, Master. Eugene Joseph""male"13.002"C.A. 2673"320.25null"Southampton"0
2"Abbott, Mr. Rossmore Edward""male"16.011"C.A. 2673"320.25null"Southampton"0
3"Abbott, Mrs. Stanton (Rosa Hun…"female"35.011"C.A. 2673"320.25null"Southampton"1
4"Abelseth, Miss. Karen Marie""female"16.000"348125"37.65null"Southampton"1
10"Adahl, Mr. Mauritz Nils Martin""male"30.000"C 7076"37.25null"Southampton"0
11"Adams, Mr. John""male"26.000"341826"38.05null"Southampton"0
12"Ahlin, Mrs. Johan (Johanna Per…"female"40.010"7546"39.475null"Southampton"0
13"Aks, Master. Philip Frank""male"0.833301"392091"39.35null"Southampton"1
14"Aks, Mrs. Sam (Leah Rosen)""female"18.001"392091"39.35null"Southampton"1
" + ], + "text/plain": [ + "+-----+----------------------+--------+----------+---+----------+-------+---------------+----------+\n", + "| id | name | sex | age | … | fare | cabin | port_embarked | survived |\n", + "| --- | --- | --- | --- | | --- | --- | --- | --- |\n", + "| i64 | str | str | f64 | | f64 | str | str | i64 |\n", + "+==================================================================================================+\n", + "| 0 | Abbing, Mr. Anthony | male | 42.00000 | … | 7.55000 | null | Southampton | 0 |\n", + "| 1 | Abbott, Master. | male | 13.00000 | … | 20.25000 | null | Southampton | 0 |\n", + "| | Eugene Joseph | | | | | | | |\n", + "| 2 | Abbott, Mr. Rossmore | male | 16.00000 | … | 20.25000 | null | Southampton | 0 |\n", + "| | Edward | | | | | | | |\n", + "| 3 | Abbott, Mrs. Stanton | female | 35.00000 | … | 20.25000 | null | Southampton | 1 |\n", + "| | (Rosa Hun… | | | | | | | |\n", + "| 4 | Abelseth, Miss. | female | 16.00000 | … | 7.65000 | null | Southampton | 1 |\n", + "| | Karen Marie | | | | | | | |\n", + "| … | … | … | … | … | … | … | … | … |\n", + "| 10 | Adahl, Mr. Mauritz | male | 30.00000 | … | 7.25000 | null | Southampton | 0 |\n", + "| | Nils Martin | | | | | | | |\n", + "| 11 | Adams, Mr. John | male | 26.00000 | … | 8.05000 | null | Southampton | 0 |\n", + "| 12 | Ahlin, Mrs. Johan | female | 40.00000 | … | 9.47500 | null | Southampton | 0 |\n", + "| | (Johanna Per… | | | | | | | |\n", + "| 13 | Aks, Master. Philip | male | 0.83330 | … | 9.35000 | null | Southampton | 1 |\n", + "| | Frank | | | | | | | |\n", + "| 14 | Aks, Mrs. Sam (Leah | female | 18.00000 | … | 9.35000 | null | Southampton | 1 |\n", + "| | Rosen) | | | | | | | |\n", + "+-----+----------------------+--------+----------+---+----------+-------+---------------+----------+" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "from safeds.data.tabular.containers import Table\n", "\n", - "titanic = Table.from_csv_file(\"data/titanic.csv\")\n", + "raw_data = Table.from_csv_file(\"data/titanic.csv\")\n", "#For visualisation purposes we only print out the first 15 rows.\n", - "titanic.slice_rows(0, 15)" - ], - "metadata": { - "collapsed": false - }, - "execution_count": null, - "outputs": [] + "raw_data.slice_rows(length=15)" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ - "2. Split the titanic dataset into two tables. A training set, that we will use later to implement a training model to predict the survival of passengers, containing 60% of the data, and a testing set containing the rest of the data.\n", - "Delete the column `survived` from the test set, to be able to predict it later:" - ], - "metadata": { - "collapsed": false - } + "### Splitting Data into Train and Test Sets\n", + "- **Training set**: Contains 60% of the data and will be used to train the model.\n", + "- **Testing set**: Contains 40% of the data and will be used to test the model's accuracy." + ] }, { "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], "source": [ - "train_table, testing_table = titanic.split_rows(0.6)\n", - "\n", - "test_table = testing_table.remove_columns([\"survived\"]).shuffle_rows()" - ], - "metadata": { - "collapsed": false - }, - "execution_count": null, - "outputs": [] + "train_table, test_table = raw_data.shuffle_rows().split_rows(0.6)" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ - "3. 
Use `OneHotEncoder` to create an encoder, that will be used later to transform the training table.\n", - "* We use `OneHotEncoder` to transform non-numerical categorical values into numerical representations with values of zero or one. In this example we will transform the values of the sex column, hence they will be used in the model for predicting the surviving of passengers.\n", - "* Use the `fit` function of the `OneHotEncoder` to pass the table and the column names, that will be used as features to predict who will survive to the encoder.\n", - "* The names of the column before transformation need to be saved, because `OneHotEncoder` changes the names of the fitted `Column`s:\n" + "### Removing Low-Quality Columns" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "shape: (9, 13)
metricidnamesexagesiblings_spousesparents_childrentickettravel_classfarecabinport_embarkedsurvived
strf64strstrf64f64f64strf64f64strstrf64
"min"1.0"Abbott, Master. Eugene Joseph""female"0.16670.00.0"110152"1.00.0"A11""Cherbourg"0.0
"max"1307.0"van Melkebeke, Mr. Philemon""male"76.08.06.0"WE/P 5735"3.0512.3292"T""Southampton"1.0
"mean"654.408917"-""-"29.5421910.5184710.396178"-"2.29808933.849861"-""-"0.37707
"median"658.0"-""-"28.00.00.0"-"3.014.5"-""-"0.0
"standard deviation"376.780514"-""-"14.1643251.0678410.818931"-"0.83471255.721765"-""-"0.484962
"distinct value count"785.0"784""2"89.07.07.0"618"3.0239.0"134""3"2.0
"idness"1.0"0.9987261146496815""0.0025477707006369425"0.114650.0089170.008917"0.7872611464968153"0.0038220.305732"0.17197452229299362""0.003821656050955414"0.002548
"missing value ratio"0.0"0.0""0.0"0.1898090.00.0"0.0"0.00.001274"0.7745222929936306""0.0"0.0
"stability"0.001274"0.0025477707006369425""0.6522292993630573"0.0487420.6700640.75414"0.007643312101910828"0.5414010.043367"0.02824858757062147""0.7019108280254777"0.62293
" + ], + "text/plain": [ + "+-----------+-----------+-----------+-----------+---+-----------+-----------+-----------+----------+\n", + "| metric | id | name | sex | … | fare | cabin | port_emba | survived |\n", + "| --- | --- | --- | --- | | --- | --- | rked | --- |\n", + "| str | f64 | str | str | | f64 | str | --- | f64 |\n", + "| | | | | | | | str | |\n", + "+==================================================================================================+\n", + "| min | 1.00000 | Abbott, | female | … | 0.00000 | A11 | Cherbourg | 0.00000 |\n", + "| | | Master. | | | | | | |\n", + "| | | Eugene | | | | | | |\n", + "| | | Joseph | | | | | | |\n", + "| max | 1307.0000 | van Melke | male | … | 512.32920 | T | Southampt | 1.00000 |\n", + "| | 0 | beke, Mr. | | | | | on | |\n", + "| | | Philemon | | | | | | |\n", + "| mean | 654.40892 | - | - | … | 33.84986 | - | - | 0.37707 |\n", + "| median | 658.00000 | - | - | … | 14.50000 | - | - | 0.00000 |\n", + "| standard | 376.78051 | - | - | … | 55.72177 | - | - | 0.48496 |\n", + "| deviation | | | | | | | | |\n", + "| distinct | 785.00000 | 784 | 2 | … | 239.00000 | 134 | 3 | 2.00000 |\n", + "| value | | | | | | | | |\n", + "| count | | | | | | | | |\n", + "| idness | 1.00000 | 0.9987261 | 0.0025477 | … | 0.30573 | 0.1719745 | 0.0038216 | 0.00255 |\n", + "| | | 146496815 | 707006369 | | | 222929936 | 560509554 | |\n", + "| | | | 425 | | | 2 | 14 | |\n", + "| missing | 0.00000 | 0.0 | 0.0 | … | 0.00127 | 0.7745222 | 0.0 | 0.00000 |\n", + "| value | | | | | | 929936306 | | |\n", + "| ratio | | | | | | | | |\n", + "| stability | 0.00127 | 0.0025477 | 0.6522292 | … | 0.04337 | 0.0282485 | 0.7019108 | 0.62293 |\n", + "| | | 707006369 | 993630573 | | | 875706214 | 280254777 | |\n", + "| | | 425 | | | | 7 | | |\n", + "+-----------+-----------+-----------+-----------+---+-----------+-----------+-----------+----------+" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } ], - "metadata": { - "collapsed": false - } + "source": [ + "train_table.summarize_statistics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We remove certain columns for the following reasons:\n", + "1. **high idness**: `id` , `ticket`\n", + "2. **high stability**: `parents_children` \n", + "3. 
**high missing value ratio**: `cabin`" + ] }, { "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], "source": [ - "from safeds.data.tabular.transformation import OneHotEncoder\n", + "train_table = train_table.remove_columns([\"id\",\"ticket\", \"parents_children\", \"cabin\"])\n", + "test_table = test_table.remove_columns([\"id\",\"ticket\", \"parents_children\", \"cabin\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Handling Missing Values\n", + "We fill in missing values in the `age` and `fare` columns with the mean of each column\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "from safeds.data.tabular.transformation import SimpleImputer\n", "\n", - "encoder = OneHotEncoder(column_names=\"sex\").fit(train_table)" - ], + "simple_imputer = SimpleImputer(column_names=[\"age\",\"fare\"],strategy=SimpleImputer.Strategy.mean())\n", + "fitted_simple_imputer_train, transformed_train_data = simple_imputer.fit_and_transform(train_table)\n", + "transformed_test_data = fitted_simple_imputer_train.transform(test_table)" + ] + }, + { + "cell_type": "markdown", "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "source": [ + "### Handling Nominal Categorical Data\n", + "We use `OneHotEncoder` to transform categorical, non-numerical values into numerical representations with values of zero or one. In this example, we will transform the values of the `sex` column, so they can be used in the model to predict passenger survival.\n", + "- Use the `fit_and_transform` function of the `OneHotEncoder` to pass the table and the column names to be used as features for the prediction." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "from safeds.data.tabular.transformation import OneHotEncoder\n", + "\n", + "fitted_one_hot_encoder_train, transformed_train_data = OneHotEncoder(column_names=[\"sex\", \"port_embarked\"]).fit_and_transform(transformed_train_data)\n", + "transformed_test_data = fitted_one_hot_encoder_train.transform(transformed_test_data)" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ - "4. Transform the training table using the fitted encoder, and create a set with the new names of the fitted `Column`s:\n" + "### Statistics after Data Processing\n", + "Check the data after cleaning and transformation to ensure the changes were made correctly.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "shape: (9, 12)
metricnameagesiblings_spousestravel_classfaresurvivedsex__malesex__femaleport_embarked__Southamptonport_embarked__Cherbourgport_embarked__Queenstown
strstrf64f64f64f64f64f64f64f64f64f64
"min""Abbott, Master. Eugene Joseph"0.16670.01.00.00.00.00.00.00.00.0
"max""van Melkebeke, Mr. Philemon"76.08.03.0512.32921.01.01.01.01.01.0
"mean""-"29.5421910.5184712.29808933.8498610.377070.6522290.3477710.7019110.2089170.089172
"median""-"29.5421910.03.014.50.01.00.01.00.00.0
"standard deviation""-"12.7474911.0678410.83471255.6862170.4849620.4765660.4765660.457710.4067940.285174
"distinct value count""784"90.07.03.0240.02.02.02.02.02.02.0
"idness""0.9987261146496815"0.114650.0089170.0038220.3057320.0025480.0025480.0025480.0025480.0025480.002548
"missing value ratio""0.0"0.00.00.00.00.00.00.00.00.00.0
"stability""0.0025477707006369425"0.1898090.6700640.5414010.0433120.622930.6522290.6522290.7019110.7910830.910828
" + ], + "text/plain": [ + "+-----------+-----------+----------+-----------+---+-----------+-----------+-----------+-----------+\n", + "| metric | name | age | siblings_ | … | sex__fema | port_emba | port_emba | port_emba |\n", + "| --- | --- | --- | spouses | | le | rked__Sou | rked__Che | rked__Que |\n", + "| str | str | f64 | --- | | --- | thampton | rbourg | enstown |\n", + "| | | | f64 | | f64 | --- | --- | --- |\n", + "| | | | | | | f64 | f64 | f64 |\n", + "+==================================================================================================+\n", + "| min | Abbott, | 0.16670 | 0.00000 | … | 0.00000 | 0.00000 | 0.00000 | 0.00000 |\n", + "| | Master. | | | | | | | |\n", + "| | Eugene | | | | | | | |\n", + "| | Joseph | | | | | | | |\n", + "| max | van Melke | 76.00000 | 8.00000 | … | 1.00000 | 1.00000 | 1.00000 | 1.00000 |\n", + "| | beke, Mr. | | | | | | | |\n", + "| | Philemon | | | | | | | |\n", + "| mean | - | 29.54219 | 0.51847 | … | 0.34777 | 0.70191 | 0.20892 | 0.08917 |\n", + "| median | - | 29.54219 | 0.00000 | … | 0.00000 | 1.00000 | 0.00000 | 0.00000 |\n", + "| standard | - | 12.74749 | 1.06784 | … | 0.47657 | 0.45771 | 0.40679 | 0.28517 |\n", + "| deviation | | | | | | | | |\n", + "| distinct | 784 | 90.00000 | 7.00000 | … | 2.00000 | 2.00000 | 2.00000 | 2.00000 |\n", + "| value | | | | | | | | |\n", + "| count | | | | | | | | |\n", + "| idness | 0.9987261 | 0.11465 | 0.00892 | … | 0.00255 | 0.00255 | 0.00255 | 0.00255 |\n", + "| | 146496815 | | | | | | | |\n", + "| missing | 0.0 | 0.00000 | 0.00000 | … | 0.00000 | 0.00000 | 0.00000 | 0.00000 |\n", + "| value | | | | | | | | |\n", + "| ratio | | | | | | | | |\n", + "| stability | 0.0025477 | 0.18981 | 0.67006 | … | 0.65223 | 0.70191 | 0.79108 | 0.91083 |\n", + "| | 707006369 | | | | | | | |\n", + "| | 425 | | | | | | | |\n", + "+-----------+-----------+----------+-----------+---+-----------+-----------+-----------+-----------+" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } ], + "source": [ + "transformed_train_data.summarize_statistics()" + ] + }, + { + "cell_type": "markdown", "metadata": { "collapsed": false - } + }, + "source": [ + "### Marking the Target Column\n", + "Here, we set the target, extra, and feature columns using `to_tabular_dataset`.\n", + "This ensures the model knows which column to predict and which columns to use as features during training.\n", + "- target: `survived`\n", + "- extra: `name`\n", + "- fearutes: all columns expect target and extra" + ] }, { "cell_type": "code", - "source": "transformed_table = encoder.transform(train_table)", + "execution_count": 8, "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "tagged_train_table = transformed_train_data.to_tabular_dataset(\"survived\",extra_names=[\"name\"])" + ] }, { "cell_type": "markdown", - "source": "5. Mark the `survived` `Column` as the target variable to be predicted. Include some columns only as extra columns, which are completely ignored by the model:", "metadata": { "collapsed": false - } + }, + "source": [ + "### Fitting a Classifier\n", + "We use the `RandomForest` classifier as our model and pass the training dataset to the model's `fit` function to train it." 
+ ] }, { "cell_type": "code", - "source": [ - "extra_names = [\"id\", \"name\", \"ticket\", \"cabin\", \"port_embarked\", \"age\", \"fare\"]\n", - "\n", - "train_tabular_dataset = transformed_table.to_tabular_dataset(\"survived\", extra_names=extra_names)" - ], + "execution_count": 9, "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "from safeds.ml.classical.classification import RandomForestClassifier\n", + "\n", + "classifier = RandomForestClassifier()\n", + "fitted_classifier = classifier.fit(tagged_train_table)" + ] }, { "cell_type": "markdown", - "source": "6. Use `RandomForest` classifier as a model for the classification. Pass the \"train_tabular_dataset\" table to the fit function of the model:", "metadata": { "collapsed": false - } + }, + "source": [ + "### Predicting with the Classifier\n", + "Use the trained `RandomForest` model to predict the survival rate of passengers in the test dataset.
\n", + "Pass the `test_table` into the `predict` function, which uses our trained model for prediction." + ] }, { "cell_type": "code", - "source": [ - "from safeds.ml.classical.classification import RandomForestClassifier\n", - "\n", - "model = RandomForestClassifier()\n", - "fitted_model= model.fit(train_tabular_dataset)" - ], + "execution_count": 10, "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "prediction = fitted_classifier.predict(transformed_test_data)" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ - "7. Use the fitted random forest model, that we trained on the training dataset to predict the survival rate of passengers in the test dataset.\n", - "Transform the test data with `OneHotEncoder` first, to be able to pass it to the predict function, that uses our fitted random forest model for prediction:" - ], - "metadata": { - "collapsed": false - } + "### Reverse-Transforming the Prediction\n", + "After making a prediction, the values will be in a transformed format. To interpret the results using the original values, we need to reverse this transformation. This is done using `inverse_transform_table` with the fitted transformers that support inverse transformation." + ] }, { "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "shape: (15, 8)
nameagesiblings_spousestravel_classfaresurvivedsexport_embarked
strf64i64i64f64i64strstr
"Christy, Mrs. (Alice Frances)"45.00230.01"female""Southampton"
"Gheorgheff, Mr. Stanio"29.542191037.89580"male""Cherbourg"
"Miles, Mr. Frank"29.542191038.050"male""Southampton"
"Foley, Mr. William"29.542191037.750"male""Queenstown"
"Kink-Heilmann, Miss. Luise Gre…4.00322.0250"female""Southampton"
"Zimmerman, Mr. Leo"29.0037.8750"male""Southampton"
"Kelly, Mr. James"44.0038.050"male""Southampton"
"Jensen, Mr. Niels Peder"48.0037.85420"male""Southampton"
"White, Mr. Richard Frasar"21.00177.28750"male""Southampton"
"Smith, Mr. James Clinch"56.00130.69580"male""Cherbourg"
" + ], + "text/plain": [ + "+--------------+----------+-------------+-------------+----------+----------+--------+-------------+\n", + "| name | age | siblings_sp | travel_clas | fare | survived | sex | port_embark |\n", + "| --- | --- | ouses | s | --- | --- | --- | ed |\n", + "| str | f64 | --- | --- | f64 | i64 | str | --- |\n", + "| | | i64 | i64 | | | | str |\n", + "+==================================================================================================+\n", + "| Christy, | 45.00000 | 0 | 2 | 30.00000 | 1 | female | Southampton |\n", + "| Mrs. (Alice | | | | | | | |\n", + "| Frances) | | | | | | | |\n", + "| Gheorgheff, | 29.54219 | 0 | 3 | 7.89580 | 0 | male | Cherbourg |\n", + "| Mr. Stanio | | | | | | | |\n", + "| Miles, Mr. | 29.54219 | 0 | 3 | 8.05000 | 0 | male | Southampton |\n", + "| Frank | | | | | | | |\n", + "| Foley, Mr. | 29.54219 | 0 | 3 | 7.75000 | 0 | male | Queenstown |\n", + "| William | | | | | | | |\n", + "| Kink-Heilman | 4.00000 | 0 | 3 | 22.02500 | 0 | female | Southampton |\n", + "| n, Miss. | | | | | | | |\n", + "| Luise Gre… | | | | | | | |\n", + "| … | … | … | … | … | … | … | … |\n", + "| Zimmerman, | 29.00000 | 0 | 3 | 7.87500 | 0 | male | Southampton |\n", + "| Mr. Leo | | | | | | | |\n", + "| Kelly, Mr. | 44.00000 | 0 | 3 | 8.05000 | 0 | male | Southampton |\n", + "| James | | | | | | | |\n", + "| Jensen, Mr. | 48.00000 | 0 | 3 | 7.85420 | 0 | male | Southampton |\n", + "| Niels Peder | | | | | | | |\n", + "| White, Mr. | 21.00000 | 0 | 1 | 77.28750 | 0 | male | Southampton |\n", + "| Richard | | | | | | | |\n", + "| Frasar | | | | | | | |\n", + "| Smith, Mr. | 56.00000 | 0 | 1 | 30.69580 | 0 | male | Cherbourg |\n", + "| James Clinch | | | | | | | |\n", + "+--------------+----------+-------------+-------------+----------+----------+--------+-------------+" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "transformed_test_table = encoder.transform(test_table)\n", - "\n", - "prediction = fitted_model.predict(\n", - " transformed_test_table\n", - ")\n", + "reverse_transformed_prediction = prediction.to_table().inverse_transform_table(fitted_one_hot_encoder_train)\n", "#For visualisation purposes we only print out the first 15 rows.\n", - "prediction.to_table().slice_rows(start=0, length=15)" - ], - "metadata": { - "collapsed": false - }, - "execution_count": null, - "outputs": [] + "reverse_transformed_prediction.slice_rows(length=15)" + ] }, { "cell_type": "markdown", - "source": [ - "8. You can test the accuracy of that model with the initial testing_table as follows:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "### Testing the Accuracy of the Model\n", + "We evaluate the performance of the trained model by testing its accuracy on the transformed test data using `accuracy`." 
+ ] }, { "cell_type": "code", - "source": [ - "testing_table = encoder.transform(testing_table)\n", - "\n", - "test_tabular_dataset = testing_table.to_tabular_dataset(\"survived\", extra_names=extra_names)\n", - "fitted_model.accuracy(test_tabular_dataset)\n" - ], + "execution_count": 12, "metadata": { "collapsed": false }, - "execution_count": null, - "outputs": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy on test data: 79.3893%\n" + ] + } + ], + "source": [ + "accuracy = fitted_classifier.accuracy(transformed_test_data) * 100\n", + "print(f'Accuracy on test data: {accuracy:.4f}%')" + ] } ], "metadata": { @@ -202,14 +477,14 @@ "language_info": { "codemirror_mode": { "name": "ipython", - "version": 2 + "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.6" + "pygments_lexer": "ipython3", + "version": "3.12.3" } }, "nbformat": 4,
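For reference, the individual steps above compose into one short script. This sketch only restates calls that already appear in this tutorial (same columns, same 60/40 split, same transformers); the variable names are illustrative and it adds no API beyond what the cells above use:

```python
from safeds.data.tabular.containers import Table
from safeds.data.tabular.transformation import OneHotEncoder, SimpleImputer
from safeds.ml.classical.classification import RandomForestClassifier

# Load the raw Titanic data and split it 60/40 into train and test sets.
raw_data = Table.from_csv_file("data/titanic.csv")
train_table, test_table = raw_data.shuffle_rows().split_rows(0.6)

# Drop the low-quality columns identified via summarize_statistics().
dropped = ["id", "ticket", "parents_children", "cabin"]
train_table = train_table.remove_columns(dropped)
test_table = test_table.remove_columns(dropped)

# Impute missing age/fare values and one-hot encode the nominal columns,
# fitting both transformers on the training split only.
imputer, train_table = SimpleImputer(
    column_names=["age", "fare"], strategy=SimpleImputer.Strategy.mean()
).fit_and_transform(train_table)
test_table = imputer.transform(test_table)

encoder, train_table = OneHotEncoder(
    column_names=["sex", "port_embarked"]
).fit_and_transform(train_table)
test_table = encoder.transform(test_table)

# Mark the target and extra columns, fit the random forest, and report test accuracy.
train_dataset = train_table.to_tabular_dataset("survived", extra_names=["name"])
fitted_model = RandomForestClassifier().fit(train_dataset)
print(f"Accuracy on test data: {fitted_model.accuracy(test_table) * 100:.4f}%")
```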