perf: improve performance of model & forward layer #616

Merged
Changes from 20 commits
59 commits
019a6d8
perf: Add special case to `Table.add_rows` to increase performance
Marsmaennchen221 Apr 3, 2024
5471a03
style: apply automated linter fixes
megalinter-bot Apr 3, 2024
0802e0e
perf: change number_of_rows to number_of_columns in `add_rows` as 0 c…
Marsmaennchen221 Apr 3, 2024
a177d97
perf: special case if rows has columns but no rows
Marsmaennchen221 Apr 3, 2024
8450dc6
test: Added test for `Table.add_rows` for "same schema add no rows"
Marsmaennchen221 Apr 3, 2024
bae8e4d
perf: suggested performance upgrades for nn._fnn_layer and nn._model
Marsmaennchen221 Apr 4, 2024
4ae795e
make dataloader shuffle data each epoch
sibre28 Apr 9, 2024
350b771
add learning_rate parameter to fit() function
sibre28 Apr 9, 2024
905f103
raise an Error if test data doesnt match format of train data
sibre28 Apr 9, 2024
7593204
add abstract layer class
sibre28 Apr 9, 2024
f5f291e
make forward return tensor instead of float and change method to buil…
sibre28 Apr 9, 2024
7284c89
remove uncoverable lines from codecov
sibre28 Apr 9, 2024
0572cb4
small change
sibre28 Apr 9, 2024
0eefee7
small change
sibre28 Apr 9, 2024
f5bdf22
add abstract functions
sibre28 Apr 10, 2024
e3543fb
change for linter
sibre28 Apr 10, 2024
41bdd6a
change for linter
sibre28 Apr 10, 2024
1487ea2
change for linter
sibre28 Apr 10, 2024
8df1f26
change for linter
sibre28 Apr 10, 2024
37238db
Merge branch 'main' into 610-improve-fnn-layer-and-model-performance-…
sibre28 Apr 10, 2024
03907fd
style: apply automated linter fixes
megalinter-bot Apr 10, 2024
7f56aa2
style: apply automated linter fixes
megalinter-bot Apr 10, 2024
0923989
change for linter
sibre28 Apr 10, 2024
4c64e53
Merge remote-tracking branch 'origin/610-improve-fnn-layer-and-model-…
sibre28 Apr 10, 2024
625c0b6
change for linter
sibre28 Apr 10, 2024
5feeff5
style: apply automated linter fixes
megalinter-bot Apr 10, 2024
3c98c23
accumulate epoch and batch counters and loss over all fit-calls
sibre28 Apr 10, 2024
6a05dd6
Merge remote-tracking branch 'origin/610-improve-fnn-layer-and-model-…
sibre28 Apr 10, 2024
8191fc4
style: apply automated linter fixes
megalinter-bot Apr 10, 2024
908409a
add input_size property to Layer
sibre28 Apr 10, 2024
2e253af
Merge remote-tracking branch 'origin/610-improve-fnn-layer-and-model-…
sibre28 Apr 10, 2024
846a36d
raise InputSizeError if input size and table size mismatch
sibre28 Apr 10, 2024
c4c0965
style: apply automated linter fixes
megalinter-bot Apr 10, 2024
fe842e8
perf: suggested performance upgrades for dataloader in TaggedTable an…
Marsmaennchen221 Apr 12, 2024
4892d5d
rename FNNLayer to Forward Layer and put it in separate File
sibre28 Apr 13, 2024
d42e44f
Merge remote-tracking branch 'origin/610-improve-fnn-layer-and-model-…
sibre28 Apr 13, 2024
0ca1ec8
style: apply automated linter fixes
megalinter-bot Apr 13, 2024
d39ada7
remove unnecessary test file
sibre28 Apr 14, 2024
f567310
Merge remote-tracking branch 'origin/610-improve-fnn-layer-and-model-…
sibre28 Apr 14, 2024
a41c330
Merge remote-tracking branch 'origin/suggested_perf_upgrades_model_an…
sibre28 Apr 14, 2024
8b30b1a
Merge remote-tracking branch 'origin/suggested_perf_upgrades_tagged_t…
sibre28 Apr 14, 2024
bf74b67
merge suggested changes
sibre28 Apr 15, 2024
1d0bfa1
style: apply automated linter fixes
megalinter-bot Apr 15, 2024
f61c687
style: apply automated linter fixes
megalinter-bot Apr 15, 2024
9a6d706
update test to cover into_dataloader_with_classes
sibre28 Apr 15, 2024
3db67cd
Merge remote-tracking branch 'origin/610-improve-fnn-layer-and-model-…
sibre28 Apr 15, 2024
4100e6d
remove inplace modifications of model and reset loss after every epoch
sibre28 Apr 17, 2024
7ecec5b
adjust loss calculation
sibre28 Apr 17, 2024
7833e05
style: apply automated linter fixes
megalinter-bot Apr 17, 2024
b7da6df
loss_sum
sibre28 Apr 17, 2024
c8040ed
Merge remote-tracking branch 'origin/610-improve-fnn-layer-and-model-…
sibre28 Apr 17, 2024
2501c4c
style: apply automated linter fixes
megalinter-bot Apr 17, 2024
8ecb9fd
fix bug
sibre28 Apr 17, 2024
cbd69f0
style: apply automated linter fixes
megalinter-bot Apr 17, 2024
8222b5f
fix bug
sibre28 Apr 17, 2024
080fe03
Merge remote-tracking branch 'origin/610-improve-fnn-layer-and-model-…
sibre28 Apr 17, 2024
99ec26c
style: apply automated linter fixes
megalinter-bot Apr 17, 2024
2842732
Merge branch 'main' into 610-improve-fnn-layer-and-model-performance-…
sibre28 Apr 17, 2024
02e2996
style: apply automated linter fixes
megalinter-bot Apr 17, 2024
2 changes: 1 addition & 1 deletion src/safeds/data/tabular/containers/_table.py
@@ -2417,7 +2417,7 @@ def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True): #

     def _into_dataloader(self, batch_size: int) -> DataLoader:
         """
-        Return a Dataloader for the data stored in this table, used for training neural networks.
+        Return a Dataloader for the data stored in this table, used for predicting with neural networks.

         The original table is not modified.

6 changes: 5 additions & 1 deletion src/safeds/data/tabular/containers/_tagged_table.py
@@ -899,7 +899,11 @@ def _into_dataloader(self, batch_size: int) -> DataLoader:
             for column_name in row:
                 new_item.append(row.get_value(column_name))
             all_rows.append(new_item.copy())
-        return DataLoader(dataset=_CustomDataset(np.array(all_rows), np.array(self.target)), batch_size=batch_size)
+        return DataLoader(
+            dataset=_CustomDataset(np.array(all_rows), np.array(self.target)),
+            batch_size=batch_size,
+            shuffle=True,
+        )


 class _CustomDataset(Dataset):
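
The hunk above passes `shuffle=True` so that batches are drawn in a new order on every pass over the data. As a rough illustration of that behaviour, and not of how PyTorch's `DataLoader` works internally, here is a pure-Python sketch. The class `ShufflingBatches` and the helper `epoch_order` are hypothetical names invented for this example.

```python
import random


class ShufflingBatches:
    """Hypothetical stand-in for a shuffling data loader (not the PyTorch class)."""

    def __init__(self, rows, targets, batch_size, seed=None):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._rows = list(rows)
        self._targets = list(targets)
        self._batch_size = batch_size
        self._rng = random.Random(seed)

    def __iter__(self):
        # A fresh permutation is drawn on every pass, i.e. once per epoch.
        order = list(range(len(self._rows)))
        self._rng.shuffle(order)
        for start in range(0, len(order), self._batch_size):
            chunk = order[start:start + self._batch_size]
            # Rows and targets are indexed together so the pairs stay aligned.
            yield [self._rows[i] for i in chunk], [self._targets[i] for i in chunk]


def epoch_order(loader):
    """Collect the targets in the order one pass over the loader yields them."""
    return [target for _, targets in loader for target in targets]
```

Calling `epoch_order(loader)` twice on the same object generally returns two different orderings of the same targets, which is the effect the training loop relies on when it iterates over the dataloader once per epoch.
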
2 changes: 2 additions & 0 deletions src/safeds/exceptions/__init__.py
@@ -30,6 +30,7 @@
     ModelNotFittedError,
     NonTimeSeriesError,
     PredictionError,
+    TestTrainDataMismatchError,
     UntaggedTableError,
 )

@@ -59,6 +60,7 @@
     "ModelNotFittedError",
     "NonTimeSeriesError",
     "PredictionError",
+    "TestTrainDataMismatchError",
     "UntaggedTableError",
     # Other
     "Bound",
9 changes: 9 additions & 0 deletions src/safeds/exceptions/_ml.py
@@ -68,6 +68,15 @@ def __init__(self, reason: str):
         super().__init__(f"Error occurred while predicting: {reason}")


+class TestTrainDataMismatchError(Exception):
+    """Raised when the columns of the table passed to the predict method do not match with the feature columns of the training data."""
+
+    def __init__(self) -> None:
+        super().__init__(
+            ("The column names in the test table do not match with the feature columns names of the training data."),
+        )
+
+
 class UntaggedTableError(Exception):
     """Raised when an untagged table is used instead of a TaggedTable in a regression or classification."""

27 changes: 24 additions & 3 deletions src/safeds/ml/nn/_fnn_layer.py
@@ -1,4 +1,6 @@
-from torch import nn
+from abc import ABC, abstractmethod
+
+from torch import Tensor, nn

 from safeds.exceptions import ClosedBound, OutOfBoundsError

@@ -17,11 +19,30 @@ def __init__(self, input_size: int, output_size: int, activation_function: str):
             case _:
                 raise ValueError("Unknown Activation Function: " + activation_function)

-    def forward(self, x: float) -> float:
+    def forward(self, x: Tensor) -> Tensor:
         return self._fn(self._layer(x))


-class FNNLayer:
+class Layer(ABC):
+    @abstractmethod
+    def __init__(self) -> None:
+        pass # pragma: no cover
+
+    @abstractmethod
+    def _get_internal_layer(self, activation_function: str) -> _InternalLayer:
+        pass # pragma: no cover
+
+    @property
+    @abstractmethod
+    def output_size(self) -> int:
+        pass # pragma: no cover
+
+    @abstractmethod
+    def _set_input_size(self, input_size: int) -> None:
+        pass # pragma: no cover
+
+
+class FNNLayer(Layer):
     def __init__(self, output_size: int, input_size: int | None = None):
         """
         Create a FNN Layer.
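
The new `Layer` base class declares what every layer must provide, so the model code can depend on the interface instead of on `FNNLayer` directly. The sketch below shows the general pattern with Python's `abc` module. It is a simplified illustration, not the library's code: it leaves out `_get_internal_layer`, and `DenseLayer` is a hypothetical subclass name.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Layer(ABC):
    """Simplified sketch of an abstract layer interface."""

    @property
    @abstractmethod
    def output_size(self) -> int:
        """Number of values this layer produces."""

    @abstractmethod
    def _set_input_size(self, input_size: int) -> None:
        """Tell the layer how many values it receives."""


class DenseLayer(Layer):
    """Hypothetical concrete layer; the name is not part of the library."""

    def __init__(self, output_size: int, input_size: Optional[int] = None) -> None:
        self._output_size = output_size
        self._input_size = input_size

    @property
    def output_size(self) -> int:
        return self._output_size

    def _set_input_size(self, input_size: int) -> None:
        self._input_size = input_size
```

Instantiating `Layer()` directly raises a `TypeError` because its abstract members are not implemented, while `DenseLayer(output_size=3)` works and can be passed wherever a `Layer` is expected.
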
52 changes: 36 additions & 16 deletions src/safeds/ml/nn/_model.py
@@ -6,21 +6,23 @@
 from torch import Tensor, nn

 from safeds.data.tabular.containers import Column, Table, TaggedTable
-from safeds.exceptions import ClosedBound, ModelNotFittedError, OutOfBoundsError
-from safeds.ml.nn._fnn_layer import FNNLayer
+from safeds.exceptions import ClosedBound, ModelNotFittedError, OutOfBoundsError, TestTrainDataMismatchError
+from safeds.ml.nn._fnn_layer import Layer


 class NeuralNetworkRegressor:
-    def __init__(self, layers: list):
+    def __init__(self, layers: list[Layer]):
         self._model = _PytorchModel(layers, is_for_classification=False)
         self._batch_size = 1
         self._is_fitted = False
+        self._feature_names: None | list[str] = None

     def fit(
         self,
         train_data: TaggedTable,
         epoch_size: int = 25,
         batch_size: int = 1,
+        learning_rate: float = 0.001,
         callback_on_batch_completion: Callable[[int, float], None] | None = None,
         callback_on_epoch_completion: Callable[[int, float], None] | None = None,
     ) -> Self:
@@ -37,6 +39,8 @@ def fit(
             The number of times the training cycle should be done.
         batch_size
             The size of data batches that should be loaded at one time.
+        learning_rate
+            The learning rate of the neural network.
         callback_on_batch_completion
             Function used to view metrics while training. Gets called after a batch is completed with the index of the last batch and the overall loss average.
         callback_on_epoch_completion
@@ -57,17 +61,19 @@ def fit(
             raise OutOfBoundsError(actual=epoch_size, name="epoch_size", lower_bound=ClosedBound(1))
         if batch_size < 1:
             raise OutOfBoundsError(actual=batch_size, name="batch_size", lower_bound=ClosedBound(1))
+        self._feature_names = train_data.features.column_names
         copied_model = copy.deepcopy(self)
+
         copied_model._batch_size = batch_size
         dataloader = train_data._into_dataloader(copied_model._batch_size)

         loss_fn = nn.MSELoss()

-        optimizer = torch.optim.SGD(copied_model._model.parameters(), lr=0.05)
+        optimizer = torch.optim.SGD(copied_model._model.parameters(), lr=learning_rate)
         loss_sum = 0.0
         number_of_batches_done = 0
         for epoch in range(epoch_size):
-            for x, y in dataloader:
+            for x, y in iter(dataloader):
                 optimizer.zero_grad()

                 pred = copied_model._model(x)
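
This hunk replaces the hard-coded `lr=0.05` with the new `learning_rate` parameter, whose default is `0.001`, so callers who relied on the old behaviour now take much smaller steps unless they pass a value explicitly. To give a feel for what the parameter controls, here is a toy gradient-descent sketch on a single float. It is a stand-in for the update rule only, not for `torch.optim.SGD`, and `sgd_minimise` and `grad_f` are hypothetical names.

```python
def sgd_minimise(gradient, start, learning_rate=0.001, steps=25):
    """Plain gradient descent on one float: x <- x - learning_rate * gradient(x)."""
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    x = start
    for _ in range(steps):
        x = x - learning_rate * gradient(x)
    return x


def grad_f(x):
    # Gradient of f(x) = (x - 3)^2, whose minimum is at x = 3.
    return 2.0 * (x - 3.0)


slow = sgd_minimise(grad_f, start=0.0, learning_rate=0.001)
fast = sgd_minimise(grad_f, start=0.0, learning_rate=0.05)
```

After the same 25 steps, `fast` ends up considerably closer to the minimum at 3 than `slow` does. On a real network a larger rate can also overshoot, so the better value depends on the data and the model.
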
@@ -111,6 +117,10 @@ def predict(self, test_data: Table) -> TaggedTable:
         """
         if not self._is_fitted:
             raise ModelNotFittedError
+        if not (sorted(test_data.column_names)).__eq__(
+            sorted(self._feature_names) if self._feature_names is not None else None,
+        ):
+            raise TestTrainDataMismatchError
         dataloader = test_data._into_dataloader(self._batch_size)
         predictions = []
         with torch.no_grad():
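
The check added here compares the sorted column names of the test table with the sorted feature names recorded during `fit`, so column order does not matter but every name must match. A possible way to express the same idea with the `==` operator is sketched below. The helper `columns_match` is hypothetical, and it deliberately treats missing feature names as a mismatch. That differs from calling `list.__eq__` directly with `None`, which returns `NotImplemented` instead of `False`.

```python
def columns_match(test_columns, feature_names):
    """Return True if both collections hold the same names, ignoring order."""
    if feature_names is None:
        # Nothing was recorded during training, so there is nothing to match.
        return False
    return sorted(test_columns) == sorted(feature_names)
```

With the data from the tests in this pull request, the features are `["b"]` and the test table has columns `["a", "c"]`, so `columns_match(["a", "c"], ["b"])` is false and the model would raise `TestTrainDataMismatchError`. Note that the comparison covers all columns of the test table, so a table that still contains the target column is rejected as well.
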
@@ -134,17 +144,19 @@ def is_fitted(self) -> bool:


 class NeuralNetworkClassifier:
-    def __init__(self, layers: list[FNNLayer]):
+    def __init__(self, layers: list[Layer]):
         self._model = _PytorchModel(layers, is_for_classification=True)
         self._batch_size = 1
         self._is_fitted = False
         self._is_multi_class = layers[-1].output_size > 1
+        self._feature_names: None | list[str] = None

     def fit(
         self,
         train_data: TaggedTable,
         epoch_size: int = 25,
         batch_size: int = 1,
+        learning_rate: float = 0.001,
         callback_on_batch_completion: Callable[[int, float], None] | None = None,
         callback_on_epoch_completion: Callable[[int, float], None] | None = None,
     ) -> Self:
@@ -161,6 +173,8 @@ def fit(
             The number of times the training cycle should be done.
         batch_size
             The size of data batches that should be loaded at one time.
+        learning_rate
+            The learning rate of the neural network.
         callback_on_batch_completion
             Function used to view metrics while training. Gets called after a batch is completed with the index of the last batch and the overall loss average.
         callback_on_epoch_completion
@@ -181,7 +195,9 @@ def fit(
             raise OutOfBoundsError(actual=epoch_size, name="epoch_size", lower_bound=ClosedBound(1))
         if batch_size < 1:
             raise OutOfBoundsError(actual=batch_size, name="batch_size", lower_bound=ClosedBound(1))
+        self._feature_names = train_data.features.column_names
         copied_model = copy.deepcopy(self)
+
         copied_model._batch_size = batch_size
         dataloader = train_data._into_dataloader(copied_model._batch_size)

@@ -190,11 +206,11 @@ def fit(
         else:
             loss_fn = nn.BCELoss()

-        optimizer = torch.optim.SGD(copied_model._model.parameters(), lr=0.05)
+        optimizer = torch.optim.SGD(copied_model._model.parameters(), lr=learning_rate)
         loss_sum = 0.0
         number_of_batches_done = 0
         for epoch in range(epoch_size):
-            for x, y in dataloader:
+            for x, y in iter(dataloader):
                 optimizer.zero_grad()
                 pred = copied_model._model(x)
                 if self._is_multi_class:
@@ -253,6 +269,10 @@ def predict(self, test_data: Table) -> TaggedTable:
         """
         if not self._is_fitted:
             raise ModelNotFittedError
+        if not (sorted(test_data.column_names)).__eq__(
+            sorted(self._feature_names) if self._feature_names is not None else None,
+        ):
+            raise TestTrainDataMismatchError
         dataloader = test_data._into_dataloader(self._batch_size)
         predictions = []
         with torch.no_grad():
@@ -290,27 +310,27 @@ def is_fitted(self) -> bool:


 class _PytorchModel(nn.Module):
-    def __init__(self, fnn_layers: list[FNNLayer], is_for_classification: bool) -> None:
+    def __init__(self, layers: list[Layer], is_for_classification: bool) -> None:
         super().__init__()
-        self._layer_list = fnn_layers
+        self._layer_list = layers
         internal_layers = []
         previous_output_size = None

-        for layer in fnn_layers:
+        for layer in layers:
             if previous_output_size is not None:
                 layer._set_input_size(previous_output_size)
             internal_layers.append(layer._get_internal_layer(activation_function="relu"))
             previous_output_size = layer.output_size

         if is_for_classification:
             internal_layers.pop()
-            if fnn_layers[-1].output_size > 2:
-                internal_layers.append(fnn_layers[-1]._get_internal_layer(activation_function="softmax"))
+            if layers[-1].output_size > 2:
+                internal_layers.append(layers[-1]._get_internal_layer(activation_function="softmax"))
             else:
-                internal_layers.append(fnn_layers[-1]._get_internal_layer(activation_function="sigmoid"))
-        self._pytorch_layers = nn.ModuleList(internal_layers)
+                internal_layers.append(layers[-1]._get_internal_layer(activation_function="sigmoid"))
+        self._pytorch_layers = nn.Sequential(*internal_layers)

-    def forward(self, x: float) -> float:
+    def forward(self, x: Tensor) -> Tensor:
         for layer in self._pytorch_layers:
             x = layer(x)
         return x

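
The constructor above wires the layers together in two steps: each layer after the first takes the previous layer's output size as its input size, and for classification the last layer's activation is swapped for softmax (more than two outputs) or sigmoid. The following pure-Python sketch restates that bookkeeping without PyTorch. The function names `resolve_layer_sizes` and `final_activation` are hypothetical, and the sketch assumes the first layer's input size is given explicitly.

```python
def resolve_layer_sizes(first_input_size, output_sizes):
    """Return (input_size, output_size) pairs for a stack of layers."""
    if not output_sizes:
        raise ValueError("at least one layer is required")
    pairs = []
    previous_output_size = first_input_size
    for output_size in output_sizes:
        # Each layer consumes exactly what the previous layer produced.
        pairs.append((previous_output_size, output_size))
        previous_output_size = output_size
    return pairs


def final_activation(last_output_size, is_for_classification):
    """Mirror the activation choice in the diff for the last layer."""
    if not is_for_classification:
        return "relu"
    return "softmax" if last_output_size > 2 else "sigmoid"
```

For example, a network with four inputs and layers of output size 8 and 3 resolves to the pairs `(4, 8)` and `(8, 3)`, and a classifier built from it would end in a softmax layer.
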
28 changes: 27 additions & 1 deletion tests/safeds/ml/nn/test_model.py
@@ -1,6 +1,6 @@
 import pytest
 from safeds.data.tabular.containers import Table, TaggedTable
-from safeds.exceptions import ModelNotFittedError, OutOfBoundsError
+from safeds.exceptions import ModelNotFittedError, OutOfBoundsError, TestTrainDataMismatchError
 from safeds.ml.nn import FNNLayer, NeuralNetworkClassifier, NeuralNetworkRegressor


@@ -87,6 +87,19 @@ def test_should_raise_if_is_fitted_is_set_correctly_for_multiclass_classificatio
         )
         assert model.is_fitted

+    def test_should_raise_if_test_and_train_data_mismatch(self) -> None:
+        model = NeuralNetworkClassifier([FNNLayer(input_size=1, output_size=1), FNNLayer(output_size=3)])
+        model = model.fit(
+            Table.from_dict({"a": [1, 0, 2], "b": [0, 15, 5]}).tag_columns("a"),
+        )
+        with pytest.raises(
+            TestTrainDataMismatchError,
+            match="The column names in the test table do not match with the feature columns names of the training data.",
+        ):
+            model.predict(
+                Table.from_dict({"a": [1], "c": [2]}),
+            )
+
     def test_should_raise_if_fit_doesnt_batch_callback(self) -> None:
         model = NeuralNetworkClassifier([FNNLayer(input_size=1, output_size=1)])

@@ -186,6 +199,19 @@ def test_should_raise_if_is_fitted_is_set_correctly(self) -> None:
         )
         assert model.is_fitted

+    def test_should_raise_if__test_and_train_data_mismatch(self) -> None:
+        model = NeuralNetworkRegressor([FNNLayer(input_size=1, output_size=1)])
+        model = model.fit(
+            Table.from_dict({"a": [1, 0, 2], "b": [0, 15, 5]}).tag_columns("a"),
+        )
+        with pytest.raises(
+            TestTrainDataMismatchError,
+            match="The column names in the test table do not match with the feature columns names of the training data.",
+        ):
+            model.predict(
+                Table.from_dict({"a": [1], "c": [2]}),
+            )
+
     def test_should_raise_if_fit_doesnt_batch_callback(self) -> None:
         model = NeuralNetworkRegressor([FNNLayer(input_size=1, output_size=1)])
