# classification.py, 357 lines (308 loc), 16 KB
"""Extension template for time series classifiers.
Purpose of this implementation template:
quick implementation of new estimators following the template
NOT a concrete class to import! This is NOT a base class or concrete class!
This is to be used as a "fill-in" coding template.
How to use this implementation template to implement a new estimator:
- make a copy of the template in a suitable location, give it a descriptive name.
- work through all the "todo" comments below
- fill in code for mandatory methods, and optionally for optional methods
- do not write to reserved variables: is_fitted, _is_fitted, _X, _y, classes_,
n_classes_, fit_time_, _class_dictionary, _threads_to_use, _tags, _tags_dynamic
- you can add more private methods, but do not override BaseEstimator's private methods
an easy way to be safe is to prefix your methods with "_custom"
- change docstrings for functions and the file
- ensure interface compatibility by sktime.utils.estimator_checks.check_estimator
- once complete: use as a local library, or contribute to sktime via PR
- more details:
https://www.sktime.net/en/stable/developer_guide/add_estimators.html
Mandatory implements:
fitting - _fit(self, X, y)
predicting classes - _predict(self, X)
Optional implements:
data conversion and capabilities tags - _tags
fitted parameter inspection - _get_fitted_params()
predicting class probabilities - _predict_proba(self, X)
Testing - required for sktime test framework and check_estimator usage:
get default parameters for test instance(s) - get_test_params()
copyright: sktime developers, BSD-3-Clause License (see LICENSE file)
"""
# todo: write an informative docstring for the file or module, remove the above
# todo: add an appropriate copyright notice for your estimator
# estimators contributed to sktime should have the copyright notice at the top
# estimators of your own do not need to have permissive or BSD-3 copyright
# todo: uncomment the following line, enter authors' GitHub IDs
# __author__ = [authorGitHubID, anotherAuthorGitHubID]
from sktime.classification.base import BaseClassifier
# todo: add any necessary imports here
# todo: for imports of sktime soft dependencies:
# make sure to fill in the "python_dependencies" tag with the package import name
# import soft dependencies only inside methods of the class, not at the top of the file
# todo: change class name and write docstring
class MyTimeSeriesClassifier(BaseClassifier):
    """Custom time series classifier. todo: write docstring.

    todo: describe your custom time series classifier here

    Hyper-parameters
    ----------------
    parama : int
        descriptive explanation of parama
    paramb : string, optional (default='default')
        descriptive explanation of paramb
    paramc : boolean, optional (default= whether paramb is not the default)
        descriptive explanation of paramc
    and so on

    Components
    ----------
    est : sktime.estimator, BaseEstimator descendant
        descriptive explanation of est
    est2 : another estimator
        descriptive explanation of est2
    and so on
    """

    # optional todo: override base class estimator default tags here if necessary
    # these are the default values, only add if different to these.
    _tags = {
        # packaging info
        # --------------
        "authors": ["author1", "author2"],  # authors, GitHub handles
        "maintainers": ["maintainer1", "maintainer2"],  # maintainers, GitHub handles
        # author = significant contribution to code at some point
        # if interfacing a 3rd party estimator, ensure to give credit to the
        #   authors of the interfaced estimator
        # maintainer = algorithm maintainer role, "owner" of the sktime class
        #   for 3rd party interfaces, the scope is the sktime class only
        # specify one or multiple authors and maintainers
        # remove maintainer tag if maintained by sktime core team
        #
        "python_version": None,  # PEP 440 python version specifier to limit versions
        "python_dependencies": None,  # PEP 440 python dependencies specifier,
        # e.g., "numba>0.53", or a list, e.g., ["numba>0.53", "numpy>=1.19.0"]
        # delete if no python dependencies or version limitations
        #
        # estimator tags
        # --------------
        "X_inner_mtype": "numpy3D",  # which type do _fit/_predict accept, usually
        "y_inner_mtype": "numpy1D",  # which type do _fit/_predict return, usually
        # this is one of "numpy3D" (instance, variable, time point),
        # "pd-multiindex" (row index: instance, time; column index: variable) or other
        # machine types, see datatypes/panel/_registry.py for options.
        "capability:multivariate": False,  # ability to handle multivariate X
        "capability:multioutput": False,  # ability to predict multiple columns in y
        "capability:unequal_length": False,
        "capability:missing_values": False,
        "capability:train_estimate": False,
        "capability:feature_importance": False,
        "capability:contractable": False,
        "capability:multithreading": False,
    }
    # todo: add any hyper-parameters and components to constructor
    def __init__(self, est, parama, est2=None, paramb="default", paramc=None):
        # estimators should precede parameters
        # if estimators have default values, set None and initialize below

        # todo: write any hyper-parameters and components to self
        self.est = est
        self.parama = parama
        self.est2 = est2
        self.paramb = paramb
        self.paramc = paramc
        # IMPORTANT: the self.params should never be overwritten or mutated from now on
        # for handling defaults etc, write to other attributes, e.g., self._parama
        # for estimators, initialize a clone, e.g., self.est_ = est.clone()

        # leave this as is
        super().__init__()

        # todo: optional, parameter checking logic (if applicable) should happen here
        # if writes derived values to self, should *not* overwrite self.parama etc
        # instead, write to self._parama, self._newparam (starting with _)

        # todo: default estimators should have None arg defaults
        #   and be initialized here
        #   do this only with default estimators, not with parameters
        # if est2 is None:
        #     self._est2 = MyDefaultEstimator()

        # todo: if tags of estimator depend on component tags, set these here
        #   only needed if estimator is a composite
        #   tags set in the constructor apply to the object and override the class
        #
        # example 1: conditional setting of a tag
        # if est.foo == 42:
        #     self.set_tags(**{"handles-missing-data": True})
        # example 2: cloning tags from component
        # self.clone_tags(est2, ["enforce_index_type", "handles-missing-data"])
    # todo: implement this, mandatory
    def _fit(self, X, y):
        """Fit time series classifier to training data.

        private _fit containing the core logic, called from fit

        Writes to self:
            Sets fitted model attributes ending in "_".

        Parameters
        ----------
        X : guaranteed to be of a type in self.get_tag("X_inner_mtype")
            if self.get_tag("X_inner_mtype") = "numpy3D":
                3D np.ndarray of shape = [n_instances, n_dimensions, series_length]
            if self.get_tag("X_inner_mtype") = "pd-multiindex":
                pd.DataFrame with columns = variables,
                index = pd.MultiIndex with first level = instance indices,
                second level = time indices
            for list of other mtypes, see datatypes.SCITYPE_REGISTER
            for specifications, see examples/AA_datatypes_and_datasets.ipynb
        y : guaranteed to be of a type in self.get_tag("y_inner_mtype")
            1D iterable, of shape [n_instances]
            or 2D iterable, of shape [n_instances, n_dimensions]
            class labels for fitting
            if self.get_tag("capability:multioutput") = False, guaranteed to be 1D
            if self.get_tag("capability:multioutput") = True, guaranteed to be 2D

        Returns
        -------
        self : Reference to self.
        """
        # implement here
        # IMPORTANT: avoid side effects to X, y
        #
        # Note: when interfacing a model that has fit, with parameters
        #   that are not data (X, y) or data-like, but model parameters,
        #   *don't* add as arguments to fit, but treat as follows:
        #   1. pass to constructor,  2. write to self in constructor,
        #   3. read from self in _fit,  4. pass to interfaced_model.fit in _fit
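
        # example sketch of a possible _fit (illustrative only, not part of the
        #   template; assumes "numpy3D" X and that self.est is an sklearn-style
        #   classifier): flatten each series to a feature vector, fit a clone:
        #
        # from sklearn.base import clone
        #
        # n_instances = X.shape[0]
        # self.estimator_ = clone(self.est)
        # self.estimator_.fit(X.reshape(n_instances, -1), y)
        # return self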
    # todo: implement this, mandatory
    def _predict(self, X):
        """Predict labels for sequences in X.

        private _predict containing the core logic, called from predict

        Parameters
        ----------
        X : guaranteed to be of a type in self.get_tag("X_inner_mtype")
            if self.get_tag("X_inner_mtype") = "numpy3D":
                3D np.ndarray of shape = [n_instances, n_dimensions, series_length]
            if self.get_tag("X_inner_mtype") = "pd-multiindex":
                pd.DataFrame with columns = variables,
                index = pd.MultiIndex with first level = instance indices,
                second level = time indices
            for list of other mtypes, see datatypes.SCITYPE_REGISTER
            for specifications, see examples/AA_datatypes_and_datasets.ipynb

        Returns
        -------
        y : should be of mtype in self.get_tag("y_inner_mtype")
            1D iterable, of shape [n_instances]
            or 2D iterable, of shape [n_instances, n_dimensions]
            predicted class labels
            indices correspond to instance indices in X
            if self.get_tag("capability:multioutput") = False, should be 1D
            if self.get_tag("capability:multioutput") = True, should be 2D
        """
        # implement here
        # IMPORTANT: avoid side effects to X
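
        # example sketch of a possible _predict (illustrative only; assumes
        #   "numpy3D" X and that _fit stored a fitted sklearn-style classifier,
        #   trained on flattened series, as self.estimator_):
        #
        # n_instances = X.shape[0]
        # return self.estimator_.predict(X.reshape(n_instances, -1))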
    # todo: consider implementing this, optional
    # if you do not implement it, then the default _predict_proba will be called.
    # the default simply calls predict and sets probas to 0 or 1.
    def _predict_proba(self, X):
        """Predict class probabilities for sequences in X.

        private _predict_proba containing the core logic, called from predict_proba

        State required:
            Requires state to be "fitted".

        Accesses in self:
            Fitted model attributes ending in "_"

        Parameters
        ----------
        X : guaranteed to be of a type in self.get_tag("X_inner_mtype")
            if self.get_tag("X_inner_mtype") = "numpy3D":
                3D np.ndarray of shape = [n_instances, n_dimensions, series_length]
            if self.get_tag("X_inner_mtype") = "pd-multiindex":
                pd.DataFrame with columns = variables,
                index = pd.MultiIndex with first level = instance indices,
                second level = time indices
            for list of other mtypes, see datatypes.SCITYPE_REGISTER
            for specifications, see examples/AA_datatypes_and_datasets.ipynb

        Returns
        -------
        y : 2D array of shape [n_instances, n_classes] - predicted class probabilities
            1st dimension indices correspond to instance indices in X
            2nd dimension indices correspond to possible labels (integers)
            (i, j)-th entry is predictive probability that i-th instance is of class j
        """
        # implement here
        # IMPORTANT: avoid side effects to X
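
        # example sketch of a possible _predict_proba (illustrative only; assumes
        #   "numpy3D" X and a fitted sklearn-style classifier self.estimator_,
        #   trained on flattened series, that implements predict_proba):
        #
        # n_instances = X.shape[0]
        # return self.estimator_.predict_proba(X.reshape(n_instances, -1))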
    # todo: consider implementing this, optional
    # implement only if different from default:
    #   default retrieves all self attributes ending in "_"
    #   and returns them with keys that have the "_" removed
    # if not implementing, delete the method
    # avoid overriding get_fitted_params
    def _get_fitted_params(self):
        """Get fitted parameters.

        private _get_fitted_params, called from get_fitted_params

        State required:
            Requires state to be "fitted".

        Returns
        -------
        fitted_params : dict with str keys
            fitted parameters, keyed by names of fitted parameter
        """
        # implement here
        #
        # when this function is reached, it is already guaranteed that self is fitted
        #   this does not need to be checked separately
        #
        # parameters of components should follow the sklearn convention:
        #   separate component name from parameter name by double-underscore
        #   e.g., componentname__paramname
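
        # example sketch (illustrative only; assumes a fitted component
        #   self.estimator_ exists; n_classes_ is set by the base class in fit):
        #
        # return {
        #     "estimator": self.estimator_,
        #     "n_classes": self.n_classes_,
        # }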
    # todo: return default parameters, so that a test instance can be created
    #   required for automated unit and integration testing of estimator
    @classmethod
    def get_test_params(cls, parameter_set="default"):
        """Return testing parameter settings for the estimator.

        Parameters
        ----------
        parameter_set : str, default="default"
            Name of the set of test parameters to return, for use in tests. If no
            special parameters are defined for a value, will return `"default"` set.
            Reserved values for classifiers:
                "results_comparison" - used for identity testing in some classifiers
                    should contain parameter settings comparable to "TSC bakeoff"

        Returns
        -------
        params : dict or list of dict, default = {}
            Parameters to create testing instances of the class.
            Each dict contains parameters to construct an "interesting" test
            instance, i.e., `MyClass(**params)` or `MyClass(**params[i])` creates
            a valid test instance.
            `create_test_instance` uses the first (or only) dictionary in `params`.
        """
        # todo: set the testing parameters for the estimators
        # Testing parameters can be a dictionary or a list of dictionaries
        #
        # this can, if required, use:
        #   class properties (e.g., inherited); parent class test case
        #   imported objects such as estimators from sktime or sklearn
        # important: all such imports should be *inside get_test_params*, not at
        #   the top of the file, since imports are used only at testing time
        #
        # The parameter_set argument is not used for most automated, module-level
        #   tests. It can be used in custom, estimator-specific tests, for
        #   "special" settings.
        # For classification, this is also used in tests for reference settings,
        #   such as published in benchmarking studies, or for identity testing.
        # A parameter dictionary must be returned *for all values* of parameter_set,
        #   i.e., "parameter_set not available" errors should never be raised.
        #
        # A good parameter set should primarily satisfy two criteria:
        #   1. The chosen parameters should have a low testing time,
        #      ideally on the order of a few seconds for the entire test suite.
        #      This is vital for cases where default values result in "big"
        #      models, which not only increases test time but also runs the
        #      risk of test workers crashing.
        #   2. There should be a minimum of two such parameter sets with
        #      different values, to ensure a wide range of code coverage.
        #
        # example 1: specify params as a dictionary
        # any number of params can be specified
        # params = {"est": value0, "parama": value1, "paramb": value2}
        #
        # example 2: specify params as a list of dictionaries
        # note: only the first dictionary will be used by create_test_instance
        # params = [{"est": value1, "parama": value2},
        #           {"est": value3, "parama": value4}]
        #
        # example 3: parameter set depending on the parameter_set value
        # note: only needed if a separate parameter set is needed in tests
        # if parameter_set == "special_param_set":
        #     params = {"est": value1, "parama": value2}
        #     return params
        #
        # # "default" params
        # params = {"est": value3, "parama": value4}
        # return params
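
        # example sketch (illustrative only; DecisionTreeClassifier stands in
        #   for any estimator component, and the values for parama are
        #   hypothetical):
        #
        # from sklearn.tree import DecisionTreeClassifier
        #
        # params = [
        #     {"est": DecisionTreeClassifier(max_depth=1), "parama": 1},
        #     {"est": DecisionTreeClassifier(max_depth=3), "parama": 2},
        # ]
        # return params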