
Categorical feature selection based on information-theoretic considerations


m-martin-j/fcbf


Fast Correlation-Based Filter

A categorical feature selection approach based on information-theoretic considerations.

Implementation of the fast correlation-based filter (FCBF) proposed by Yu and Liu:

```bibtex
@inproceedings{inproceedings,
  author    = {Yu, Lei and Liu, Huan},
  title     = {Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution},
  booktitle = {Proceedings, Twentieth International Conference on Machine Learning},
  year      = {2003},
  month     = {01},
  volume    = {2},
  pages     = {856--863}
}
```
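FCBF measures how strongly two discrete variables are related with *symmetrical uncertainty* (SU), the normalized information gain used in Yu and Liu's paper. For a feature X and the class Y, with logarithms to base b:

```latex
\begin{aligned}
H(X) &= -\sum_{x} P(x)\,\log_b P(x) \\
H(X \mid Y) &= -\sum_{y} P(y) \sum_{x} P(x \mid y)\,\log_b P(x \mid y) \\
IG(X \mid Y) &= H(X) - H(X \mid Y) \\
SU(X, Y) &= 2\,\frac{IG(X \mid Y)}{H(X) + H(Y)} \in [0, 1]
\end{aligned}
```

SU = 1 means either variable fully predicts the other, and SU = 0 means they are independent. Because SU is a ratio of entropies, the base b cancels out of the final value; we assume the `base` argument of `fcbf` sets this logarithm base, which has not been checked against the package source.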

Data for testing is taken from the UCI Machine Learning Repository. See also the notes on the included lung cancer dataset.

Example

```python
from fcbf import fcbf, data

dataset = data.lung_cancer
# Features: every column except the second one
X = dataset.loc[:, [dataset.columns[0]] + dataset.columns[2:].tolist()]
# Target: the second column, cast to integer class labels
y = dataset[dataset.columns[1]].astype(int)
print(X)
print(y)

relevant_features, irrelevant_features, correlations = fcbf(X, y, su_threshold=0.1, base=2)
print('relevant_features:', relevant_features, f'(count: {len(relevant_features)})')
print('irrelevant_features:', irrelevant_features, f'(count: {len(irrelevant_features)})')
print('correlations:', correlations)
```
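To make the selection procedure concrete, here is a minimal, dependency-free sketch of FCBF as described by Yu and Liu. It is not this package's implementation: the names `entropy`, `conditional_entropy`, `symmetrical_uncertainty` and `fcbf_sketch` are hypothetical, features are assumed to be already discretized into plain Python lists (rather than the pandas `X` and `y` used above), and only the list of selected features is returned. The sketch (1) keeps features whose SU with the class reaches the threshold, ranked by decreasing SU, and (2) walks that ranking, dropping every lower-ranked feature f_q with SU(f_p, f_q) >= SU(f_q, class), i.e. features at least as correlated with an already selected feature f_p as with the class.

```python
from collections import Counter
from math import log


def entropy(values, base=2):
    """Shannon entropy H(X) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log(c / n, base) for c in Counter(values).values())


def conditional_entropy(x, y, base=2):
    """H(X | Y) = sum over y of P(y) * H(X | Y = y)."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / n * entropy(g, base) for g in groups.values())


def symmetrical_uncertainty(x, y, base=2):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x, base), entropy(y, base)
    if hx + hy == 0:
        return 0.0  # both variables constant: treated as uncorrelated (assumption)
    ig = hx - conditional_entropy(x, y, base)
    return 2.0 * ig / (hx + hy)


def fcbf_sketch(features, target, su_threshold=0.1, base=2):
    """Return names of selected features, most relevant first.

    features: hypothetical dict mapping feature name -> list of discrete values.
    """
    # Step 1: relevance filter, keep features with SU(f, class) >= threshold,
    # sorted by decreasing SU with the class.
    su_class = {name: symmetrical_uncertainty(col, target, base)
                for name, col in features.items()}
    ranked = sorted((name for name, su in su_class.items() if su >= su_threshold),
                    key=lambda name: su_class[name], reverse=True)
    # Step 2: redundancy filter, the best remaining feature fp is selected and
    # every lower-ranked fq with SU(fp, fq) >= SU(fq, class) is discarded.
    selected = []
    while ranked:
        fp = ranked.pop(0)
        selected.append(fp)
        ranked = [fq for fq in ranked
                  if symmetrical_uncertainty(features[fp], features[fq], base)
                  < su_class[fq]]
    return selected


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    features = {
        "a": list(y),                       # perfectly predicts y
        "b": list(y),                       # exact copy of a, hence redundant
        "noise": [0, 1, 0, 1, 0, 0, 1, 1],  # independent of y
        "const": [1] * 8,                   # carries no information
    }
    print(fcbf_sketch(features, y, su_threshold=0.1))  # ['a']
```

Only `a` survives: `b` is removed as redundant with `a`, while `noise` and `const` have SU = 0 with the class and fail the relevance threshold. The greedy pass in step 2 is what keeps FCBF cheaper than computing SU for every pair of features.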

Setup

Install the package with pip:

```bash
pip install fcbf
```

Development

TODO

Contributing

TODO

License

The code is released under the MIT License. All dependencies are copyrighted by their respective authors and released under their respective licenses.
