Fast Correlation-Based Filter

A feature selection approach for categorical data, based on information-theoretic considerations.

Implementation of the fast correlation-based filter (FCBF) proposed by Yu and Liu:

@inproceedings{inproceedings,
author = {Yu, Lei and Liu, Huan},
year = {2003},
month = {01},
pages = {856--863},
title = {Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution},
volume = {2},
booktitle = {Proceedings, Twentieth International Conference on Machine Learning}
}
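
To illustrate the idea behind the filter, here is a minimal sketch of the procedure as described by Yu and Liu: features are ranked by their symmetrical uncertainty (SU) with the class, and a feature is dropped as redundant when a higher-ranked feature correlates with it at least as strongly as the class does. This is not the code of this package; the names `entropy`, `symmetrical_uncertainty` and `fcbf_sketch` are hypothetical, and plain Python lists are assumed as input instead of a DataFrame.

```python
from collections import Counter
from math import log


def entropy(values, base=2):
    """Shannon entropy of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log(c / n, base) for c in Counter(values).values())


def symmetrical_uncertainty(x, y, base=2):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x, base), entropy(y, base)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; treat as uncorrelated
    h_xy = entropy(list(zip(x, y)), base)  # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy           # IG(X | Y) = H(X) - H(X | Y)
    return 2.0 * info_gain / (h_x + h_y)


def fcbf_sketch(features, y, su_threshold=0.0, base=2):
    """Illustrative FCBF; `features` maps a feature name to a list of values."""
    # Step 1: keep features whose SU with the class reaches the threshold,
    # ordered from most to least relevant.
    su_class = {
        name: symmetrical_uncertainty(column, y, base)
        for name, column in features.items()
    }
    candidates = sorted(
        (name for name in su_class if su_class[name] >= su_threshold),
        key=lambda name: su_class[name],
        reverse=True,
    )

    # Step 2: each selected feature removes every lower-ranked feature that
    # correlates with it at least as strongly as with the class.
    selected = []
    while candidates:
        best = candidates.pop(0)
        selected.append(best)
        candidates = [
            name for name in candidates
            if symmetrical_uncertainty(features[best], features[name], base)
            < su_class[name]
        ]
    return selected


# Toy data: "b" duplicates "a", and "c" is independent of the class.
toy_y = [0, 0, 1, 1]
toy_features = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}
print(fcbf_sketch(toy_features, toy_y, su_threshold=0.1))
```

In the toy data, `a` and `b` are both perfectly predictive of the class, but `b` carries no information beyond `a`, so only one of them survives; `c` falls below the threshold in the first step.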

Data for testing is taken from the UCI Machine Learning Repository. See also the notes on the bundled lung cancer dataset.

Example

from fcbf import fcbf, data

# Lung cancer dataset shipped with the package
dataset = data.lung_cancer

# The second column is the target; the first column and all columns
# after the second are the features
X = dataset.loc[:, [dataset.columns[0]] + dataset.columns[2:].tolist()]
y = dataset[dataset.columns[1]].astype(int)
print(X)
print(y)

# su_threshold is the cutoff on symmetrical uncertainty (SU),
# the correlation measure used by FCBF
relevant_features, irrelevant_features, correlations = fcbf(X, y, su_threshold=0.1, base=2)
print(f'relevant_features: {relevant_features} (count: {len(relevant_features)})')
print(f'irrelevant_features: {irrelevant_features} (count: {len(irrelevant_features)})')
print('correlations:', correlations)

Setup

Install the package with pip:

pip install fcbf

Development

TODO

Contributing

TODO

License

Code is released under the MIT License. All dependencies are copyrighted by their respective authors and released under their respective licenses.