Nearly every part of ATM is configurable. For example, you can specify which machine-learning algorithms ATM should try, which metrics it computes (such as F1 score and ROC/AUC), and which method it uses to search through the space of hyperparameters (using another HDI Project library, BTB). You can also constrain ATM to find the best model within a limited amount of time or by training a limited number of models.
The `atm.ATM` class accepts a series of arguments to configure the environment where ATM is run. These arguments specify the database configuration, including which type of database is being used and how to connect to it. The arguments for SQL are:
- `dialect`: type of the SQL database. Choices are `sqlite` or `mysql`. The default is `sqlite`.
- `database`: name or path of the database. The default is `atm.db`.
- `username`: username for the database to be used. The default is `None`.
- `password`: password for the username. The default is `None`.
- `host`: IP address or `localhost` where the connection will be established. The default is `None`.
- `port`: port number on which the database is listening. The default is `None`.
- `query`: additional query to be executed during the login process. The default is `None`.
An example of creating an instance with a `mysql` database:
```python
from atm import ATM

atm = ATM(
    dialect='mysql',
    database='atm',
    username='admin',
    password='password',
    host='localhost',
    port=3306
)
```
The following arguments specify the AWS configuration. Bear in mind that you may already have the `access_key` and `secret_key` configured on your machine if you followed the steps here. Boto3 will use them by default; however, if you specify them during instantiation, those will be the ones used.
- `access_key`: AWS access key ID provided by Amazon.
- `secret_key`: AWS secret key provided by Amazon.
- `s3_bucket`: S3 bucket to be used to store the models and metrics.
- `s3_folder`: folder inside the bucket where the models and metrics will be saved.

Note that all these arguments are `None` by default, and they should be passed as a `str`.
An example of creating an instance with AWS configuration is:
```python
from atm import ATM

atm = ATM(
    access_key='my_aws_key_id',
    secret_key='my_aws_secret_key',
    s3_bucket='my_bucket',
    s3_folder='my_folder'
)
```
The following arguments specify where the models and metrics will be stored, and whether we would like a verbose version of the metrics.
- `models_dir`: local folder where the models will be saved. The default is `models`.
- `metrics_dir`: local folder where the metrics will be saved. The default is `metrics`.
- `verbose_metrics`: whether or not to store verbose metrics. The default is `False`.
An example of creating an instance with this configuration is:
```python
from atm import ATM

atm = ATM(
    models_dir='my_path_to_models',
    metrics_dir='my_path_to_metrics',
    verbose_metrics=True
)
```
The following arguments are used to specify the `dataset` creation inside the database.
- `train_path`: local path, URL or S3 bucket URL to a CSV file that follows the Data Format and specifies the training data for the models.
- `test_path`: local path, URL or S3 bucket URL to a CSV file that follows the Data Format and specifies the test data for the models. If this is `None`, the training data will be split into train and test.
- `name`: a name for the `dataset`. If it is not set, an `md5` hash will be generated from the path.
- `description`: short description of the dataset. The default is `None`.
- `class_column`: name of the column that is the target of our predictions. The default is `class`.
An example of using these arguments with the `atm.run` method is:
```python
from atm import ATM

atm = ATM()

results = atm.run(
    train_path='path/to/train.csv',
    test_path='path/to/test.csv',
    name='test',
    description='Test data',
    class_column='test_column'
)
```
The following arguments are used to specify the `datarun` creation inside the database. This configuration is important for the behaviour and metrics of our `classifiers`.
- `budget`: number of `classifiers` or number of `minutes` to run. Type `int`.
- `budget_type`: type of the `budget`. By default it is `classifier`, but it can be changed to `walltime`. Type `str`.
- `gridding`: gridding factor. By default it is set to `0`, which means that no gridding will be performed. Type `int`.
- `k_window`: number of previous scores considered by `k selector` methods. The default is `3`. Type `int`.
- `methods`: method or list of methods to use for classification. Each method can either be one of the pre-defined method codes listed below or a path to a JSON file defining a custom method. The default is `['logreg', 'dt', 'knn']`. Type is `str` or a `list` of `str`. The complete list of the default choices in ATM is:
  - `logreg`
  - `svm`
  - `sgd`
  - `dt`
  - `et`
  - `rf`
  - `gnb`
  - `mnb`
  - `bnb`
  - `gp`
  - `pa`
  - `knn`
  - `mlp`
  - `ada`
- `metric`: metric by which ATM should evaluate the classifiers. The metric function specified here will be used to compute the judgment metric for each classifier. The default `metric` is `f1`. Type `str`. The metrics that we support at the moment are:
  - `roc_auc_micro`
  - `rank_accuracy`
  - `f1_micro`
  - `accuracy`
  - `roc_auc_macro`
  - `ap`
  - `cohen_kappa`
  - `f1`
  - `f1_macro`
  - `mcc`
- `r_minimum`: number of random runs to perform before tuning can occur. The default value is `2`. Type `int`.
- `run_per_partition`: if `True`, generate a new datarun for each hyperpartition. The default is `False`. Type `bool`.
- `score_target`: determines which judgment metric will be used to search the hyperparameter space. `cv` will use the mean cross-validated performance, `test` will use the performance on a test dataset, and `mu_sigma` will use the lower confidence bound on the CV performance. The default is `cv`. Type `str`.
- `priority`: the priority for this datarun; higher values are more important.
- `selector`: type of BTB selector to use. The options at the moment are `[uniform, ucb1, bestk, bestkvel, purebestkvel, recentk, hieralg]`. The default is `uniform`. Type `str`.
- `tuner`: type of BTB tuner to use. The options at the moment are `[uniform, gp, gp_ei, gp_eivel]`. The default is `uniform`. Type `str`.
An example of using the `atm.run` method with these arguments is:
```python
from atm import ATM

atm = ATM()

results = atm.run(
    budget=200,
    budget_type='classifier',
    gridding=0,
    k_window=4,
    metric='f1_macro',
    methods=['logreg'],
    r_minimum=2,
    run_per_partition=True,
    score_target='cv',
    priority=9,
    selector='uniform',
    tuner='uniform',
    deadline=None,
)
```
If you would like to use the system for your own dataset, convert your data to a CSV file that follows the specified Data Format.
Once your dataset is ready to use, you simply have to provide the path to this CSV in one of the supported formats (local path, URL, or a complete AWS S3 Bucket path). Bear in mind that if you specify an S3 Bucket path, the proper access keys should be configured.
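If your data is not yet in that shape, a small preprocessing step is usually enough. The sketch below is only an illustration (it assumes pandas is available, and the file and column names are hypothetical); it renames the target column so that ATM can find it under the default `class` name:

```python
import pandas as pd

# Illustrative file and column names: adjust them to your own data.
df = pd.read_csv('my_raw_data.csv')

# ATM looks for the prediction target in a column named `class` by default;
# alternatively, keep your own column name and pass it through `class_column`.
df = df.rename(columns={'label': 'class'})

# Write the CSV without the index so only the features and the class column remain.
df.to_csv('path/to/train.csv', index=False)
```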
ATM uses a database to store information about datasets, dataruns and classifiers. It's currently compatible with the SQLite3 and MySQL dialects.
For first-time and casual users, SQLite3 is used by default without any required step from the user.
However, if you're planning on running large, distributed, or performance-intensive jobs, you might prefer using MySQL.
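Before setting up MySQL, note that the default SQLite3 behaviour needs no configuration at all. A minimal sketch showing both the default and a custom SQLite database path (the custom file name is only an illustration):

```python
from atm import ATM

# With no arguments, ATM uses SQLite and stores everything in `atm.db`.
atm = ATM()

# A custom SQLite file can be used by passing its path through `database`.
atm_custom = ATM(database='my_project.db')
```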
If you do not have a MySQL database already prepared, you can follow the next steps in order to install it and prepare it for ATM:
```bash
sudo apt-get install mysql-server
```
In the latest versions of MySQL no input from the user is required for this step, but in older versions the installation process will require the user to input a password for the MySQL root user.
If this happens, keep track of the password that you set, as you will need it in the next step.
If no password was required during the installation of MySQL, you should be able to log in with the following command.
```bash
sudo mysql
```
If a MySQL root password was required, you will need to execute the following command instead:
```bash
sudo mysql -u root -p
```
and input the password that you used during the installation when prompted.
Once you are logged in, execute the following three commands to create a database called `atm` and a user, also called `atm`, with write permissions on it:
```
mysql> CREATE DATABASE atm;
mysql> CREATE USER 'atm'@'localhost' IDENTIFIED BY 'set-your-own-password-here';
mysql> GRANT ALL PRIVILEGES ON atm.* TO 'atm'@'localhost';
```
After you have executed the previous three commands and exited the mysql prompt, you can test your settings by executing the following command and inputting the password that you used in the previous step when prompted:
```bash
mysql -u atm -p
```
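With the database and the user in place, you can point ATM at them using the SQL arguments described above. A minimal sketch, where the password is a placeholder for the one you chose in the `CREATE USER` step:

```python
from atm import ATM

# Connect ATM to the MySQL database and user created in the previous steps.
atm = ATM(
    dialect='mysql',
    database='atm',
    username='atm',
    password='set-your-own-password-here',  # replace with your own password
    host='localhost',
    port=3306
)
```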