The process of building an AL model is divided into two steps:
-
Generating a prior dataset. This step can be broken into two smaller steps. (i) The first involves creating a prior input file containing all the features: for a single feature, this file holds pressure; for two features, it holds both pressure and temperature. We use the script prior.py for this, and the prior can be generated in three ways: a) boundary-informed prior, b) linear-spaced LHS, and c) log-spaced LHS (for details, please refer to the original paper). In prior.py, supply the necessary conditions: the pressure and temperature limits for the LHS-based priors, or the hand-picked pressure and temperature conditions for the boundary-informed prior. After supplying these conditions, run the script in Python to obtain a .csv file called 'Prior_test.csv'. (ii) Use build-prior-sh to submit the simulations to a remote computing system. At Notre Dame, the cluster runs Grid Engine; for other clusters, the necessary changes need to be made. In build-prior-sh, enter the number of sample points you want to generate (this should equal the number of points in 'Prior_test.csv'). Submit the jobs, and when they are done, use the extract-uptake-sh script to extract adsorption data from the GCMC simulations. Please note that we use RASPA (an open-source code) to conduct the GCMC simulations; you are welcome to use your own code, but the necessary changes need to be made here.
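The prior-generation step above can be sketched as follows. This is a minimal stand-in for prior.py, not the authors' actual code: it draws a two-feature (pressure, temperature) prior by Latin hypercube sampling and writes it to 'Prior_test.csv'. The column names, limits, and sample count are illustrative assumptions.

```python
# Minimal LHS prior sketch (illustrative; the real prior.py may differ).
import csv
import math
import random

def lhs_1d(n, low, high, log_spaced=False):
    """1-D Latin hypercube sample: one random point per equal-width stratum."""
    if log_spaced:
        low, high = math.log10(low), math.log10(high)
    points = [low + (high - low) * (i + random.random()) / n for i in range(n)]
    random.shuffle(points)  # randomize pairing across features
    if log_spaced:
        points = [10 ** p for p in points]
    return points

def write_prior(path, n, p_limits, t_limits, log_pressure=True):
    """Pair independently shuffled 1-D LHS samples and write a CSV prior."""
    pressures = lhs_1d(n, *p_limits, log_spaced=log_pressure)
    temps = lhs_1d(n, *t_limits)
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["pressure_Pa", "temperature_K"])  # assumed units
        for p, t in zip(pressures, temps):
            w.writerow([f"{p:.6g}", f"{t:.6g}"])

# Example limits (assumptions): 100 Pa to 10 MPa, 200 K to 400 K.
write_prior("Prior_test.csv", n=20, p_limits=(1e2, 1e7), t_limits=(200, 400))
```

Setting `log_pressure=True` gives the log-spaced LHS variant, which is usually preferable when the pressure range spans several orders of magnitude.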
-
Conducting active learning. The theory behind the AL is illustrated in the figure above; for details, please refer to the original paper. For this step, we need GP.py (GP_TP.py for two features) and error_estimator.py (one can also copy the lines from error_estimator.py into GP.py for convenience). Simply open GP.py and change the input pressure limits (X_test) and the temperature limits (for two features; for a single feature, set the temperature at which you want to do AL). The hyperparameters of the GPR can also be changed, as can the threshold for AL convergence, in GP.py itself. If the maximum GP relative error is below this threshold, GP.py will output "DONE" and the AL will stop. If not, it will output "NOT DONE" along with the details of the next sampling point. error_estimator.py simply writes the mean relative error (MRE), the relative error, and the GP relative error to csv files. The maximum number of AL iterations is user-defined; we use 50 for a single feature and 2000 for two features (based on the array size of X_test).
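The convergence check described above can be sketched as a single AL iteration. This is a hedged illustration, not the actual GP.py: it uses a hand-rolled RBF-kernel GP in NumPy (the real script may use a different library, kernel, and hyperparameters), defines the GP relative error as the predictive standard deviation over the predictive mean, and prints "DONE" or "NOT DONE" accordingly.

```python
# Illustrative single AL iteration with an RBF-kernel GP (assumptions noted above).
import numpy as np

def rbf(A, B, length=1.0, var=1.0):
    """Squared-exponential kernel between row sets A (n,d) and B (m,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_predict(X_train, y_train, X_test, noise=1e-6):
    """Exact GP posterior mean and std via a Cholesky solve."""
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf(X_test, X_test)) - (v * v).sum(0), 0.0, None)
    return mean, np.sqrt(var)

def al_step(X_train, y_train, X_test, threshold=0.05):
    """One AL iteration: converge, or return the next point to simulate."""
    mean, std = gp_predict(X_train, y_train, X_test)
    gp_rel_err = std / np.maximum(np.abs(mean), 1e-12)  # GP relative error
    i = int(np.argmax(gp_rel_err))
    if gp_rel_err[i] < threshold:
        print("DONE")
        return None
    print("NOT DONE; next sampling point:", X_test[i])
    return X_test[i]
```

The returned point would then be fed back into a new GCMC simulation, its uptake appended to the training set, and the step repeated until "DONE" or the iteration cap is reached.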
We have also shared all the files in single_feature.zip and double_feature.zip, along with three Python scripts, all of which correspond to performing AL on methane in Cu-BTC with two features. These scripts cover generating the prior, performing AL, and estimating the relative errors. If there are large dependencies on your system, you can take the three Python scripts and create your own shell-based workflow. The AL script can also be used for different applications and can further be adapted to a single-feature system.
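A custom workflow of that kind could be driven as sketched below. This is an assumption-laden outline, not part of the shared files: it chains the script names mentioned in the text (error_estimator.py, GP.py) and relies only on GP.py printing "DONE" or "NOT DONE", as described above; job submission and data extraction between iterations are left out.

```python
# Hypothetical driver loop around the shared scripts (names from the text;
# behavior beyond the DONE / NOT DONE convention is assumed).
import subprocess
import sys

MAX_ITER = 50  # 50 for a single feature, 2000 for two features, per the text

def run(script):
    """Run a Python script in the current directory and capture its stdout."""
    out = subprocess.run([sys.executable, script], capture_output=True, text=True)
    return out.stdout

def al_loop():
    for it in range(MAX_ITER):
        run("error_estimator.py")  # write the MRE / relative-error CSVs
        stdout = run("GP.py")      # fit the GP and test convergence
        if "NOT DONE" in stdout:   # check before "DONE": it contains "DONE"
            # In a real workflow: submit the next GCMC job here and wait.
            print(f"iteration {it}: sampling next point")
            continue
        if "DONE" in stdout:
            print("converged")
            return True
    return False
```

On a Grid Engine cluster, the in-loop simulation step would be a `qsub`/wait cycle analogous to build-prior-sh.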