using the MPDS data platform, AiiDA workflows, and CRYSTAL simulation engine.
- get accurate encyclopedic, reference, and benchmarking scientific data
- get vast systematic training data for machine learning
- use the cheap commodity cloud environment (not necessarily the HPC cluster)
- ensure provenance tracking and reproducibility of simulations with AiiDA
The code in this repo requires the aiida-crystal-dft, yascheduler, and mpds-ml-labs Python packages to be installed. In turn, they depend on the aiida, mpds_client, and other Python packages.
Thus, installation is as follows (replace pip with pip3 if needed, and mind your virtual environment):
pip install git+https://github.com/tilde-lab/aiida-crystal-dft
pip install git+https://github.com/tilde-lab/yascheduler
pip install git+https://github.com/mpds-io/mpds-ml-labs
git clone https://github.com/mpds-io/mpds-aiida
pip install mpds-aiida/
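After the installation, it may help to confirm that the packages are visible to the current Python interpreter (this is where a wrong virtual environment usually shows up). The following is a minimal sketch using only the standard library; the distribution names are assumptions inferred from the repository names above and may need adjusting.

```python
from importlib import metadata

# Distribution names are assumptions inferred from the repository names above;
# adjust them if the installed packages are registered under other names.
REQUIRED = ("aiida-crystal-dft", "yascheduler", "mpds-ml-labs", "mpds-aiida")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing(names=REQUIRED):
    """Return the distribution names that are not installed."""
    found = installed_versions(names)
    return [name for name in names if found[name] is None]
```

Calling missing() in the target environment should return an empty list once all four packages are installed under the assumed names.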
Some experience with AiiDA is assumed here. Since AiiDA does not support cloud environments, the custom cloud scheduler engine yascheduler should be employed. This scheduler manages the CRYSTAL simulation engine on the cloud VPS instances and encapsulates all the details of remote task submission, queueing, and results retrieval, as well as the VPS management. The scheduler runs its own daemon and lives on the same machine as AiiDA. However, AiiDA treats it as a remote service accessible via the ssh transport, so the command ssh $USER@localhost should succeed. To achieve that, one might run e.g.:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh $USER@localhost
(Note that AiiDA should be made aware of the ~/.ssh/id_rsa.pub key file during the SSH setup!)
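The passwordless login can also be checked non-interactively, e.g. from a setup script. The sketch below is an illustration, not part of this repo: it wraps the standard OpenSSH client with the BatchMode and ConnectTimeout options, so that a missing key results in a failure instead of a password prompt. The function names are hypothetical.

```python
import getpass
import subprocess


def ssh_check_command(user, host="localhost", timeout=5):
    """Build a non-interactive ssh command that runs `true` on the host."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout}",  # give up quickly if the host is unreachable
        f"{user}@{host}",
        "true",
    ]


def can_ssh(user=None, host="localhost", timeout=5):
    """Return True if key-based login works, False otherwise."""
    user = user or getpass.getuser()
    try:
        done = subprocess.run(
            ssh_check_command(user, host, timeout),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 5,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # no ssh client available, or the connection hung
    return done.returncode == 0
```

Note that in batch mode an unknown host key is also reported as a failure, so the interactive ssh $USER@localhost above should be run once first to accept the key.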
For simplicity, yascheduler can share the database with AiiDA. Setting up yascheduler looks like:
vi /etc/yascheduler/yascheduler.conf
yainit
service yascheduler start
AiiDA should be set up as usual, and the stub remote computer (e.g. cluster: yascheduler), as well as the stub CRYSTAL code (e.g. codes: Pcrystal), should be added:
reentry scan
verdi setup
verdi computer setup
verdi computer configure ssh $COMPUTER
verdi computer test $COMPUTER --print-traceback
verdi code setup
Why stub? Because the computer and code management is delegated to yascheduler, which takes care of managing the on-demand cloud resources.
The Gaussian basis sets used by the CRYSTAL engine should be added to the AiiDA database. We download the entire basis set library from the CRYSTAL website and save selected basis sets as *.basis files using the script scripts/bs_unito_download.py. Then, in the subfolder with the *.basis files, one runs:
verdi data crystal_dft uploadfamily --name=$BASIS_FAMILY
or, to add the internal basis sets predefined in CRYSTAL:
verdi data crystal_dft createpredefined
Then the chosen name ($BASIS_FAMILY) should be used in the calculation settings inside mpds_aiida/calc_templates (see below).
The MPDS platform is the main data source for generating the simulation inputs and checking the simulation results. Access to the binary compounds data subset is free; one should log in at the MPDS and obtain an MPDS API key:
export MPDS_KEY=...
(Please do not forget to revoke, i.e. invalidate, the API key after finishing the work.)
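A script that depends on the key can fail early with a clear message if the variable was not exported. This is a small sketch assuming that the downstream code picks the key up from the MPDS_KEY environment variable, as the export above suggests; the helper name is hypothetical.

```python
import os


def get_mpds_key(environ=os.environ):
    """Return the MPDS API key from the environment or raise a clear error."""
    key = environ.get("MPDS_KEY", "").strip()
    if not key:
        raise RuntimeError("MPDS_KEY is not set; export it before running the workflows")
    return key
```

The environ argument exists only to make the helper easy to test with a plain dictionary; in normal use it is called without arguments.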
A template system is used to control the calculation parameters; see the mpds_aiida/calc_templates subfolder. Note that the options: resources template directive is meaningless with our custom cloud scheduler. The cluster, codes, and basis_family template directives have to be specified exactly as defined above.
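Since a mismatch between a template and the AiiDA setup only surfaces at submission time, a quick consistency check may be useful. The sketch below works on a template that has already been loaded into a plain dictionary. Both the flat layout of the directives and the function name are assumptions made for illustration; the actual template files may nest these directives differently.

```python
def check_template(template, computer, codes, basis_family):
    """Return a list of mismatches between a template dict and the AiiDA setup.

    Assumption: cluster, codes, basis_family, and options are top-level keys.
    """
    problems = []

    if template.get("cluster") != computer:
        problems.append(f"cluster: expected {computer!r}, got {template.get('cluster')!r}")

    # Accept either a single code label or a list of labels.
    wanted = template.get("codes")
    wanted = [wanted] if isinstance(wanted, str) else list(wanted or [])
    if not wanted:
        problems.append("codes: nothing specified")
    for code in wanted:
        if code not in codes:
            problems.append(f"codes: {code!r} is not a registered code")

    if template.get("basis_family") != basis_family:
        problems.append(
            f"basis_family: expected {basis_family!r}, got {template.get('basis_family')!r}"
        )

    if "resources" in (template.get("options") or {}):
        problems.append("options: resources is meaningless with the cloud scheduler")

    return problems
```

For example, check_template(tpl, "yascheduler", {"Pcrystal"}, "MY_FAMILY") returns an empty list when tpl uses exactly the names registered above (MY_FAMILY is a placeholder for the chosen $BASIS_FAMILY).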
The following on-demand cloud providers are currently supported (the respective yascheduler directives are given in brackets):
- Hetzner (hetzner_token, hetzner_max_nodes); the API token must be issued for a project
- Upcloud (upcloud_login, upcloud_pass, upcloud_max_nodes); the API permissions are set in the account settings
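To see which providers a given configuration would enable, the credential directives listed above can be looked up programmatically. The sketch below assumes that yascheduler.conf is an INI-style file readable by Python's configparser; since the section holding these directives is not stated here, all sections are scanned. The function name is hypothetical.

```python
import configparser

# Credential directives per provider, as listed above.
PROVIDER_DIRECTIVES = {
    "hetzner": ("hetzner_token",),
    "upcloud": ("upcloud_login", "upcloud_pass"),
}


def configured_providers(conf_text):
    """Return the providers whose credential directives are all present and non-empty."""
    # Interpolation is disabled so that tokens containing '%' are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if value.strip():
                found[key] = value

    return [
        provider
        for provider, keys in PROVIDER_DIRECTIVES.items()
        if all(key in found for key in keys)
    ]
```

It could be called on the contents of /etc/yascheduler/yascheduler.conf read as text; an empty result means that no cloud credentials were found.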
At the time of writing, the default Hetzner configuration (CX51) runs a test task for 2-2.5 hours on average and costs EUR 35.88 per month. The default Upcloud configuration (8 cores, 4 GB memory) runs a test task for 1.5 hours on average and costs $89 per month.
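These figures can be turned into a rough cost per test task. The estimate below is a back-of-the-envelope sketch under two assumptions not stated above: the monthly price is prorated by the hour over a 30-day month, and a node is released as soon as its task is finished.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month, billed pro rata by the hour


def cost_per_task(monthly_price, task_hours):
    """Prorate a monthly price to the duration of one task."""
    return monthly_price / HOURS_PER_MONTH * task_hours


# Hetzner CX51: EUR 35.88 per month, 2.25 h taken as the midpoint of 2-2.5 h.
hetzner_eur = cost_per_task(35.88, 2.25)

# Upcloud (8 cores, 4 GB): $89 per month, 1.5 h per task.
upcloud_usd = cost_per_task(89.0, 1.5)

print(f"Hetzner: about EUR {hetzner_eur:.2f} per task")
print(f"Upcloud: about ${upcloud_usd:.2f} per task")
```

Under these assumptions a test task costs roughly EUR 0.11 on Hetzner and roughly $0.19 on Upcloud. The two amounts are in different currencies and are therefore not directly comparable.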
More examples are given in the scripts subfolder.
The operating principle is briefly illustrated below.
Note: this repo is subject to change and is a work in progress.
- This code: MIT
- CRYSTAL engine: commercial
The resulting data are available at the MPDS platform, according to the CC BY 4.0 license.
Please report any issues in the respective repositories: aiida-crystal-dft, yascheduler, mpds-ml-labs, aiida, mpds_client, etc.
The Google Cloud machines first need to be prepared via the web-browser SSH console (note sudo -i). The file /etc/ssh/sshd_config should be changed to allow the root user to log in.
The Amazon EC2 machines first need to be accessed as the admin user (note sudo -i). Then the file /root/.ssh/authorized_keys needs to be cleaned to allow the root user to log in.