EPFL Machine Learning (CS-433) Project 2: Machine Learning Forcefields (ML-FFs) from Spatial Equivariant Descriptors
- Pian Wan
- Qianjun Xu
- Siyuan Cheng
The training set for our ML-FF can be generated from ab initio calculations or state-of-the-art DFT. For a given system configuration, we first build its atomic density function, from which the potential function V(r) is computed. Projecting the potential onto basis functions built from a radial basis R(r) and spherical harmonics yields a series of expansion coefficients, and these coefficients are combined into the invariants I. The model is trained on the resulting invariant descriptors, with energies as labels. To make a prediction, a system is converted into invariant descriptors and passed to the model, which outputs its energy. The negative gradient of the predicted energy with respect to the atomic coordinates gives the forces on the atoms, which can be used to update the coordinates during a simulation.
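The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under loud assumptions, not the code in this repository: the neighbor density is treated as a sum of point atoms (the intermediate potential V(r) step is skipped), the Gaussian `radial_basis` and the values of `N_MAX` and `R_CUT` are made up for the example, only l = 0 and l = 1 are used, and forces come from finite differences rather than analytic derivatives. All function names here are hypothetical.

```python
import numpy as np

N_MAX, R_CUT = 3, 6.0  # illustrative values, not the project's settings


def radial_basis(r):
    """Hypothetical R_n(r): Gaussians on an even grid with a smooth cosine cutoff."""
    centers = np.linspace(0.0, R_CUT, N_MAX)
    width = R_CUT / N_MAX
    cutoff = 0.5 * (1.0 + np.cos(np.pi * r / R_CUT))
    return cutoff * np.exp(-((r - centers) ** 2) / (2.0 * width**2))


def real_sph_harm(unit):
    """Real spherical harmonics for l = 0 and l = 1 at a unit vector."""
    x, y, z = unit
    y0 = np.array([0.5 / np.sqrt(np.pi)])
    y1 = np.sqrt(3.0 / (4.0 * np.pi)) * np.array([y, z, x])
    return [y0, y1]


def atom_invariants(positions, center):
    """Invariants I[n, n', l] = sum_m c[n, l, m] * c[n', l, m] around one atom."""
    coeffs = [np.zeros((N_MAX, 1)), np.zeros((N_MAX, 3))]
    for j, pos in enumerate(positions):
        delta = pos - positions[center]
        r = np.linalg.norm(delta)
        if j == center or r >= R_CUT:
            continue
        # Accumulate c[n, l, m] = sum_j R_n(r_j) * Y_lm(direction_j)
        for l, ylm in enumerate(real_sph_harm(delta / r)):
            coeffs[l] += np.outer(radial_basis(r), ylm)
    return np.concatenate([(c @ c.T).ravel() for c in coeffs])


def structure_features(positions):
    """Sum per-atom invariants so the result is also permutation invariant."""
    positions = np.asarray(positions, dtype=float)
    return sum(atom_invariants(positions, i) for i in range(len(positions)))


def ridge_fit(features, energies, alpha=1e-6):
    """Closed-form ridge regression: w = (X^T X + alpha I)^-1 X^T y."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(energies, dtype=float)
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)


def energy(positions, weights):
    """Linear model: energy is a weighted sum of the invariants."""
    return float(weights @ structure_features(positions))


def forces(positions, weights, step=1e-5):
    """Forces F = -dE/dr by central finite differences (slow but simple)."""
    positions = np.asarray(positions, dtype=float)
    result = np.zeros_like(positions)
    for idx in np.ndindex(*positions.shape):
        shifted = positions.copy()
        shifted[idx] += step
        e_plus = energy(shifted, weights)
        shifted[idx] -= 2.0 * step
        e_minus = energy(shifted, weights)
        result[idx] = -(e_plus - e_minus) / (2.0 * step)
    return result


if __name__ == "__main__":
    # Made-up Xe3-like geometry (in angstrom) and placeholder weights.
    xe3 = np.array([[0.0, 0.0, 0.0], [4.3, 0.0, 0.0], [2.0, 3.8, 0.5]])
    weights = np.ones(2 * N_MAX * N_MAX)  # would normally come from ridge_fit
    print(structure_features(xe3))
    print(forces(xe3, weights))
```

Because each invariant contracts the m index of two coefficient sets, rotating or translating the whole structure leaves `structure_features` unchanged, which is why the model can be trained on energies alone. In practice the weights would come from `ridge_fit` applied to the descriptors and reference energies of the training structures.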
- Python 3.10
- Rust (rascaline requires Rust to build)
git clone git@github.com:CS-433/ml-project-2-cross-entropy.git
cd ml-project-2-cross-entropy
pip install --upgrade pip
pip install -r requirements.txt
pip install --extra-index-url https://luthaf.fr/temporary-wheels/ metatensor
pip install git+https://github.com/lab-cosmo/equisolve
pip install git+https://github.com/Luthaf/rascaline
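After installing, a quick check that the packages are importable can save a failed run later. The snippet below is a small sketch, not part of the repository: `check_environment` is a hypothetical helper, and it assumes that Python 3.10 is a minimum version and that the three packages are imported under the names `metatensor`, `equisolve` and `rascaline`.

```python
import importlib.util
import sys

# Assumed import names for the packages installed above.
REQUIRED_MODULES = ("metatensor", "equisolve", "rascaline")


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 10)):
    """Report whether the Python version and each module look usable."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

The check only confirms that each module can be located; it does not verify versions or that rascaline's compiled extension loads.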
You can access the dataset here. Download all .xyz and .npz files and put them into the ./dataset folder.
Because the dataset is small, we include all the datasets in the repository.
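To confirm that the files are in place, the .npz archives can be inspected without knowing their contents in advance. This is a minimal sketch: `summarize_dataset` is a hypothetical helper rather than a function from the repository, and it assumes the archives hold plain numeric arrays (NumPy's default `allow_pickle=False` rejects object arrays).

```python
from pathlib import Path

import numpy as np


def summarize_dataset(folder="./dataset"):
    """Map each .npz file name to the names and shapes of its arrays."""
    summary = {}
    for path in sorted(Path(folder).glob("*.npz")):
        with np.load(path) as archive:
            # archive.files lists the array names stored in the archive.
            summary[path.name] = {key: archive[key].shape for key in archive.files}
    return summary


if __name__ == "__main__":
    for file_name, arrays in summarize_dataset().items():
        print(file_name, arrays)
```

The array names and shapes printed for files such as xe3_50_x.npz and xe3_50_y.npz show how descriptors and labels are laid out before running anything else.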
python run.py
.
├── README.md
├── config.py
├── configs
│   └── ridge.txt
├── data
│   ├── __init__.py
│   ├── dataloader.py
│   ├── dataset.py
│   └── feature
│       ├── __init__.py
│       ├── coordinate.py
│       ├── descriptor.py
│       └── feature_base.py
├── dataset
│   ├── xe2_50.xyz
│   ├── xe2_50_x.npz
│   ├── xe2_50_y.npz
│   ├── xe3_50.xyz
│   ├── xe3_50_x.npz
│   ├── xe3_50_y.npz
│   ├── xe3_dataset_dft.xyz
│   ├── xe3_dataset_dft_x.npz
│   └── xe3_dataset_dft_y.npz
├── fig
│   ├── method.png
│   └── results.png
├── methods
│   ├── __init__.py
│   ├── base_method.py
│   ├── bayesian_method.py
│   ├── coord_based_1.ipynb
│   ├── coord_based_2.ipynb
│   ├── decision_tree_method.py
│   ├── elasticnet_method.py
│   ├── knn_method.py
│   ├── lasso_lars_method.py
│   ├── lasso_method.py
│   ├── mlp_method.py
│   ├── pca_method.py
│   ├── preprocessing
│   │   ├── __init__.py
│   │   ├── base_lining.py
│   │   ├── base_method.py
│   │   ├── identity.py
│   │   ├── methods_list.py
│   │   ├── normalization.py
│   │   ├── pca.py
│   │   ├── shift.py
│   │   └── standardization.py
│   ├── random_forest_method.py
│   └── ridge_method.py
├── radial_basis.py
├── requirements.txt
├── run.ipynb
├── run.py
└── utils
    ├── __init__.py
    ├── energy_util.py
    └── visualize.py
The authors thank Philip Robin Loche and Kevin Kazuki Huguenin-Dumittan for their guidance and useful discussions.