Serendipity Recommender

Code and data for the serendipity recommender paper.

Training data

Located under data/training

Historical training data

Located under data/training/historical. This folder contains two files:

raw_data.csv - Raw training data containing all descriptors generated by Escalate
training_data.csv - Historical training data set used by all models

Amine specific training data

Located under data/training/initialization. There are four folders, one for each of the four amines used to refine the models in the lab:

HJFYRMFYQMIZDG-UHFFFAOYSA-N - Hydroxyphenethyl amine
JMXLWMIFDJCGBV-UHFFFAOYSA-N - Dimethylammonium Iodide
NJQKYAASWUGXIT-UHFFFAOYSA-N - 4-Chlorophenylammonium Iodide
ZKRCWINLLKOVCL-UHFFFAOYSA-N - 4-Chlorophenethylammonium Iodide

Each amine folder contains two training draws, named training_draw0.csv and training_draw1.csv.
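The folder layout above can be enumerated programmatically. The sketch below is a hypothetical helper, not part of this repository: it assumes paths are resolved relative to the repository root (the `repo_root` parameter is illustrative) and that every amine folder holds exactly training_draw0.csv and training_draw1.csv.

```python
from pathlib import Path

# InChIKeys of the four amines listed above, mapped to their names.
AMINES = {
    "HJFYRMFYQMIZDG-UHFFFAOYSA-N": "Hydroxyphenethyl amine",
    "JMXLWMIFDJCGBV-UHFFFAOYSA-N": "Dimethylammonium Iodide",
    "NJQKYAASWUGXIT-UHFFFAOYSA-N": "4-Chlorophenylammonium Iodide",
    "ZKRCWINLLKOVCL-UHFFFAOYSA-N": "4-Chlorophenethylammonium Iodide",
}


def training_draw_paths(repo_root=".", draws=(0, 1)):
    """Return {inchikey: [Path, ...]} for each amine's training draws.

    Hypothetical helper: assumes the layout
    data/training/initialization/<InChIKey>/training_draw<N>.csv
    relative to repo_root. It only builds paths; it does not read files.
    """
    base = Path(repo_root) / "data" / "training" / "initialization"
    return {
        key: [base / key / f"training_draw{n}.csv" for n in draws]
        for key in AMINES
    }


if __name__ == "__main__":
    # Print the expected draw files for each amine.
    for key, paths in training_draw_paths().items():
        print(AMINES[key], [p.as_posix() for p in paths])
```

Keeping the InChIKey as the dictionary key mirrors the folder names, so the same mapping can be reused for the per-amine stateset folders.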

Stateset

Located under data/stateset. As with the amine-specific initialization data, there are four folders, one per amine. Each amine folder contains three files:

stateset.csv - All possible concentrations along with their descriptors. Used during the active learning and final prediction phases
stateset_volumes.csv - Reagent volumes to combine in the lab to obtain the concentrations defined in stateset.csv
vertices.csv - Inorganic, organic and acid concentrations at the vertices of the explored stateset. Used to plot the stateset
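A minimal sketch of reading one amine's stateset together with its lab volumes, using only the standard library. It rests on an assumption this README does not confirm: that row i of stateset_volumes.csv gives the reagent volumes for row i of stateset.csv. The function name `load_stateset` is hypothetical, and no column names are assumed.

```python
import csv
from pathlib import Path


def load_stateset(amine_dir):
    """Pair each stateset row with its reagent-volume row.

    Hypothetical helper. Assumes (unconfirmed) that stateset.csv and
    stateset_volumes.csv list the same experiments in the same order,
    so both files must contain the same number of data rows.
    """
    amine_dir = Path(amine_dir)
    with open(amine_dir / "stateset.csv", newline="") as f:
        states = list(csv.DictReader(f))
    with open(amine_dir / "stateset_volumes.csv", newline="") as f:
        volumes = list(csv.DictReader(f))
    if len(states) != len(volumes):
        # A mismatch means the row-alignment assumption does not hold.
        raise ValueError(
            f"{amine_dir}: {len(states)} stateset rows but "
            f"{len(volumes)} volume rows"
        )
    # Each pair: (concentrations + descriptors, volumes used to prepare it).
    return list(zip(states, volumes))
```

Usage, assuming paths relative to the repository root: `load_stateset("data/stateset/JMXLWMIFDJCGBV-UHFFFAOYSA-N")`. Values come back as strings from `csv.DictReader`; convert them to numbers before feeding them to a model.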

Results

Located under data/results/final_plate_observations. This folder contains the observations made on the final prediction plate for all models. It has two subfolders, one for the exploitation recommender and one for the serendipity recommender.

Source Code

Located under /src

Model code

Located under src/models. All open-source models are provided in this repo. Classification models such as BART, DT, KNN and PLATIPUS are under src/models/classification, and regression models such as BGP are under src/models/regression.

Plotting code

Code to generate the plots used in the paper is located under src/plot.