Code and data related to serendipity recommender paper
Training data is located under data/training
Historical training data is located under data/training/historical. This folder contains two files:
- raw_data.csv - Raw training file containing all descriptors generated by Escalate
- training_data.csv - Historical training data set used by all models
Amine-specific initialization data is located under data/training/initialization. There are four folders, one for each of the 4 amines used to refine the models in the lab:
- HJFYRMFYQMIZDG-UHFFFAOYSA-N - Hydroxyphenethyl amine
- JMXLWMIFDJCGBV-UHFFFAOYSA-N - Dimethylammonium Iodide
- NJQKYAASWUGXIT-UHFFFAOYSA-N - 4-Chlorophenylammonium Iodide
- ZKRCWINLLKOVCL-UHFFFAOYSA-N - 4-Chlorophenethylammonium Iodide
Each amine folder contains two training draws, named training_draw0.csv and training_draw1.csv
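The layout above can be walked programmatically. The following is a minimal sketch, not part of the repository: the helper name `training_draw_paths` and the `repo_root` argument are hypothetical, and only the folder and file names listed in this README are assumed.

```python
from pathlib import Path

# InChIKey folder names and amine labels, copied from the list in this README.
AMINES = {
    "HJFYRMFYQMIZDG-UHFFFAOYSA-N": "Hydroxyphenethyl amine",
    "JMXLWMIFDJCGBV-UHFFFAOYSA-N": "Dimethylammonium Iodide",
    "NJQKYAASWUGXIT-UHFFFAOYSA-N": "4-Chlorophenylammonium Iodide",
    "ZKRCWINLLKOVCL-UHFFFAOYSA-N": "4-Chlorophenethylammonium Iodide",
}

# File names of the two training draws found in each amine folder.
DRAW_FILES = ("training_draw0.csv", "training_draw1.csv")


def training_draw_paths(repo_root="."):
    """Map each amine InChIKey to the paths of its two training draws.

    Hypothetical helper: repo_root is assumed to be the repository checkout.
    """
    init_dir = Path(repo_root) / "data" / "training" / "initialization"
    return {
        key: [init_dir / key / name for name in DRAW_FILES]
        for key in AMINES
    }


if __name__ == "__main__":
    # Report which of the expected files exist in the current checkout.
    for key, paths in training_draw_paths().items():
        for path in paths:
            status = "found" if path.exists() else "missing"
            print(f"{AMINES[key]}: {path} ({status})")
```

Run from the repository root, the script prints one line per expected draw file and flags any that are missing.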
Statesets are located under data/stateset. As with the amine-specific initialization data, there are four folders, one per amine. Each amine folder contains 3 files:
- stateset.csv - Stateset of all possible concentrations along with their descriptors. This stateset is used during the active learning and final prediction phases
- stateset_volumes.csv - Reagent volumes to combine to get the concentrations defined in stateset.csv, used in the lab
- vertices.csv - Inorganic, organic and acid concentrations that represent the vertices of the explored stateset. Used to plot the stateset
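The three files of one amine's stateset folder can be read together. This is a sketch under stated assumptions: the function name `read_stateset_folder` is hypothetical, the files are assumed to be comma-separated with a header row, and no column names are assumed since this README does not list them.

```python
import csv
from pathlib import Path

# File names of the three stateset files described in this README.
STATESET_FILES = ("stateset.csv", "stateset_volumes.csv", "vertices.csv")


def read_stateset_folder(amine_key, repo_root="."):
    """Read the three CSV files of one amine's stateset folder.

    Returns a dict mapping each file name to a list of row dicts keyed by
    that file's own header row. Values are kept as strings.
    """
    folder = Path(repo_root) / "data" / "stateset" / amine_key
    tables = {}
    for name in STATESET_FILES:
        with open(folder / name, newline="") as handle:
            tables[name] = list(csv.DictReader(handle))
    return tables
```

For example, `read_stateset_folder("JMXLWMIFDJCGBV-UHFFFAOYSA-N")` would load the files for Dimethylammonium Iodide when called from the repository root. Keeping the values as strings avoids guessing at column types.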
Results are located under data/results/final_plate_observations. The results folder contains the observations made in the final prediction plate by all models. There are two subfolders, one for the exploitation recommender and one for the serendipity recommender.
Source code is located under src
Models are located under src/models. All open source models are provided in this repo. Classification models such as BART, DT, KNN and PLATIPUS are under src/models/classification, and regression models such as BGP are under src/models/regression.
Code to generate the plots used in the paper is placed under src/plot.
Code to calculate serendipity is placed under src/recsys. Recommender code is available in preprocess.py and the objective function is available in objectives.py.
Notebooks are located under notebooks. They contain the code used to generate and present the results reported in the paper.