Updating README

ejohnson643 · Dec 17, 2021 · ebc5589
1 parent 0cf80e7
commit ebc5589
Showing 1 changed file with 32 additions and 13 deletions.
diff --git a/README.md b/README.md
@@ -24,7 +24,7 @@ To install EMBEDR, we recommend cloning this repository before installing using
 `pip` in the main project directory.  Specifically:
 
 ```bash
-    pip install .
+pip install .
 ```
 
 The package requires numpy, scikit-learn, scipy, conda, and numba for
@@ -38,21 +38,50 @@ the t-SNE algorithm.  You can install fftw using
 ## Getting Started
 
 Once you've installed EMBEDR, you can easily generate an embedding colored by
-EMBEDR *p*-value by calling the `fit` method in the EMBEDR class as below:
+EMBEDR *p*-value by calling the `fit` method in the EMBEDR class as below.
 
 ```python
 from EMBEDR import EMBEDR, EMBEDR_sweep
 import numpy as np
 
 X = np.loadtxt("./data/mnist2500_X.txt").astype(float)
 
-embObj = EMBEDR()
+embObj = EMBEDR(project_dir='./')
 embObj.fit(X)
 embObj.plot()
 ```
 
 ![Example EMBEDR Plot](EasyUseExample.png)
 
+In the example above, we embed 2500 MNIST digits once using t-SNE, and we also
+embed a marginally-resampled null data set once.  The quality of the data
+embedding, based on the correspondence between the neighborhoods of each sample
+in the original space and in the shown projection, is compared to the quality
+expected from signalless data (as generated by the null data set).  This
+comparison results in a "*p*-value," which we use to color the samples in the
+embedding.  For complete details, see our
+[preprint](https://www.biorxiv.org/content/10.1101/2020.11.18.389031v2).
+
+The EMBEDR package primarily works through the `EMBEDR` class object, as in the
+example above.  Importantly, because EMBEDR generates several embeddings of a
+data set (and of a generated null data set), the method stores intermediate
+results in a project directory.  In the example above, the `project_dir`
+parameter is set to the current working directory, but we recommend using a
+dedicated "projects" directory.  The default value for `project_dir` is
+`./projects/`.  To help organize results, a `project_name` parameter can
+also be specified.  If you don't want to cache files, set `do_cache=False`
+when initializing the EMBEDR object.
+
+Other useful parameters are:
+- `DRA`: The dimensionality reduction algorithm; currently only `tSNE` and
+  `UMAP` are supported.
+- `perplexity`/`nearest_neighbors`: The algorithm hyperparameter for t-SNE or
+  UMAP, respectively.  Each defaults to 10% of the number of samples.
+- `n_data_embed` and `n_null_embed`: The number of data and null embeddings to
+  generate before calculating EMBEDR *p*-values.  Both default to 1, but in
+  practice we recommend using 3-10 embeddings.
+For a complete list of options, check the `EMBEDR` class documentation.
+
 ## New in Version 2.0
 
 The updated version of the EMBEDR package better facilitates the EMBEDR 
@@ -81,14 +110,4 @@ which they were created has been amended and will be backwards compatible with
 previous versions.  Objects can now be loaded from any relative path
 specification for the project directory.
 
-## To-Do
-
-- Plotting Utility
-    - Sweep results:
-        - EES vs hyperparameter (null and data)
-        - p-Values vs hyperparameter
-    - EMBEDR results:
-        - color plot by other metadata / supplied array.
-- k-Effective Calculator
-
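
A note on the null model mentioned in the new README text: this diff does not spell out what "marginally-resampled null data set" means. The sketch below is one possible reading, assuming that each feature (column) is resampled independently, with replacement, from its own observed values. The function name `marginal_resample` and the toy data are hypothetical, and this is dependency-free Python for illustration, not EMBEDR's actual implementation.

```python
import random


def marginal_resample(data, seed=None):
    """Return a null data set with the same shape as `data`.

    Assumption: "marginal resampling" means each column is drawn
    independently, with replacement, from its own observed values.
    This keeps each feature's marginal distribution while breaking
    the dependence between features.
    """
    rng = random.Random(seed)
    n_samples = len(data)
    columns = list(zip(*data))  # transpose: rows -> columns
    null_columns = [rng.choices(col, k=n_samples) for col in columns]
    return [list(row) for row in zip(*null_columns)]  # transpose back


# Toy data: 4 samples, 2 perfectly correlated features (hypothetical values).
data = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0]]
null = marginal_resample(data, seed=0)
```

Under this reading, every value in a null column still comes from the matching data column, but the pairing of values across columns is scrambled. That is the sense in which the null data are "signalless."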