Updating README
ejohnson643 committed Dec 17, 2021
1 parent 0cf80e7 commit ebc5589
1 changed file (README.md): 32 additions and 13 deletions.

To install EMBEDR, we recommend cloning this repository and then installing it
with `pip` from the main project directory:

```bash
pip install .
```

The package requires numpy, scikit-learn, scipy, conda, and numba for
… the t-SNE algorithm. You can install fftw using …
## Getting Started

Once you've installed EMBEDR, you can easily generate an embedding colored by
EMBEDR *p*-value by calling the `fit` method in the EMBEDR class as below.

```python
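from EMBEDR import EMBEDR
import numpy as np

# Load the 2500-sample MNIST data set (one sample per row)
X = np.loadtxt("./data/mnist2500_X.txt").astype(float)

# Cache intermediate results in the current working directory
embObj = EMBEDR(project_dir='./')
embObj.fit(X)   # embed the data and a null data set, then compute p-values
embObj.plot()   # plot the data embedding colored by EMBEDR p-value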
```

![Example EMBEDR Plot](EasyUseExample.png)

In the example above, we embed 2500 MNIST digits once using t-SNE and embed a
marginally resampled null data set once as well. The quality of the data
embedding, measured by how well each sample's neighborhood in the original space
is preserved in the projection shown, is compared to the quality expected from
signalless data (represented by the null data set). This comparison yields a
"*p*-value," which we use to color the samples in the embedding. For complete
details, see our
[preprint](https://www.biorxiv.org/content/10.1101/2020.11.18.389031v2).
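
The null comparison described above can be sketched in a few lines of NumPy.
This is a minimal illustration under stated assumptions, not the package's
implementation: it assumes that "marginally resampled" means shuffling each
feature (column) independently, which keeps every feature's marginal
distribution while destroying the correlations between features, and it uses a
generic empirical *p*-value in which lower per-sample scores mean better
neighborhood preservation. The helpers `marginal_null` and `empirical_pvalues`
are hypothetical and are not part of EMBEDR.

```python
import numpy as np


def marginal_null(X, seed=None):
    """Hypothetical helper: shuffle each column of X independently.

    Every feature keeps its marginal distribution, but structure shared
    across features (the "signal") is destroyed.
    """
    rng = np.random.default_rng(seed)
    X_null = np.array(X, dtype=float)  # copy so X is left untouched
    for j in range(X_null.shape[1]):
        X_null[:, j] = rng.permutation(X_null[:, j])
    return X_null


def empirical_pvalues(data_scores, null_scores):
    """Hypothetical helper: empirical p-value for each data score.

    Assumes lower scores mean better neighborhood preservation, so each
    p-value reflects how many null scores are at least as good as the data.
    """
    null_sorted = np.sort(np.ravel(null_scores))
    # Number of null scores <= each data score
    n_le = np.searchsorted(null_sorted, np.ravel(data_scores), side="right")
    return (n_le + 1) / (null_sorted.size + 1)


X = np.random.default_rng(0).normal(size=(100, 5))
X_null = marginal_null(X, seed=1)
p = empirical_pvalues([0.1, 0.5, 0.9], [0.2, 0.4, 0.6, 0.8])
print(p)  # [0.2 0.6 1. ]
```

The `+ 1` terms keep every *p*-value strictly positive: a sample that scores
better than all `N` null scores still receives a finite *p*-value of
`1 / (N + 1)`.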

The EMBEDR package primarily works through the `EMBEDR` class, as in the
example above. Because EMBEDR generates several embeddings of a data set (and
of a generated null data set), it stores intermediate results in a project
directory. In the example above, `project_dir` is set to the current working
directory, but we recommend setting a dedicated "projects" directory; the
default value of `project_dir` is `./projects/`. To further organize results,
you can also specify a `project_name` parameter. If you don't want to cache
files, set `do_cache=False` when initializing the EMBEDR object.

Other useful parameters are:
- `DRA`: The dimensionality reduction algorithm; currently only `tSNE` and
  `UMAP` are supported.
- `perplexity`/`nearest_neighbors`: Set the algorithm hyperparameters for
  t-SNE or UMAP, respectively. By default, these are set to 10% of the number
  of samples.
- `n_data_embed` and `n_null_embed`: The number of data and null embeddings to
  generate before calculating EMBEDR *p*-values. Both default to 1, but in
  practice using 3-10 embeddings is recommended.

For a complete list of options, see the `EMBEDR` class documentation.
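
As a worked example of these settings, the sketch below collects them into a
keyword-argument dictionary for the `EMBEDR` constructor. The helper
`embedr_kwargs` and its defaults (five embeddings of each kind, the project
name `my_project`) are hypothetical; the parameter names come from the lists
above, and the integer rounding of the 10% rule is an assumption. For the
2500-sample MNIST example, 10% of the samples gives a perplexity of 250.

```python
def embedr_kwargs(n_samples, DRA="tSNE", n_embeddings=5,
                  project_name="my_project", do_cache=True):
    """Hypothetical helper: assemble keyword arguments for EMBEDR(...).

    Parameter names follow this README; check the EMBEDR class
    documentation for exact spellings and accepted values.
    """
    if DRA not in ("tSNE", "UMAP"):
        raise ValueError("DRA must be 'tSNE' or 'UMAP'")

    # Mirror the documented default of 10% of the samples; rounding down
    # (with a floor of 1) is an assumption made for this sketch
    hyperparam = max(1, n_samples // 10)

    kwargs = {
        "DRA": DRA,
        "n_data_embed": n_embeddings,  # 3-10 is recommended in practice
        "n_null_embed": n_embeddings,
        "project_dir": "./projects/",  # the documented default location
        "project_name": project_name,
        "do_cache": do_cache,
    }
    # t-SNE is tuned by perplexity, UMAP by its number of nearest neighbors
    if DRA == "tSNE":
        kwargs["perplexity"] = hyperparam
    else:
        kwargs["nearest_neighbors"] = hyperparam
    return kwargs


opts = embedr_kwargs(2500)
print(opts["perplexity"])  # 250
```

The dictionary would then be unpacked into the constructor, for example
`embObj = EMBEDR(**embedr_kwargs(X.shape[0]))`, before calling `embObj.fit(X)`
as in the first example.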

## New in Version 2.0

The updated version of the EMBEDR package better facilitates the EMBEDR
… which they were created has been amended and will be backwards compatible with
previous versions. Objects can now be loaded from any relative path
specification for the project directory.

## To-Do

- Plotting Utility
  - Sweep results:
    - EES vs hyperparameter (null and data)
    - p-Values vs hyperparameter
  - EMBEDR results:
    - color plot by other metadata / supplied array.
- k-Effective Calculator

