🔀 Merge pull request #63 from RasmussenLab/developer
♻️ 🎨 📝 Developer
ri-heme authored Jan 2, 2023
2 parents 4d312b8 + 4a42031 commit 17db92c
Showing 30 changed files with 1,019 additions and 146 deletions.
16 changes: 16 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,16 @@
version: 2

build:
  os: ubuntu-20.04
  tools:
    python: "3.9"

sphinx:
  configuration: docs/source/conf.py

python:
  install:
    - requirements: docs/requirements.txt
    - requirements: requirements.txt
    - method: pip
      path: .
37 changes: 12 additions & 25 deletions README.md
@@ -4,7 +4,7 @@ The code in this repository can be used to run our Multi-Omics Variational
autoEncoder (MOVE) framework for integration of omics and clinical variables
spanning both categorical and continuous data. Our approach includes training
ensemble VAE models and using *in silico* perturbation experiments to identify
cross omics associations. The manuscript has been accepted and we will provide
the link when it is published.

We developed the method based on a Type 2 Diabetes cohort from the IMI DIRECT
@@ -68,29 +68,8 @@ MOVE has five-six steps:

## How to run MOVE

You can run the move-dl pipeline from the command line or within a Jupyter
notebook.

You can run MOVE as Python module with the following command. Details on how
to set up the configuration for the data and task can be found our
[tutorial](https://github.com/RasmussenLab/MOVE/tree/main/tutorial) folder.

```bash
>>> move-dl data=[name of data config] task=[name of task config]
```

Feel free to
[open an issue](https://github.com/RasmussenLab/MOVE/issues/new/choose) if you
need any help.

### How to use MOVE with your data

Your data files should be tab separated, include a header and the first column
should be the IDs of your samples. The configuration of MOVE is done using YAML
files that describe the input data and the task specification. These should be
placed in a `config` directory in the working directory. Please see the
[tutorial](https://github.com/RasmussenLab/MOVE/tree/main/tutorial)
for more information.
Please refer to our [**documentation**](https://move-dl.readthedocs.io/) for
examples and tutorials on how to run MOVE.


# Data sets
@@ -110,5 +89,13 @@ available [here](https://directdiabetes.org).

## Simulated and publicly available data sets

We have therefore provided two datasets to test the workflow: a simulated
dataset and a publicly-available maize rhizosphere microbiome data set.

# Citation

To cite MOVE, use the following information:

Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. et al. Discovery of
drug–omics associations in type 2 diabetes with generative deep-learning models.
*Nat Biotechnol* (2023). https://doi.org/10.1038/s41587-022-01520-x
4 changes: 3 additions & 1 deletion docs/requirements.txt
@@ -1,2 +1,4 @@
sphinx==5.3.0
sphinx_rtd_theme=1.1.1
sphinx-rtd-theme
sphinx-autodoc-typehints
sphinxemoji
16 changes: 12 additions & 4 deletions docs/source/conf.py
@@ -11,18 +11,23 @@

sys.path.insert(0, str(Path("../src").resolve()))

project = "move-dl"
copyright = "2022, Valentas Brasas, Ricardo Hernandez Medina"
author = "Valentas Brasas, Ricardo Hernandez Medina"
release = "1.0.0"
import move

project = "MOVE"
copyright = "2022, Rasmussen Lab"
author = "Rasmussen Lab"
release = ".".join(map(str, move.__version__))

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosectionlabel",
    "sphinx.ext.autosummary",
    "sphinx.ext.napoleon",
    "sphinx_autodoc_typehints",
    "sphinxemoji.sphinxemoji",
]

templates_path = ["_templates"]
@@ -32,6 +37,9 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "sphinx_rtd_theme"
html_theme_options = {
    "collapse_navigation": False,
}
html_static_path = []

# -- Napoleon settings --------------------------------------------------------
27 changes: 17 additions & 10 deletions docs/source/index.rst
@@ -1,15 +1,22 @@
.. move-dl documentation master file, created by
sphinx-quickstart on Sat Nov 5 15:48:56 2022.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to move-dl's documentation!
===================================
Welcome to MOVE's documentation!
================================

.. toctree::
:hidden:
:maxdepth: 1
:caption: Contents:

pages/installation
pages/tutorial
pages/api/API
install
method
tutorial/index

MOVE (**m**\ ulti-\ **o**\ mics **v**\ ariational auto\ **e**\ ncoder) is a
framework for integration of omics and other data modalities (including both
categorical and continuous data). Our approach consists of training an ensemble
of VAE (variational autoencoder) models and performing *in silico* perturbation
experiments to identify associations across the different omics datasets.

We invite you to read `our publication`_ presenting this method, or read
about the method :doc:`here</method>`.

.. _`our publication`: https://www.nature.com/articles/s41587-022-01520-x
47 changes: 47 additions & 0 deletions docs/source/install.rst
@@ -0,0 +1,47 @@
Install
=======

MOVE is distributed as ``move-dl``, a Python package.

It requires Python 3.9 (or later) and third-party libraries, such as `PyTorch`_
and `Hydra`_. These dependencies will be installed automatically when you
install with ``pip``.

Install the stable version
--------------------------

We recommend installing ``move-dl`` in a fresh virtual environment. If you wish
to learn how to create and manage virtual environments with Conda, please
follow `these instructions`_.

The latest stable version of ``move-dl`` can be installed with ``pip``.

.. code-block:: bash

    >>> pip install move-dl

Install the development version
-------------------------------

If you wish to install the development version of ``move-dl``, create a new
virtual environment, and do:

.. code-block:: bash

    >>> pip install git+https://github.com/RasmussenLab/MOVE@developer

Alternatively, you can clone ``move-dl`` from `GitHub`_ and install by
running the following command from the top-level source directory:

.. code-block:: bash

    >>> pip install -e .

The ``-e`` flag installs the project in "editable" mode, so you can follow the
development branch and update your installation by pulling from GitHub.

.. _PyTorch: https://pytorch.org/
.. _Hydra: https://hydra.cc/
.. _GitHub: https://github.com/RasmussenLab/MOVE

.. _these instructions: https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html
98 changes: 98 additions & 0 deletions docs/source/method.rst
@@ -0,0 +1,98 @@
About the method
================

MOVE is based on the VAE (variational autoencoder) model, a deep learning model
that transforms high-dimensional data into a lower-dimensional space (so-called
latent representation). The autoencoder is made up of two neural networks: an
encoder, which compresses the input variables; and a decoder, which tries to
reconstruct the original input from the compressed representation. In doing so,
the model learns the structure and associations between the input variables.

In `our publication`_, we used this type of model to integrate different data
modalities, including: genomics, transcriptomics, proteomics, metabolomics,
microbiomes, medication data, diet questionnaires, and clinical measurements.
Once we obtained a trained model, we exploited the decoder network to identify
cross-omics associations.

Our approach consists of performing *in silico* perturbations of the original
data and using either univariate statistical methods or Bayesian decision
theory to identify significant differences between the reconstructions with and
without perturbation. Thus, we are able to detect associations between the
input variables.

.. _`our publication`: https://www.nature.com/articles/s41587-022-01520-x

.. image:: method/fig1.svg

VAE design
-----------

The VAE was designed to account for a variable number of fully-connected hidden
layers in both encoder and decoder. Each hidden layer is followed by batch
normalization, dropout, and a leaky rectified linear unit (leaky ReLU).

To integrate different modalities, each dataset is reshaped and concatenated
into an input matrix. Moreover, the error is calculated per dataset: binary
cross-entropy for binary and categorical datasets, and mean squared
error for continuous datasets. Each error :math:`E_i` is then multiplied by a
given weight :math:`W_i` and added up to form the loss function:

:math:`L = \sum_i W_i E_i + W_\textnormal{KL} D_\textnormal{KL}`

Note that the :math:`D_\textnormal{KL}` (Kullback–Leibler divergence) penalizes
deviation of the latent representation from the standard normal distribution. It
is also subject to a weight :math:`W_\textnormal{KL}`, which warms up as the
model is trained.
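
The weighted loss can be illustrated with a minimal, standard-library-only sketch. It is not MOVE's implementation: the function names are hypothetical, the KL term assumes a diagonal Gaussian latent distribution, and the linear warm-up schedule is an assumption, since the text only states that the weight warms up during training.

```python
import math


def binary_cross_entropy(targets, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


def mean_squared_error(targets, preds):
    """Mean squared error between continuous targets and reconstructions."""
    return sum((t - y) ** 2 for t, y in zip(targets, preds)) / len(targets)


def kl_to_standard_normal(mu, logvar):
    """KL divergence of a diagonal Gaussian N(mu, exp(logvar)) from N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def kl_weight(epoch, warmup_epochs, max_weight=1.0):
    """Assumed schedule: the KL weight grows linearly, then stays at max_weight."""
    return max_weight * min(1.0, epoch / warmup_epochs)


def total_loss(errors, weights, kl, w_kl):
    """L = sum_i W_i * E_i + W_KL * D_KL."""
    return sum(w * e for w, e in zip(weights, errors)) + w_kl * kl
```

For instance, one categorical and one continuous dataset would each contribute one entry to ``errors`` and ``weights``, and ``kl_weight(epoch, warmup_epochs)`` would supply ``w_kl`` at each epoch.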

Extracting associations
-----------------------

After determining the right set of hyperparameters, associations are extracted
by perturbing the original input data and passing it through an ensemble of
trained models. The reason behind using an ensemble is that VAE models are
stochastic, so we need to ensure that the results we obtain are not a product
of chance.

We perturb categorical data by changing its value from one category to
another (e.g., drug status changed from "not received" to "received"). Then, we
compare the reconstruction generated from the original data with the one
generated from the perturbed data. To achieve this, we propose two approaches:
using *t*\ -tests and using Bayes factors. Both are described below:

MOVE *t*\ -test
^^^^^^^^^^^^^^^

#. Perturb a variable in one dataset.
#. Repeat 10 times for 4 different latent space sizes:

   #. Train VAE model with original data.
   #. Obtain reconstruction of original data (baseline reconstruction).
   #. Obtain 10 additional reconstructions of original data and calculate
      difference from the first (baseline difference).
   #. Obtain reconstruction of perturbed data (perturbed reconstruction) and
      subtract from baseline reconstruction (perturbed difference).
   #. Compute p-value between baseline and perturbed differences with t-test.

#. Correct p-values using Bonferroni method.
#. Select features that are significant (p-value lower than 0.05).
#. Select significant features that overlap in at least half of the refits and
   3 out of 4 architectures. These features are associated with the
   perturbed variable.
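
The last three steps (correction, thresholding, and overlap across refits and architectures) can be sketched as follows. This is an illustration only, not MOVE's code: the function names and the nested-list layout ``p_values[architecture][refit][feature]`` are hypothetical, and the sketch assumes the Bonferroni correction multiplies each p-value by the number of features tested in a single run.

```python
def bonferroni(p_values):
    """Bonferroni-correct p-values by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def significant_features(p_values, alpha=0.05):
    """Indices of features whose corrected p-value is lower than alpha."""
    return {i for i, p in enumerate(bonferroni(p_values)) if p < alpha}


def select_associations(p_values, alpha=0.05, min_refit_fraction=0.5,
                        min_architectures=3):
    """Keep features significant in enough refits of enough architectures.

    p_values[a][r][f] is the raw p-value of feature f in refit r of
    architecture a (a hypothetical layout chosen for this sketch).
    """
    num_features = len(p_values[0][0])
    votes = [0] * num_features  # architectures in which each feature passes
    for refits in p_values:
        hits = [0] * num_features  # refits in which each feature is significant
        for run in refits:
            for f in significant_features(run, alpha):
                hits[f] += 1
        for f in range(num_features):
            if hits[f] >= min_refit_fraction * len(refits):
                votes[f] += 1
    return [f for f in range(num_features) if votes[f] >= min_architectures]
```

With the defaults, a feature is reported only if it is significant in at least half of the refits for at least 3 of the architectures supplied, mirroring the thresholds in the list above.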

MOVE Bayes
^^^^^^^^^^

#. Perturb a variable in one dataset.
#. Repeat 30 times:

   #. Train VAE model with original data.
   #. Obtain reconstruction of original data (baseline reconstruction).
   #. Obtain reconstruction of perturbed data (perturbed reconstruction).
   #. Record difference between baseline and perturbed reconstruction.

#. Compute probability of difference being greater than 0.
#. Compute Bayes factor from probability: :math:`K = \log p - \log (1 - p)`.
#. Sort probabilities by Bayes factor, from highest to lowest.
#. Compute false discovery rate (FDR) as cumulative evidence.
#. Select features whose FDR is below the desired threshold (e.g., 0.05). These
   features are associated with the perturbed variable.
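
The ranking and selection steps can be sketched in plain Python. This is a hedged illustration rather than MOVE's implementation: the function name and the ``differences[refit][feature]`` layout are hypothetical, the small ``eps`` is added only to avoid taking the logarithm of zero, and the FDR is assumed to be the running mean of ``1 - p`` over the ranked features. The sketch keeps features for as long as that estimate stays at or below the threshold, which is the usual reading of an FDR cutoff.

```python
import math


def bayes_associations(differences, fdr_threshold=0.05, eps=1e-8):
    """Rank features by Bayes factor and keep those under an FDR cutoff.

    differences[r][f] is the perturbed-minus-baseline reconstruction
    difference of feature f in refit r (a hypothetical layout).
    """
    num_refits = len(differences)
    num_features = len(differences[0])

    ranked = []
    for f in range(num_features):
        # Probability that the difference is greater than 0 across refits.
        p = sum(differences[r][f] > 0 for r in range(num_refits)) / num_refits
        # Bayes factor K = log p - log(1 - p); eps guards against log(0).
        k = math.log(p + eps) - math.log(1.0 - p + eps)
        ranked.append((k, p, f))
    ranked.sort(key=lambda item: item[0], reverse=True)

    selected = []
    cumulative = 0.0
    for rank, (_, p, f) in enumerate(ranked, start=1):
        cumulative += 1.0 - p
        fdr = cumulative / rank  # assumed estimate: running mean of (1 - p)
        if fdr > fdr_threshold:
            break  # the estimate never decreases further down the ranking
        selected.append(f)
    return selected
```

Because the features are sorted from highest to lowest Bayes factor, the running mean of ``1 - p`` cannot decrease, so stopping at the first feature that exceeds the threshold is equivalent to checking every feature.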
1 change: 1 addition & 0 deletions docs/source/method/fig1.svg
2 changes: 0 additions & 2 deletions docs/source/pages/installation.rst

This file was deleted.

2 changes: 0 additions & 2 deletions docs/source/pages/tutorial.rst

This file was deleted.
