Skip to content

Commit

Permalink
added next section to DIY
Browse files Browse the repository at this point in the history
  • Loading branch information
haeussma committed May 6, 2024
1 parent 06e037e commit 91e7cea
Show file tree
Hide file tree
Showing 6 changed files with 116 additions and 7 deletions.
Binary file added docs/figs/data_scheme.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/figs/model_chromatopy.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
9 changes: 6 additions & 3 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,19 @@
# EnzymeML Training Course

## ℹ️ About
The EnzymeML Training Course is a collaborative project between multiple biocatalysis groups. The aim of this project is to jointly develop computational tools, data standards, and best practices for the analysis of enzyme-catalyzed reactions. Furthermore, the project provides training for Python-based data analysis of experimental data.
The EnzymeML Training Course is a collaborative project between multiple biocatalysis groups. This project aims to jointly develop computational tools, data standards, and best practices for the analysis of enzyme-catalyzed reactions. Furthermore, the project provides training for Python-based data analysis of experimental data.

## 🚀 Motivation
Current state of data analysis and implications on reproducibility and scalability...
The current processing and analysis of catalytic data mainly relies on spreadsheet-based tools. Thus, many manual processing steps are required, which are error-prone and time-consuming. The goal of this project is to develop generic Python tools that allow the processing and analysis of reaction data in Jupyter notebooks. Therefore, the MTPHandler and Chromatopy are developed, enabling the processing of raw data from spectrophotometry and chromatography. These tools act as data harmonization tools, allowing the conversion of raw data into the standardized EnzymeML format while enriching the measured data with metadata on e.g. reaction conditions or catalyst and substrate properties.

![EnzymeML Training Course](figs/data_scheme.png)


## 📚 Stages
The project is structured into three stages, dividing the task of implementing generic analysis workflows for biocatalytic data:

### 1. Making experimental data accessible
Development of generic Python tools, allowing reading in unprocessed data from various analytical sources such as spectrophotomety, chromatograph, or NMR. The tools facilitate the preprocessing of the data, concentration calculation, and the generation of EnzymeML files.
Development of generic Python tools, allowing reading in unprocessed data from various analytical sources such as spectrophotometry, chromatograph, or NMR. The tools facilitate the preprocessing of the data, concentration calculation, and the generation of EnzymeML files.

### 2. Data Analysis
Depending on the needs of the project members, existing Python tools will be used or extended to analyze the data. Experimental questions
Expand Down
104 changes: 104 additions & 0 deletions docs/load_and_manipulate.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
In this section, we will go through the basic concepts of Python programming and how to load and manipulate data in Python. Then, we will load chromatographic data and perform basic operations on the data.

## Data Structures in Python

Python has several built-in data structures that can be used to store and manipulate data. The most common data structures are:

- **String**: A string is a collection of characters. Strings are defined by single quotes `''` or double quotes `""`. Strings can be used to e.g. define the path to a file, or to assign a literal value to a variable.
```python
# Defining a name as a string
reactant_name = "D-Glucose"

# Defining a file path as a string
file_path = "data/2021-01-01_experiment.txt"
```

- **Integer / Float**: An integer is a whole number, positive or negative, without decimals, of unlimited length. Integers or floats can be used to store numerical values e.g. the initial concentration of a reactant or the temperature of a reaction.
```python
# Defining the initial concentration
init_conc = 3.2

# Defining the temperature
temperature = 37
```

- **List**: A list is a collection of items that are ordered and changeable. Lists are defined by square brackets `[]`. Lists can be used to e.g. store the reaction time points.
```python
# Defining a list of time points
reaction_time = [0, 5, 10, 15, 20]
```

- **Dictionary**: A dictionary is a collection of items that are unordered, changeable, and indexed. Dictionaries are defined by curly brackets `{}`. Dictionary items are in the form of key-value pairs. Dictionaries can be used to store metadata about the reaction conditions.
```python
# Defining a dictionary of reaction conditions
reaction_conditions = {
"temperature": 37,
"temp_unit": "C",
"pH": 7.0,
"init_conc": 3.2,
"conc_unit": "mmol/l"
}
```


The defined data structure can be printed using the `print()` function for visualization. For example:
```python
print(reactant_name)
print(file_path)
print(reaction_time)
```
results in:
```text
> D-Glucose
> data/2021-01-01_experiment.txt
> [0, 5, 10, 15, 20]
```

!!! Example
Open a Jupyter Notebook eighter via the EnzymeML-Lab Docker container or with a different Python environment. And define the data structures mentioned above. Print the defined data structures.


## Manipulation of data in a `List`

Lists are the most important data structure to organize reaction data of different time series or experimental series. The following operations can be performed on lists:

- **Accessing Elements**: Elements in a list can be accessed by their index. The index starts at 0 for the first element.
```python
# Accessing the first element of the list
print(reaction_time[0])
```

- **Appending Elements**: Elements can be added to the end of a list using the `append()` method.
```python
# Appending a new time point to the list
reaction_time.append(100)
print(reaction_time)
```
```text
> [0, 5, 10, 15, 20, 100]
```

- **Removing Elements**: Elements can be removed from a list using the `remove()` method.
```python
# Removing the element 5 from the list
reaction_time.remove(5)
print(reaction_time)
```
```text
> [0, 10, 15, 20, 100]
```

!!! Example
Define a list of time points and perform the operations mentioned above in various ways. Print the list after each operation to see the changes.

## Objects in Python

In Python, everything is an object. An object is an instance of a class, and a class is a blueprint for objects. Objects can have attributes (data) and methods (functions). This is the basis of object-oriented programming as well as the basis for using and manipulating data with EnzymeML tools. All EnzymeML tools are based on objects that can be used to load, manipulate, and analyze data. In the following diagram, the object model of Chromatopy is visualized. It shows the relations and connections between the different objects. Each object has attributes, which are variables of basic data types (e.g. strings, integers, lists) or other objects. If an attribute can occur multiple times, it is represented as a list. In the diagram, list attributes are marked with `[0..*]`.

![EnzymeML Object Model](figs/model_chromatopy.png){: style=height:700px; width:100px;}

## Loading and process chromatographic data
!!! Example
In the next step, we will apply the introduced concepts to process chromatographic data. Therefore, load the [`shaimadzu-example`](https://github.com/haeussma/shimadzu-example) dataset and follow the steps to load, annotate, calibrate, visualize, and export chromatographic data in the prepared `calibration.ipynb` notebook.

If in doubt about the steps to start the EnzymeML-Lab Docker container, or how to load the dataset, please refer to the [Getting Started](getting_started.md) section.
9 changes: 5 additions & 4 deletions docs/status.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,9 @@ The conceptualized analysis workflow consists of the following steps, which are
- [x] Creation of standard curves
- [x] Use of calibration curves for concentration calculation
- [x] Export of data to EnzymemML
- [ ] Export of data to CSV
- [x] Export of data to CSV
- [ ] Export of data to EnzymemML


## Implementation status [ChromatoPy](https://github.com/FAIRChemistry/chromatopy)
**Supported Systems**
Expand All @@ -34,7 +36,6 @@ The conceptualized analysis workflow consists of the following steps, which are
- [x] Read in data
- [x] Peak detection/integration
- [x] Extract peaks of given retention time from many spectra
- [ ] Concentration calculation via internal standard
- [ ] Concentration calculation via external standard
- [x] Concentration calculation via external standard
- [x] Export of data to CSV
- [ ] Export of data to EnzymemML
- [ ] Export of data to CSV
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ nav:
- Home: index.md
- DIY:
- Getting started: getting_started.md
- Python basics and manipulation of data: load_and_manipulate.md
- Project Status: status.md
- Installations:
- Docker: docker_installation.md
Expand Down

0 comments on commit 91e7cea

Please sign in to comment.