Sequence network features (#89)
* added example notebook from last demo

* ok

* new features for threshold and cytoscape

* added new threshold settings for networkx

* API update

* fixed issues

* fixed naming of settings

* adjusted biopython dependencies

* fixed latest issue but changed create graph

* update the doc

* updated docs and formatted code

* added new docs

* changed structure of naming_dic, removed it

* fixed typo

---------

Co-authored-by: sdRDM Bot <[email protected]>
Co-authored-by: Niklas Abraham - INFlux <[email protected]>
Co-authored-by: max <[email protected]>
4 people authored Aug 14, 2024
1 parent 4eb2baf commit 4610e89
Showing 11 changed files with 241 additions and 121 deletions.
143 changes: 116 additions & 27 deletions docs/examples/network.ipynb

Large diffs are not rendered by default.

94 changes: 38 additions & 56 deletions docs/quick_start/networks.md
@@ -1,82 +1,64 @@
# Creating Sequence Networks

A `SequenceNetwork` is created using a list of `PairwiseAlignment` objects, and a list of `AbstractSequences`. Additionally, the way the network is constructed is influenced by the `weight` and `threshold` attributes. The `weight` determines which attribute of the `PairwiseAlignment` object is used to calculate the distance between the sequences of the network. The `threshold` determines the minimum value of the `weight` attribute for an edge to be created. Furthermore, a `color` can be determined based on the attributes of an `AbstractSequence` object in which the nodes of the network will be colored.
A `SequenceNetwork` object can be created from a list of `ProteinRecord` objects. These sequences are aligned, and each `ProteinRecord` object becomes a node of the network. Edges are then created based on a weight, e.g. 'identity', but custom weights can also be introduced.
The threshold mode and the threshold value together define a threshold that, for example, hides all edges with an identity score below 0.8.
The final network can be visualized and also loaded into Cytoscape for further adjustments. It can also be plotted with matplotlib if custom styles are needed.
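
The two threshold modes can be illustrated independently of pyeed. The following is a minimal sketch, assuming edges are stored as `(node_a, node_b, attributes)` tuples. The function name `visible_edges` and the sample identity values are hypothetical; only the mode names `UNDER_THRESHOLD` and `ABOVE_THRESHOLD` come from this commit.

```python
def visible_edges(edges, weight, threshold, threshold_mode="UNDER_THRESHOLD"):
    """Return the edges that stay visible for the given threshold mode.

    UNDER_THRESHOLD hides edges whose weight is below the threshold,
    ABOVE_THRESHOLD hides edges whose weight is above it.
    """
    if threshold_mode == "UNDER_THRESHOLD":
        return [e for e in edges if not e[2][weight] < threshold]
    if threshold_mode == "ABOVE_THRESHOLD":
        return [e for e in edges if not e[2][weight] > threshold]
    # Assumption of this sketch: unknown modes are rejected explicitly
    raise ValueError(f"Unknown threshold mode: {threshold_mode}")


# Hypothetical pairwise identities between three sequences
edges = [
    ("seq_a", "seq_b", {"identity": 0.95}),
    ("seq_a", "seq_c", {"identity": 0.60}),
    ("seq_b", "seq_c", {"identity": 0.80}),
]

kept_high = visible_edges(edges, "identity", 0.8)
kept_low = visible_edges(edges, "identity", 0.8, "ABOVE_THRESHOLD")
print([(u, v) for u, v, _ in kept_high])
print([(u, v) for u, v, _ in kept_low])
```

An edge whose weight equals the threshold stays visible in both modes, because the comparisons are strict.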

## Visualization

=== "2D"

```py
from pyeed.core import ProteinInfo, Alignment
from pyeed.aligners import PairwiseAligner
from pyeed.network import SequenceNetwork

# Accessions from different methionine adenyltransferases
``` py
mat_accessions = [
"MBP1912539.1",
"SEV92896.1",
"MBO8174569.1",
"WP_042680787.1",
"NPA47376.1",
"WP_167889085.1",
"WP_048165429.1",
"ACS90033.1",
]
mats = ProteinInfo.get_ids(mat_accessions)

# Create pairwise alignments between all sequences
alignments = Alignment.from_sequences(mats, aligner=PairwiseAligner)

# Create a network
mats = ProteinRecord.get_ids(mat_accessions)
# Create network
network = SequenceNetwork(
sequences=mats,
pairwise_alignments=alignments,
weight="identity",
threshold=0.9,
dimensions=2,
color="taxonomy_id",
)

# Visualize the network
network.create_graph()
network.visualize()
```
```

=== "3D"
Exporting the network to Cytoscape is done as follows:
``` py
import py4cytoscape as p4c

```py
from pyeed.core import ProteinInfo, Alignment
from pyeed.aligners import PairwiseAligner
from pyeed.network import SequenceNetwork
# transfer the network to cytoscape
network.create_cytoscape_graph(
threshold=0.75,
column_name="class",
)

# Accessions from different methionine adenyltransferases
mat_accessions = [
"MBP1912539.1",
"SEV92896.1",
"MBO8174569.1",
"WP_042680787.1",
"NPA47376.1",
"WP_167889085.1",
"WP_048165429.1",
"ACS90033.1",
]
mats = ProteinInfo.get_ids(mat_accessions)
# plot the network
p4c.notebook_export_show_image()
```

# Create pairwise alignments between all sequences
alignments = Alignment.from_sequences(mats, aligner=PairwiseAligner)
The networkx object on which `SequenceNetwork` is based can be extracted using the following command.

# Create a network
network = SequenceNetwork(
sequences=mats,
pairwise_alignments=alignments,
weight="identity",
threshold=0.9,
dimensions=3,
color="taxonomy_id",
)
``` py
graph_network = network.network
```

# Visualize the network
network.visualize()
```
This graph can then be plotted using matplotlib.

``` py

import networkx as nx
import matplotlib.pyplot as plt

# Plot the network; `c` (node colors) and `s` (node sizes) must be defined beforehand
pos = nx.spring_layout(graph_network, weight='identity', iterations=100, seed=18)
plt.figure(figsize=(19,9))
nx.draw_networkx(graph_network, pos=pos, with_labels=True, node_color=c, node_size=s,
font_color='Black',font_size='6', font_weight='bold', edge_color='grey', alpha=0.5, width=1)
plt.axis('off')
plt.show()
```
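
The plotting snippet uses `c` and `s` without defining them. One hypothetical way to derive them, sketched here with plain dictionaries instead of a real graph, is to color each node by a categorical attribute and scale its size by its degree. The attribute values, the palette, and the size formula are assumptions for illustration and are not part of pyeed.

```python
from itertools import cycle

# Hypothetical node attributes and degrees; with networkx these could be read
# from graph.nodes(data=True) and graph.degree()
node_genus = {
    "seq_a": "Methanococcus",
    "seq_b": "Thermococcus",
    "seq_c": "Methanococcus",
}
node_degree = {"seq_a": 2, "seq_b": 1, "seq_c": 1}

# Assign one palette color per distinct genus, in order of first appearance
palette = cycle(["tab:blue", "tab:orange", "tab:green"])
genus_color = {}
for genus in node_genus.values():
    if genus not in genus_color:
        genus_color[genus] = next(palette)

# Lists must follow the node order of the graph that is drawn
c = [genus_color[node_genus[node]] for node in node_genus]
s = [100 + 50 * node_degree[node] for node in node_genus]
```

Both lists have to be in the same order as the nodes of the graph passed to `nx.draw_networkx`, otherwise colors and sizes end up on the wrong nodes.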

## Network Analysis

1 change: 1 addition & 0 deletions pyeed/core/abstractannotation.py
@@ -50,6 +50,7 @@ class AbstractAnnotation(
_repo: Optional[str] = PrivateAttr(default="https://github.com/PyEED/pyeed")
_commit: Optional[str] = PrivateAttr(
default="72d2203f2e3ce4b319b29fa0d2f146b5eead7b00"

)

_raw_xml_data: Dict = PrivateAttr(default_factory=dict)
1 change: 1 addition & 0 deletions pyeed/core/blastdata.py
@@ -91,6 +91,7 @@ class BlastData(
_repo: Optional[str] = PrivateAttr(default="https://github.com/PyEED/pyeed")
_commit: Optional[str] = PrivateAttr(
default="72d2203f2e3ce4b319b29fa0d2f146b5eead7b00"

)

_raw_xml_data: Dict = PrivateAttr(default_factory=dict)
1 change: 1 addition & 0 deletions pyeed/core/numberedsequence.py
@@ -40,6 +40,7 @@ class NumberedSequence(

_repo: Optional[str] = PrivateAttr(default="https://github.com/PyEED/pyeed")
_commit: Optional[str] = PrivateAttr(

default="72d2203f2e3ce4b319b29fa0d2f146b5eead7b00"
)

1 change: 1 addition & 0 deletions pyeed/core/region.py
@@ -43,6 +43,7 @@ class Region(

_repo: Optional[str] = PrivateAttr(default="https://github.com/PyEED/pyeed")
_commit: Optional[str] = PrivateAttr(

default="72d2203f2e3ce4b319b29fa0d2f146b5eead7b00"
)

1 change: 1 addition & 0 deletions pyeed/core/sequence.py
@@ -38,6 +38,7 @@ class Sequence(

_repo: Optional[str] = PrivateAttr(default="https://github.com/PyEED/pyeed")
_commit: Optional[str] = PrivateAttr(

default="72d2203f2e3ce4b319b29fa0d2f146b5eead7b00"
)

110 changes: 75 additions & 35 deletions pyeed/network/network.py
@@ -9,7 +9,6 @@
from requests import RequestException

from pyeed.align.pairwise import PairwiseAligner
from pyeed.core.proteinrecord import ProteinRecord
from pyeed.core.sequencerecord import SequenceRecord

plt.rcParams["figure.dpi"] = 300
@@ -51,6 +50,11 @@ class Config:
description="List of selected sequences",
)

edge_data: Optional[dict] = Field(
default=None,
description="Dictionary with edge data",
)

network: Optional[nx.Graph] = Field(
default=nx.Graph(),
description="Network graph with networkx",
@@ -65,18 +69,23 @@ class Config:
)

_cytoscape_url: Optional[str] = PrivateAttr(
default="http://cytoscape-desktop:1234/v1",
default="http://cytoscape:1234/v1",
)

def model_post_init(self, __context):
def post_init(self):
self._create_graph()

def add_to_targets(self, target: SequenceRecord):
if target.id not in self.targets:
self.targets.append(target.id)

def _create_graph(self):
"""Initializes the nx.Graph object and adds nodes and edges based on the sequences."""
"""
Initializes the nx.Graph object and adds nodes and edges based on the sequences.
Parameters:
edge_data: optional nested dictionary with additional edge attributes. The first key is the id of one sequence, the second key is the id of the other sequence, and the value is a dictionary of attributes for that edge (all edge combinations have to be present).
"""

sequences = {}

@@ -92,9 +101,6 @@ def _create_graph(self):
if "mol_weight" in seq_dict:
node_dict["mol_weight"] = seq_dict.pop("mol_weight")




sequences[seq_id] = sequence.sequence
self._full_network.add_node(seq_id, **node_dict)

@@ -110,6 +116,10 @@ def _create_graph(self):
alignment_result["sequences"][1]["id"],
{key: value for key, value in alignment_result.items()},
)
# add the custom attributes from edge_data to the edge, if provided
if self.edge_data is not None:
for key, data_item in self.edge_data[edge[0]][edge[1]].items():
edge[2][key] = data_item
if edge:
edge_data.append(edge)
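
The docstring describes `edge_data` as a nested dictionary keyed first by one sequence id and then by the other. Below is a minimal sketch of that shape and of the merge step, assuming hypothetical ids and a hypothetical `structural_similarity` attribute. The helper `merge_edge_data` is illustrative and not part of pyeed.

```python
def merge_edge_data(edge, edge_data):
    """Copy extra attributes for (source, target) into the edge's attribute dict."""
    source, target, attributes = edge
    if edge_data is not None:
        # Assumption of this sketch: missing pairs are tolerated via .get();
        # the code in this diff indexes edge_data[source][target] directly
        extra = edge_data.get(source, {}).get(target, {})
        attributes.update(extra)
    return source, target, attributes


# Hypothetical custom edge attributes: {source_id: {target_id: {attribute: value}}}
edge_data = {
    "seq_a": {"seq_b": {"structural_similarity": 0.7}},
}

merged = merge_edge_data(("seq_a", "seq_b", {"identity": 0.95}), edge_data)
untouched = merge_edge_data(("seq_a", "seq_c", {"identity": 0.6}), edge_data)
print(merged[2])
print(untouched[2])
```

With direct indexing on a plain dictionary, a pair that is absent from `edge_data` would raise a `KeyError`, which is presumably why the docstring asks for all edge combinations to be present.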

@@ -121,17 +131,25 @@ def _create_graph(self):
self.network = copy.deepcopy(self._full_network)
self.calculate_centrality()

def update_threshhold(self, threshold: float):
def update_threshhold(
self, threshold: float, threshold_mode: str = "UNDER_THRESHOLD"
):
"""Removes or adds edges based on the threshold value."""

assert 0 <= threshold <= 1, "Threshold must be between 0 and 1"
if self.weight == "identity":
assert 0 <= threshold <= 1, "Threshold must be between 0 and 1"

edge_pairs_below_threshold = []
network = copy.deepcopy(self._full_network)
edge_pairs_below_threshold = [
(node1, node2)
for node1, node2, data in network.edges(data=True)
if data["identity"] < threshold
]

for node1, node2, data in network.edges(data=True):
if threshold_mode == "UNDER_THRESHOLD":
if data[self.weight] < threshold:
edge_pairs_below_threshold.append((node1, node2))
elif threshold_mode == "ABOVE_THRESHOLD":
if data[self.weight] > threshold:
edge_pairs_below_threshold.append((node1, node2))

network.remove_edges_from(edge_pairs_below_threshold)
self._2d_position_nodes_and_edges(network)

@@ -169,7 +187,8 @@ def create_cytoscape_graph(
layout: str = "force-directed",
threshold: float = 0.8,
style_name: str = "default",
column_name: str = "domain",
column_name: str = "genus",
threshold_mode: str = "UNDER_THRESHOLD",
):
try:
p4c.cytoscape_ping(base_url=self._cytoscape_url)
@@ -181,19 +200,25 @@ def create_cytoscape_graph(
base_url=self._cytoscape_url
), "Cytoscape is not running in the background"

p4c.layout_network(layout, base_url=self._cytoscape_url)
# p4c.layout_network(layout, base_url=self._cytoscape_url, network="SequenceNetwork")

# create a degree column for the nodes based on the current chosen threshold
self.calculate_degree(threshold=threshold)
self.calculate_degree(threshold=threshold, threshold_mode=threshold_mode)
# filter the edges by the threshold
p4c.create_network_from_networkx(
self._full_network,
collection="SequenceNetwork",
base_url=self._cytoscape_url,
title="SequenceNetwork",
)

self._hide_under_threshold(threshold)
p4c.layout_network("grid", base_url=self._cytoscape_url)
self.hide_threshold(threshold, threshold_mode=threshold_mode)
p4c.set_layout_properties(
"force-directed", {"defaultSpringLength": 70, "defaultSpringCoefficient": 2}
)
p4c.layout_network(
layout, base_url=self._cytoscape_url, network="SequenceNetwork"
)

df_nodes = p4c.get_table_columns(table="node", base_url=self._cytoscape_url)

@@ -235,30 +260,47 @@ def export_cytoscape_graph(self, file_path: str):
file.write(str(cyt_dict))
print(f"💾 Network exported to {file_path}")

def _hide_under_threshold(self, threshold):
def hide_threshold(self, threshold, threshold_mode: str = "UNDER_THRESHOLD"):
p4c.unhide_all(base_url=self._cytoscape_url)

hide_list = []

for u, v, d in self._full_network.edges(data=True):
if d["identity"] < threshold:
hide_list.append("{} (interacts with) {}".format(u, v))
if threshold_mode == "UNDER_THRESHOLD":
if d[self.weight] < threshold:
hide_list.append("{} (interacts with) {}".format(u, v))
elif threshold_mode == "ABOVE_THRESHOLD":
if d[self.weight] > threshold:
hide_list.append("{} (interacts with) {}".format(u, v))

p4c.hide_edges(hide_list, base_url=self._cytoscape_url)

def calculate_degree(self, threshold: float = 0.8):
def calculate_degree(
self, threshold: float = 0.8, threshold_mode: str = "UNDER_THRESHOLD"
):
# Calculate degree of nodes with filtering
degree = {}
for u, v, d in self._full_network.edges(data=True):
if d["identity"] > threshold:
if u not in degree:
degree[u] = 1
else:
degree[u] += 1
if v not in degree:
degree[v] = 1
else:
degree[v] += 1
if threshold_mode == "UNDER_THRESHOLD":
if d[self.weight] > threshold:
if u not in degree:
degree[u] = 1
else:
degree[u] += 1
if v not in degree:
degree[v] = 1
else:
degree[v] += 1
elif threshold_mode == "ABOVE_THRESHOLD":
if d[self.weight] <= threshold:
if u not in degree:
degree[u] = 1
else:
degree[u] += 1
if v not in degree:
degree[v] = 1
else:
degree[v] += 1

nx.set_node_attributes(
self._full_network, degree, "degree_with_threshold_{}".format(threshold)
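
The degree calculation in this hunk repeats the same counting block for both modes. A compact sketch of the same comparisons using `collections.Counter` follows, assuming edges as `(node_a, node_b, attributes)` tuples. The function name `degree_with_threshold` and the sample values are hypothetical.

```python
from collections import Counter


def degree_with_threshold(edges, weight, threshold, threshold_mode="UNDER_THRESHOLD"):
    """Count, per node, the edges that pass the threshold for the given mode."""
    degree = Counter()
    for u, v, data in edges:
        if threshold_mode == "UNDER_THRESHOLD":
            counted = data[weight] > threshold  # same comparison as in the diff
        elif threshold_mode == "ABOVE_THRESHOLD":
            counted = data[weight] <= threshold  # same comparison as in the diff
        else:
            # Assumption of this sketch: unknown modes are rejected explicitly
            raise ValueError(f"Unknown threshold mode: {threshold_mode}")
        if counted:
            degree[u] += 1
            degree[v] += 1
    return dict(degree)


# Hypothetical pairwise identities between three sequences
edges = [
    ("seq_a", "seq_b", {"identity": 0.95}),
    ("seq_a", "seq_c", {"identity": 0.60}),
    ("seq_b", "seq_c", {"identity": 0.80}),
]

under = degree_with_threshold(edges, "identity", 0.8)
above = degree_with_threshold(edges, "identity", 0.8, "ABOVE_THRESHOLD")
print(under)
print(above)
```

With these comparisons, an edge whose weight equals the threshold is not counted in `UNDER_THRESHOLD` mode, although `hide_threshold` would not hide it. Whether that boundary behavior is intended is not stated in the commit.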
@@ -364,9 +406,7 @@ def visualize(
color_labels = list(set(color_labels))
colors = self._sample_colorscale(len(set(color_labels)))

color_dict = dict(
zip(color_labels, colors)
)
color_dict = dict(zip(color_labels, colors))

color_list = []
for node in self.network.nodes.values():
1 change: 1 addition & 0 deletions pyeed/tools/clustalo.py
@@ -48,6 +48,7 @@ def run_service(self, data) -> httpx.Response:
return httpx.post(self._service_url, files=file, timeout=600)

except httpx.ConnectError as connect_error:

error_number = connect_error.__context__.args[0].errno
if error_number == 8 or error_number == -3:
self._service_url = self._service_url.replace("clustalo", "localhost")
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -9,7 +9,7 @@ packages = [{ include = "pyeed" }]

[tool.poetry.dependencies]
python = ">=3.10,<3.13"
biopython = "^1.81"
biopython = ">=1.81,<1.84"
networkx = "^3.2.1"
plotly = "^5.18.0"
nbformat = "^5.9.2"