Revisiting nxp algorithms #63

Merged
merged 9 commits into from
Jun 6, 2024
33 changes: 15 additions & 18 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,31 +84,28 @@ To add any additional tests, **specific to nx_parallel**, you can follow the way

To display a short note about nx-parallel's implementation at the end of the main NetworkX documentation, we use the `backend_info` [entry_point](https://packaging.python.org/en/latest/specifications/entry-points/#entry-points) (in the `pyproject.toml` file). The [`get_info` function](https://github.com/networkx/nx-parallel/blob/main/_nx_parallel/__init__.py) parses the docstrings of all the algorithms in nx-parallel and displays the nx-parallel specific documentation on NetworkX's main docs, in the "Additional Backend implementations" box, as shown in the screenshot below.

![backend_box_ss](https://github.com/networkx/nx-parallel/blob/main/assets/images/backend_box_ss.png)
![backend_box_ss](./assets/images/backend_box_ss.png)

Here is how the docstring should be formatted in nx-parallel:
nx-parallel follows [sphinx docstring guidelines](https://the-ultimate-sphinx-tutorial.readthedocs.io/en/latest/_guide/_styleguides/docstrings-guidelines.html) for writing docstrings. However, when a docstring is extracted for display on the main networkx docs, only the first paragraph of the function's description and the first paragraph of each parameter's description are extracted and displayed. So, make sure the first paragraphs contain all the necessary information. In the `Parameters` section, include only the additional **backend** parameters, not all the parameters. It is also recommended to include a link to the networkx function's documentation page in the docstring, at the end of the function description.

Here is an example of how the docstrings should be formatted in nx-parallel:

```.py
def betweenness_centrality(
G, k=None, normalized=True, weight=None, endpoints=False, seed=None, get_chunks="chunks"
):
"""[FIRST PARA DISPLAYED ON MAIN NETWORKX DOCS AS FUNC DESC]
The parallel computation is implemented by dividing the
nodes into chunks and computing betweenness centrality for each chunk concurrently.
def parallel_func(G, nx_arg, additional_backend_arg_1, additional_backend_arg_2=None):
"""The parallel computation is implemented by dividing the
nodes into chunks and ..... [ONLY THIS PARAGRAPH WILL BE DISPLAYED ON THE MAIN NETWORKX DOCS]

Some more additional information about the function.

networkx.func : <link to the function's networkx docs page>

Parameters
------------ [EVERYTHING BELOW THIS LINE AND BEFORE THE NETWORKX LINK WILL BE DISPLAYED IN ADDITIONAL PARAMETER'S SECTION ON NETWORKX MAIN DOCS]
get_chunks : function (default = "chunks")
A function that takes in nodes as input and returns node_chunks...[YOU CAN MULTIPLE PARAGRAPHS FOR EACH PARAMETER, IF NEEDED, SEPARATED BY 1 BLANK LINE]
----------
additional_backend_arg_1 : int or float
[YOU CAN HAVE MULTIPLE PARAGRAPHS BUT ONLY THE FIRST PARAGRAPH WILL BE DISPLAYED ON THE MAIN NETWORKX DOCS]

[LEAVE 2 BLANK LINES BETWEEN EACH PARAMETER]
parameter 2 : int
additional_backend_arg_2 : None or str (default=None)
....
.
.
.
[LEAVE 1 BLANK LINE BETWEEN THE PARAMETERS SECTION AND THE LINK]
networkx.betweenness_centrality : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html
"""
```
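
As a rough illustration of the "first paragraph only" rule, the sketch below cleans a docstring and keeps its first blank-line-delimited paragraph. The helper `first_paragraph` and the body of `parallel_func` are hypothetical and written only for this example; they are not the extraction code used by `get_info`.

```python
import inspect


def parallel_func(G, get_chunks="chunks"):
    """The parallel computation is implemented by dividing the
    nodes into chunks and processing each chunk concurrently.

    This second paragraph is not shown on the main NetworkX docs.

    networkx.func : <link to the function's networkx docs page>

    Parameters
    ----------
    get_chunks : str, function (default = "chunks")
        A function that takes in a list of all the nodes as input and
        returns an iterable `node_chunks`.

        This second paragraph of the parameter is not shown either.
    """


def first_paragraph(docstring):
    """Return the first blank-line-delimited paragraph as a single line."""
    cleaned = inspect.cleandoc(docstring)  # normalise indentation
    # Collapse the line breaks inside the first paragraph into single spaces.
    return " ".join(cleaned.split("\n\n")[0].split())


summary = first_paragraph(parallel_func.__doc__)
print(summary)
```

Anything a reader of the main docs needs to know therefore has to fit in that first paragraph.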

26 changes: 18 additions & 8 deletions _nx_parallel/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,9 @@ def get_info():
"number_of_isolates": {
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/isolate.py#L8",
"additional_docs": "The parallel computation is implemented by dividing the list of isolated nodes into chunks and then finding the length of each chunk in parallel and then adding all the lengths at the end.",
"additional_parameters": None,
"additional_parameters": {
'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the isolated nodes as input and returns an iterable `isolate_chunks`. The default chunking is done by slicing the `isolates` into `n` chunks, where `n` is the total number of CPU cores available."
},
},
"square_clustering": {
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/cluster.py#L10",
Expand All @@ -25,22 +27,30 @@ def get_info():
"local_efficiency": {
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#L9",
"additional_docs": "The parallel computation is implemented by dividing the nodes into chunks and then computing and adding global efficiencies of all node in all chunks, in parallel, and then adding all these sums and dividing by the total number of nodes at the end.",
"additional_parameters": None,
"additional_parameters": {
'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `nodes` into `n` chunks, where `n` is the total number of CPU cores available."
},
},
"closeness_vitality": {
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/vitality.py#L9",
"additional_docs": "The parallel computation is implemented only when the node is not specified. The closeness vitality for each node is computed concurrently.",
"additional_parameters": None,
"additional_parameters": {
'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `nodes` into `n` chunks, where `n` is the total number of CPU cores."
},
},
"is_reachable": {
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L10",
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L12",
"additional_docs": "The function parallelizes the calculation of two neighborhoods of vertices in `G` and checks closure conditions for each neighborhood subset in parallel.",
"additional_parameters": None,
"additional_parameters": {
'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `nodes` into `n` chunks, where `n` is the total number of CPU cores available."
},
},
"tournament_is_strongly_connected": {
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L54",
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L59",
"additional_docs": "The parallel computation is implemented by dividing the nodes into chunks and then checking whether each node is reachable from each other node in parallel.",
"additional_parameters": None,
"additional_parameters": {
'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `nodes` into `n` chunks, where `n` is the total number of CPU cores available."
},
},
"all_pairs_node_connectivity": {
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/connectivity/connectivity.py#L17",
Expand Down Expand Up @@ -127,7 +137,7 @@ def get_info():
},
},
"all_pairs_shortest_path": {
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L62",
"url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L63",
"additional_docs": "The parallel implementation first divides the nodes into chunks and then creates a generator to lazily compute shortest paths for each `node_chunk`, and then employs joblib's `Parallel` function to execute these computations in parallel across all available CPU cores.",
"additional_parameters": {
'get_chunks : str, function (default = "chunks")': "A function that takes in an iterable of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `G.nodes` into `n` chunks, where `n` is the number of CPU cores."
53 changes: 33 additions & 20 deletions _nx_parallel/update_get_info.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,13 @@
import os
import ast

__all__ = ["get_funcs_info", "extract_docstrings_from_file", "extract_from_docs"]
__all__ = [
"get_funcs_info",
"extract_docstrings_from_file",
"extract_add_docs",
"extract_add_params",
"get_url",
]

# Helper functions for get_info

Expand All @@ -21,11 +27,10 @@ def get_funcs_info():
path = os.path.join(root, file)
d = extract_docstrings_from_file(path)
for func in d:
par_docs, par_params = extract_from_docs(d[func])
funcs[func] = {
"url": get_url(path, func),
"additional_docs": par_docs,
"additional_parameters": par_params,
"additional_docs": extract_add_docs(d[func]),
"additional_parameters": extract_add_params(d[func]),
}
return funcs

Expand Down Expand Up @@ -60,8 +65,8 @@ def extract_docstrings_from_file(file_path):
return docstrings
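
The body of `extract_docstrings_from_file` is collapsed in this diff, so the following is only a sketch of how docstrings can be collected with the standard `ast` module, which the file imports. The function name `docstrings_from_source` and the sample source are assumptions made for illustration.

```python
import ast


def docstrings_from_source(source):
    """Map each function name in `source` to its cleaned docstring (or None)."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_docstring(node)  # dedents and strips the docstring
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


example_source = '''
def f(G):
    """First paragraph.

    Second paragraph.
    """

def g(G):
    pass
'''

found = docstrings_from_source(example_source)
print(found)
```

Because `ast.get_docstring` returns a dedented string, later parsing steps can match parameter lines without worrying about the indentation of the enclosing function.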


def extract_from_docs(docstring):
"""Extract the parallel documentation and parallel parameter description from the given doctring."""
def extract_add_docs(docstring):
"""Extract the parallel documentation description from the given docstring."""
try:
# Extracting Parallel Computation description
# Assuming that the first para in docstring is the function's PC desc
Expand All @@ -76,30 +81,38 @@ def extract_from_docs(docstring):
except Exception as e:
print(e)
par_docs = None
return par_docs


def extract_add_params(docstring):
"""Extract the parallel parameter description from the given docstring."""
try:
# Extracting extra parameters
# Assuming that the last para in docstring is the function's extra params
par_params = {}
par_params_ = docstring.split("------------\n")[1]

par_params_ = par_params_.split("\n\n\n")
for i in par_params_:
j = i.split("\n")
par_params[j[0]] = "\n".join(
[line.strip() for line in j[1:] if line.strip()]
)
if i == par_params_[-1]:
par_params[j[0]] = " ".join(
[line.strip() for line in j[1:-1] if line.strip()]
)
par_docs = par_docs.replace("\n", " ")
par_params_ = docstring.split("----------\n")[1]
par_params_ = par_params_.split("\n")

i = 0
while i < len(par_params_):
line = par_params_[i]
if " : " in line:
key = line.strip()
n = i + 1  # the description starts on the line after the key
par_desc = ""
while n < len(par_params_) and par_params_[n] != "":
par_desc += par_params_[n].strip() + " "
n += 1
par_params[key] = par_desc.strip()
i = n + 1
else:
i += 1
except IndexError:
par_params = None
except Exception as e:
print(e)
par_params = None
return par_docs, par_params
return par_params
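
The parameter-parsing loop above can be sketched as a standalone function. This is a simplified variant under stated assumptions: the name `parse_parameters` is hypothetical, the docstring is assumed to be already dedented, and the next description line is located by position (`i + 1`) instead of by searching the list for the stripped key.

```python
def parse_parameters(docstring):
    """Return {"name : type": first paragraph of its description}, or None."""
    parts = docstring.split("----------\n")
    if len(parts) < 2:
        return None  # no Parameters section found
    lines = parts[1].split("\n")
    params = {}
    i = 0
    while i < len(lines):
        line = lines[i]
        if " : " in line:  # a "name : type" header line
            key = line.strip()
            n = i + 1
            desc = []
            # Collect lines until the first blank line (end of first paragraph).
            while n < len(lines) and lines[n].strip() != "":
                desc.append(lines[n].strip())
                n += 1
            params[key] = " ".join(desc)
            i = n + 1
        else:
            i += 1
    return params


doc = (
    "Summary paragraph.\n"
    "\n"
    "Parameters\n"
    "----------\n"
    'get_chunks : str, function (default = "chunks")\n'
    "    A function that takes in the nodes and\n"
    "    returns an iterable `node_chunks`.\n"
    "\n"
    "    A second paragraph that is skipped.\n"
)

parsed = parse_parameters(doc)
print(parsed)
```

One limitation of this approach is that any later line containing `" : "` is treated as a new parameter header, so follow-up paragraphs of a description should avoid that pattern.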


def get_url(file_path, function_name):
Binary file modified assets/images/backend_box_ss.png
6 changes: 3 additions & 3 deletions nx_parallel/algorithms/approximation/connectivity.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,16 +24,16 @@ def approximate_all_pairs_node_connectivity(
will run the parallel implementation of `all_pairs_node_connectivity` present in the
`connectivity/connectivity`. Use `nxp.approximate_all_pairs_node_connectivity` instead.

networkx.all_pairs_node_connectivity : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.approximation.connectivity.all_pairs_node_connectivity.html

Parameters
------------
----------
get_chunks : str, function (default = "chunks")
A function that takes in `list(iter_func(nbunch, 2))` as input and returns
an iterable `pairs_chunks`, here `iter_func` is `permutations` in case of
directed graphs and `combinations` in case of undirected graphs. The default
is to create chunks by slicing the list into `n` chunks, where `n` is the
number of CPU cores, such that the size of each chunk is at most 10 and at least 1.

networkx.all_pairs_node_connectivity : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.approximation.connectivity.all_pairs_node_connectivity.html
"""

if hasattr(G, "graph_object"):
7 changes: 4 additions & 3 deletions nx_parallel/algorithms/bipartite/redundancy.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,14 +12,15 @@ def node_redundancy(G, nodes=None, get_chunks="chunks"):
"""In the parallel implementation we divide the nodes into chunks and compute
the node redundancy coefficients for all `node_chunk` in parallel.

networkx.bipartite.node_redundancy : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.bipartite.redundancy.node_redundancy.html

Parameters
------------
----------
get_chunks : str, function (default = "chunks")
A function that takes in an iterable of all the nodes as input and returns
an iterable `node_chunks`. The default chunking is done by slicing the
`G.nodes` (or `nodes`) into `n` chunks, where `n` is the number of CPU cores.

networkx.bipartite.node_redundancy : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.bipartite.redundancy.node_redundancy.html"""
"""

if hasattr(G, "graph_object"):
G = G.graph_object
6 changes: 3 additions & 3 deletions nx_parallel/algorithms/centrality/betweenness.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,14 +25,14 @@ def betweenness_centrality(
"""The parallel computation is implemented by dividing the nodes into chunks and
computing betweenness centrality for each chunk concurrently.

networkx.betweenness_centrality : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html

Parameters
------------
----------
get_chunks : str, function (default = "chunks")
A function that takes in a list of all the nodes as input and returns an
iterable `node_chunks`. The default chunking is done by slicing the
`nodes` into `n` chunks, where `n` is the number of CPU cores.

networkx.betweenness_centrality : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html
"""
if hasattr(G, "graph_object"):
G = G.graph_object
6 changes: 3 additions & 3 deletions nx_parallel/algorithms/cluster.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,14 +12,14 @@ def square_clustering(G, nodes=None, get_chunks="chunks"):
coefficient for all `node_chunks` are computed in parallel over all available
CPU cores.

networkx.square_clustering: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.square_clustering.html

Parameters
------------
----------
get_chunks : str, function (default = "chunks")
A function that takes in a list of all the nodes (or nbunch) as input and
returns an iterable `node_chunks`. The default chunking is done by slicing the
`nodes` into `n` chunks, where `n` is the number of CPU cores.

networkx.square_clustering: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.square_clustering.html
"""

def _compute_clustering_chunk(node_iter_chunk):
6 changes: 3 additions & 3 deletions nx_parallel/algorithms/connectivity/connectivity.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,16 +22,16 @@ def all_pairs_node_connectivity(G, nbunch=None, flow_func=None, get_chunks="chun
execute these computations in parallel across all available CPU cores. At the end,
the results are aggregated into a single dictionary and returned.

networkx.all_pairs_node_connectivity : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.connectivity.connectivity.all_pairs_node_connectivity.html

Parameters
------------
----------
get_chunks : str, function (default = "chunks")
A function that takes in `list(iter_func(nbunch, 2))` as input and returns
an iterable `pairs_chunks`, here `iter_func` is `permutations` in case of
directed graphs and `combinations` in case of undirected graphs. The default
is to create chunks by slicing the list into `n` chunks, where `n` is the
number of CPU cores, such that the size of each chunk is at most 10 and at least 1.

networkx.all_pairs_node_connectivity : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.connectivity.connectivity.all_pairs_node_connectivity.html
"""

if hasattr(G, "graph_object"):
26 changes: 18 additions & 8 deletions nx_parallel/algorithms/efficiency_measures.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,27 +6,37 @@
__all__ = ["local_efficiency"]


def local_efficiency(G):
def local_efficiency(G, get_chunks="chunks"):
"""The parallel computation is implemented by dividing the
nodes into chunks and then computing and adding global efficiencies of all nodes
in all chunks, in parallel, and then adding all these sums and dividing by the
total number of nodes at the end.

networkx.local_efficiency : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.efficiency_measures.local_efficiency.html#local-efficiency
networkx.local_efficiency : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.efficiency_measures.local_efficiency.html

Parameters
----------
get_chunks : str, function (default = "chunks")
A function that takes in a list of all the nodes as input and returns an
iterable `node_chunks`. The default chunking is done by slicing the `nodes`
into `n` chunks, where `n` is the total number of CPU cores available.
"""

def _local_efficiency_node_subset(G, nodes):
return sum(nx.global_efficiency(G.subgraph(G[v])) for v in nodes)
def _local_efficiency_node_subset(G, chunk):
return sum(nx.global_efficiency(G.subgraph(G[v])) for v in chunk)

if hasattr(G, "graph_object"):
G = G.graph_object

cpu_count = nxp.cpu_count()
total_cores = nxp.cpu_count()

num_in_chunk = max(len(G.nodes) // cpu_count, 1)
node_chunks = list(nxp.chunks(G.nodes, num_in_chunk))
if get_chunks == "chunks":
num_in_chunk = max(len(G.nodes) // total_cores, 1)
node_chunks = list(nxp.chunks(G.nodes, num_in_chunk))
else:
node_chunks = get_chunks(G.nodes)

efficiencies = Parallel(n_jobs=cpu_count)(
efficiencies = Parallel(n_jobs=total_cores)(
delayed(_local_efficiency_node_subset)(G, chunk) for chunk in node_chunks
)
return sum(efficiencies) / len(G)
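
To make the new `get_chunks` dispatch concrete, here is a dependency-free sketch of the same branching. The `chunks` helper is an assumed stand-in for `nxp.chunks`, while `make_node_chunks` and `even_odd_chunks` are hypothetical names used only in this example; the core count is passed explicitly so the result does not depend on the machine.

```python
import itertools


def chunks(iterable, n):
    """Yield successive tuples of at most `n` items (assumed nxp.chunks behaviour)."""
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk


def make_node_chunks(nodes, get_chunks="chunks", total_cores=4):
    """Mirror the dispatch in the diff: default slicing or a user callable."""
    nodes = list(nodes)
    if get_chunks == "chunks":
        num_in_chunk = max(len(nodes) // total_cores, 1)
        return list(chunks(nodes, num_in_chunk))
    return list(get_chunks(nodes))


def even_odd_chunks(nodes):
    """Hypothetical custom chunker: split integer nodes by parity."""
    nodes = list(nodes)
    return [[v for v in nodes if v % 2 == 0], [v for v in nodes if v % 2 == 1]]


default_chunks = make_node_chunks(range(10), total_cores=4)
custom_chunks = make_node_chunks(range(10), get_chunks=even_odd_chunks)
print(default_chunks)
print(custom_chunks)
```

Under these assumptions, 10 nodes and 4 cores give a chunk size of `10 // 4 = 2` and therefore five chunks, so the default can produce more chunks than cores. A custom `get_chunks` callable gives the caller full control over how the work is split.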