
parallel implementation for all_pairs_bellman_ford_path #14

Merged: 7 commits into networkx:main on Dec 5, 2023

Conversation

Schefflera-Arboricola (Member)

networkx/networkx#7003

Also, for a larger number of nodes, I was getting this heatmap:

[Screenshot 2023-10-11 at 9:51:46 AM]

Please give your feedback.

Thank you :)

@jarrodmillman jarrodmillman added the type: Enhancement New feature or request label Oct 13, 2023
@dschult dschult left a comment

The helper function could be simplified (and maybe sped up) by using a dict comprehension:

def _calculate_shortest_paths_subset(G, chunk, weight):
    return {n: single_source_bellman_ford_path(G, n, weight=weight) for n in chunk}

It might also be easy (since it is so short) to define the helper function inside the main function itself. That's a style choice: if there is a chance it could be used by another function, keep it defined at module level.

I have a feeling that some of these idioms will be used over and over. Like:

    num_in_chunk = max(len(nodes) // total_cores, 1)
    node_chunks = nxp.chunks(nodes, num_in_chunk)

PR #7 has consolidated some of those into utility functions that most of the functions in nx_parallel could use. There already is one utility function you have used: nxp.cpu_count(). Maybe there should be others. I'm not sure what the best way to implement things like that is. But we should think about it. No need to implement those ideas for this PR though.
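A minimal sketch of what such a shared chunking utility might look like (hypothetical; the actual `nxp.chunks` may be implemented differently):

```python
from itertools import islice

def chunks(iterable, n):
    """Yield successive n-sized tuples from iterable; the last may be shorter.
    Hypothetical sketch of a shared utility like nxp.chunks."""
    it = iter(iterable)
    while chunk := tuple(islice(it, n)):
        yield chunk

# the idiom from the comment above, using the sketch:
nodes = list(range(10))
total_cores = 4
num_in_chunk = max(len(nodes) // total_cores, 1)  # here: 2
node_chunks = list(chunks(nodes, num_in_chunk))   # 5 chunks of 2 nodes each
```

Centralizing this means each parallel algorithm only decides the chunk size, not how chunking works.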

:)


Schefflera-Arboricola commented Oct 16, 2023

@dschult There wasn't much difference in the speedups with the dict comprehension either. The parallel implementation takes more time because it computes all the paths and returns them in a dictionary, whereas the non-parallel implementation returns an iterator, so the paths are only computed when/if the iterator is consumed later in the program. And we cannot break the task down so that we compute multiple generators in parallel and then combine them, like this:

def _calculate_shortest_paths_subset(G, chunk, weight):
    for n in chunk:
        yield n, single_source_bellman_ford_path(G, n, weight=weight)

because joblib requires the returned value to be serializable in order to apply parallelization (error: TypeError: cannot pickle 'generator' object).
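One picklable workaround is to materialize each chunk's results into a list of (node, paths) pairs instead of yielding from a generator; lists pickle fine, and the per-chunk lists can be merged into one dict afterwards. A toy sketch (here `shortest_paths_from` is a hypothetical stand-in for `single_source_bellman_ford_path`, and the chunks are processed sequentially rather than through joblib):

```python
def shortest_paths_from(G, n):
    # toy stand-in: pretend every other node is one hop away
    return {m: [n, m] for m in G if m != n}

def _calculate_shortest_paths_subset(G, chunk):
    # returns a picklable list of (node, paths) pairs, not a generator
    return [(n, shortest_paths_from(G, n)) for n in chunk]

G = {0, 1, 2, 3}
results = [_calculate_shortest_paths_subset(G, c) for c in ({0, 1}, {2, 3})]
# merge the per-chunk lists into the final dict
paths = {n: p for chunk_result in results for n, p in chunk_result}
```

The cost is that all results are held in memory at once, which is exactly the eager-vs-lazy trade-off discussed above.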

But if someone wants to compute all the paths anyway, then this implementation is better than iterating through the iterator of the non-parallel function. (Feature suggestion: add an iter parameter to all_pairs_bellman_ford_path that returns an iterator when True and a dictionary when False.) Here, the time to convert the iterator into a dictionary is also taken into account (timing file):

            t1 = time.time()
            c = currFun(H)
+           d1 = dict(c)
            t2 = time.time()
            parallelTime = t2 - t1
            t1 = time.time()
            c = currFun(G)
+           d2 = dict(c)
            t2 = time.time()
            stdTime = t2 - t1
            timesFaster = stdTime / parallelTime
            heatmapDF.at[j, i] = timesFaster
            print("Finished " + str(currFun))

Here, we are getting these speedups:

[Screenshot 2023-10-16 at 11:54:24 AM]

Also, thanks for the feedback on styling the code!
Please let me know what you think and how I should proceed from here.

Thank you :)


dschult commented Oct 25, 2023

I think you can handle generators with joblib.
Take a look at this page and maybe a page linked from it:
https://joblib.readthedocs.io/en/stable/parallel.html
Hopefully it is simple to do.

My understanding is that each CPU generates values, but they are yielded to the user in the order the tasks were started. So if one task finishes faster than a previous one, it waits until the previous one has finished and been yielded before being yielded itself. The order is preserved relative to the non-parallel version.

Now, the timing should still include generating the entire set of results, because setting up the generators takes very little time; it is the actual computation that takes time.


Schefflera-Arboricola commented Oct 25, 2023

Thanks @dschult!

I have updated it now. Also, I have used weighted graphs this time, and the speedup values seem to decrease after the 100-node graph. Let me know if there's anything else to improve upon.

Thank you :)

@dschult dschult left a comment

This looks good -- I have some comments below, but I think this is close to being ready.
:)

Comment on lines 43 to 44
def _calculate_shortest_paths_subset(G, source, weight):
    return (source, single_source_bellman_ford_path(G, source, weight=weight))
dschult (Member):

I think G and weight are already bound in the outer function's namespace, so they will be found when used within this helper function. So you can remove those two inputs and make this a function of source only. That also shortens the later code that calls this function: less time patching together function arguments, though a bit more time spent on variable lookups. But I think it could be faster overall. Can you tell?
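A toy sketch of the suggested structure: the helper is defined inside the outer function so G and weight are resolved through the enclosing scope, and the helper becomes a function of `source` only (`single_source` below is a hypothetical stand-in for `single_source_bellman_ford_path`):

```python
def all_pairs_paths(G, weight="weight"):
    def single_source(G, source, weight="weight"):
        # toy stand-in: pretend every other node is one hop away
        return {m: [source, m] for m in G if m != source}

    def _calculate_shortest_paths_subset(source):
        # G and weight come from the enclosing namespace, not from arguments
        return (source, single_source(G, source, weight=weight))

    return dict(_calculate_shortest_paths_subset(n) for n in G)

paths = all_pairs_paths({0, 1, 2})
```

Whether this is faster in practice depends on argument-passing overhead versus closure lookups, which is why measuring it (as asked above) is the right call.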

@Schefflera-Arboricola Schefflera-Arboricola commented Nov 3, 2023

Yes, passing only source is a little faster. The following are the speed-ups for the same:

[Screenshot 2023-11-03 at 6:04:59 PM]

I have changed this in the recent commit.

Comment on lines 23 to 27
G = nx.fast_gnp_random_graph(num, p, directed=False)

# for weighted graphs
for u, v in G.edges():
    G[u][v]['weight'] = random.random()
dschult (Member):

You should probably set the seed for the random functions when timing. That helps ensure that the same steps are taken by the various trials. This is true for random.random() and also for the graph creation routines.
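A sketch of the seeding suggestion, assuming the `seed` parameter of `fast_gnp_random_graph`: pass a seed to the graph generator and seed the stdlib RNG before assigning weights, so every timing trial runs on an identical weighted graph.

```python
import random

import networkx as nx

random.seed(42)  # fixes the edge weights below
G = nx.fast_gnp_random_graph(50, 0.3, seed=42, directed=False)  # fixes the topology
for u, v in G.edges():
    G[u][v]["weight"] = random.random()
```

With both seeds fixed, differences between trials reflect the implementations being timed, not graph-to-graph variation.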

parallelTime = t2 - t1
t1 = time.time()
c = currFun(G)
if type(c) == types.GeneratorType:
    d = dict(c)
dschult (Member):
Is it worth checking (outside the timed part) that the results are the same? Something like assert d1 == d2.

Schefflera-Arboricola (Member Author):

I checked d1 == d2 for all_pairs_bellman_ford_path before committing, and it was true in all cases. But for betweenness_centrality it was not always true; I had to round all the values. I can add separate tests for all the algorithms, if that seems good to you.
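For float-valued results like betweenness_centrality, a tolerance-based comparison is usually more robust than rounding. A sketch of a hypothetical helper for comparing the parallel and non-parallel result dicts:

```python
import math

def dicts_close(d1, d2, rel_tol=1e-9, abs_tol=1e-12):
    """True if both dicts have the same keys and all values are close."""
    return d1.keys() == d2.keys() and all(
        math.isclose(d1[k], d2[k], rel_tol=rel_tol, abs_tol=abs_tol) for k in d1
    )

# floating-point noise from a different summation order passes the check:
assert dicts_close({"a": 0.1 + 0.2}, {"a": 0.3})
# a genuinely different value fails it:
assert not dicts_close({"a": 0.30001}, {"a": 0.3})
```

Parallel reductions can sum floats in a different order than the serial code, so small discrepancies are expected rather than a bug.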

timing/timing_individual_function.py (outdated review comment, resolved)
@Schefflera-Arboricola (Member Author):

@dschult I have made all the updates; please let me know if this looks good to you. Thank you very much for the review :)

@Schefflera-Arboricola (Member Author):

I tried running the same timing_individual_function.py script again and I got the following heatmap:

[Screenshot 2023-11-04 at 12:37:35 PM]

This is the heatmap currently in the PR:

[Screenshot 2023-11-04 at 12:39:53 PM]

Would you say that's a lot of variation, especially for the 100-node graphs?

Thank you :)

@dschult dschult left a comment

I think this looks good.
There are some big-picture issues we should probably consider at some point -- but not for this PR's intent (addition of a parallel implementation).

Some of the questions involve things like:

  • Should we include the NX docs information in the nx-parallel docstrings? That might lead to one version becoming out-of-date compared to the other. Perhaps we should only put info about the parallel implementation here and refer to the NX docs for info about the original function. If so, where do we draw the line? At the function signature? Do we copy the parameter descriptions?
  • Should we centralize the 'chunk'ing, 'map'ing, and 'reduce'ing? See [WIP]: Refactor-- consolidate and simplify #7

But let's go ahead and merge this and worry about the big picture issues in another PR.
Thanks!


nodes = G.nodes

total_cores = nxp.cpu_count()
dschult (Member):

We should move this to the function signature and match something like what scikit-learn does (n_jobs), but that's not a blocker here.
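A hypothetical sketch of that signature change, following the scikit-learn convention where `n_jobs=-1` means "use all available CPUs" (the function body here is a placeholder, not the real implementation):

```python
import os

def all_pairs_bellman_ford_path(G, weight="weight", n_jobs=-1):
    # scikit-learn-style convention: -1 means use every available core
    total_cores = os.cpu_count() if n_jobs == -1 else n_jobs
    # ... chunk the nodes across `total_cores` workers as before ...
    return total_cores  # placeholder so the sketch is runnable

workers = all_pairs_bellman_ford_path(None, n_jobs=2)
```

Putting the core count in the signature lets callers control parallelism per call instead of relying on a global `nxp.cpu_count()`.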


MridulS commented Dec 5, 2023

Thanks for this, @Schefflera-Arboricola!

There is a bunch of stuff we should revisit, especially in the context of #7, but for now let's merge this in :)

@MridulS MridulS merged commit 362044c into networkx:main Dec 5, 2023
7 checks passed
@jarrodmillman jarrodmillman added this to the 0.1 milestone Dec 5, 2023