
parallel implementation for all_pairs_bellman_ford_path #14

Merged: 7 commits into networkx:main on Dec 5, 2023

Conversation

Schefflera-Arboricola (Member)

networkx/networkx#7003

Also, for a larger number of nodes, I was getting this heatmap:

[Screenshot 2023-10-11 at 9:51:46 AM]

Please give your feedback.

Thank you :)

@jarrodmillman jarrodmillman added the type: Enhancement New feature or request label Oct 13, 2023
@dschult dschult left a comment

The helper function could be simplified (and maybe sped up) by using a dict comprehension:

def _calculate_shortest_paths_subset(G, chunk, weight):
    return {n: single_source_bellman_ford_path(G, n, weight=weight) for n in chunk}

It might also be easy (since it is so short) to define the helper function inside the main function itself. That's a style choice: if there is a chance it could be used by another function, keep it defined at module level.

I have a feeling that some of these idioms will be used over and over. Like:

    num_in_chunk = max(len(nodes) // total_cores, 1)
    node_chunks = nxp.chunks(nodes, num_in_chunk)

PR #7 has consolidated some of those into utility functions that most of the functions in nx_parallel could use. There already is one utility function you have used: nxp.cpu_count(). Maybe there should be others. I'm not sure what the best way to implement things like that is. But we should think about it. No need to implement those ideas for this PR though.
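A minimal sketch of what such a shared chunking utility might look like (hypothetical; the actual `nxp.chunks` may be implemented differently):

```python
from itertools import islice

def chunks(iterable, n):
    """Yield successive n-sized tuples from iterable; the last may be shorter.
    Hypothetical sketch of a shared utility like nxp.chunks."""
    it = iter(iterable)
    while chunk := tuple(islice(it, n)):
        yield chunk

# the idiom from the comment above, using the sketch:
nodes = list(range(10))
total_cores = 4
num_in_chunk = max(len(nodes) // total_cores, 1)  # here: 2
node_chunks = list(chunks(nodes, num_in_chunk))   # 5 chunks of 2 nodes each
```

Centralizing this means each parallel algorithm only decides the chunk size, not how chunking works.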

:)


Schefflera-Arboricola commented Oct 16, 2023

@dschult There wasn't much difference in the speedups with the dict comprehension either. The parallel implementation takes more time because it computes all the paths and returns them in a dictionary, whereas the non-parallel implementation returns an iterator, so the paths are only computed when/if the iterator is consumed later in the program. And we cannot break the task down so that we compute multiple generators in parallel and then combine them, like this:

def _calculate_shortest_paths_subset(G, chunk, weight):
    for n in chunk:
        yield n, single_source_bellman_ford_path(G, n, weight=weight)

because joblib requires the returned value to be serializable in order to apply parallelization (error: TypeError: cannot pickle 'generator' object).
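One picklable workaround is to materialize each chunk's results into a list of (node, paths) pairs instead of yielding from a generator; lists pickle fine, and the per-chunk lists can be merged into one dict afterwards. A toy sketch (here `shortest_paths_from` is a hypothetical stand-in for `single_source_bellman_ford_path`, and the chunks are processed sequentially rather than through joblib):

```python
def shortest_paths_from(G, n):
    # toy stand-in: pretend every other node is one hop away
    return {m: [n, m] for m in G if m != n}

def _calculate_shortest_paths_subset(G, chunk):
    # returns a picklable list of (node, paths) pairs, not a generator
    return [(n, shortest_paths_from(G, n)) for n in chunk]

G = {0, 1, 2, 3}
results = [_calculate_shortest_paths_subset(G, c) for c in ({0, 1}, {2, 3})]
# merge the per-chunk lists into the final dict
paths = {n: p for chunk_result in results for n, p in chunk_result}
```

The cost is that all results are held in memory at once, which is exactly the eager-vs-lazy trade-off discussed above.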

But if someone wants to compute all the paths anyway, then this implementation is better than iterating through the iterator of the non-parallel function. (Feature suggestion: add an iter parameter to all_pairs_bellman_ford_path that returns an iterator when True and a dictionary when False.) Here, the time to convert the iterator into a dictionary is also taken into account (timing file):

            t1 = time.time()
            c = currFun(H)
+           d1 = dict(c)
            t2 = time.time()
            parallelTime = t2 - t1
            t1 = time.time()
            c = currFun(G)
+           d2 = dict(c)
            t2 = time.time()
            stdTime = t2 - t1
            timesFaster = stdTime / parallelTime
            heatmapDF.at[j, i] = timesFaster
            print("Finished " + str(currFun))

Here, we are getting these speedups:

[Screenshot 2023-10-16 at 11:54:24 AM]

Also, thanks for the feedback on styling the code!
Please let me know what you think and how I should proceed from here.

Thank you :)


dschult commented Oct 25, 2023

I think you can handle generators with joblib.
Take a look at this page and maybe a page linked from it:
https://joblib.readthedocs.io/en/stable/parallel.html
Hopefully it is simple to do.

My understanding is that each CPU generates values, but they are yielded to the user in the order the tasks were started. So if one task finishes faster than a previous one, it waits until the previous one has finished and been yielded before being yielded itself. The order is preserved relative to the non-parallel version.

Now, the timing should still include generating the entire set of results, because setting up the generators takes very little time; it is the actual computation that takes time.


Schefflera-Arboricola commented Oct 25, 2023

Thanks @dschult!

I have updated it now. Also, I have used weighted graphs this time, and the speedup values seem to decrease after the 100-node graph. Let me know if there's anything else to improve upon.

Thank you :)

@dschult dschult left a comment

This looks good -- I have some comments below, but I think this is close to being ready.
:)

Comment on lines 43 to 44
def _calculate_shortest_paths_subset(G, source, weight):
    return (source, single_source_bellman_ford_path(G, source, weight=weight))
dschult (Member):

I think G and weight are already bound in the outer function's namespace, so they will be found when used within this helper function. So you can remove those two inputs and make this a function of source only. That also shortens the later code that calls this function: less time patching together function arguments, though a bit more time spent on variable lookups. But I think it could be faster overall. Can you tell?
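A toy sketch of the suggested structure: the helper is defined inside the outer function so G and weight are resolved through the enclosing scope, and the helper becomes a function of `source` only (`single_source` below is a hypothetical stand-in for `single_source_bellman_ford_path`):

```python
def all_pairs_paths(G, weight="weight"):
    def single_source(G, source, weight="weight"):
        # toy stand-in: pretend every other node is one hop away
        return {m: [source, m] for m in G if m != source}

    def _calculate_shortest_paths_subset(source):
        # G and weight come from the enclosing namespace, not from arguments
        return (source, single_source(G, source, weight=weight))

    return dict(_calculate_shortest_paths_subset(n) for n in G)

paths = all_pairs_paths({0, 1, 2})
```

Whether this is faster in practice depends on argument-passing overhead versus closure lookups, which is why measuring it (as asked above) is the right call.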

@Schefflera-Arboricola Schefflera-Arboricola commented Nov 3, 2023

Yes, passing only source is a little faster. The following are the speed-ups for the same:

[Screenshot 2023-11-03 at 6:04:59 PM]

I have changed this in the recent commit.

Comment on lines 23 to 27
G = nx.fast_gnp_random_graph(num, p, directed=False)

# for weighted graphs
for u, v in G.edges():
    G[u][v]['weight'] = random.random()
dschult (Member):

You should probably set the seed for the random functions when timing. That helps ensure that the same steps are taken by the various trials. This is true for random.random() and also for the graph creation routines.
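A sketch of the seeding suggestion, assuming the `seed` parameter of `fast_gnp_random_graph`: pass a seed to the graph generator and seed the stdlib RNG before assigning weights, so every timing trial runs on an identical weighted graph.

```python
import random

import networkx as nx

random.seed(42)  # fixes the edge weights below
G = nx.fast_gnp_random_graph(50, 0.3, seed=42, directed=False)  # fixes the topology
for u, v in G.edges():
    G[u][v]["weight"] = random.random()
```

With both seeds fixed, differences between trials reflect the implementations being timed, not graph-to-graph variation.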

parallelTime = t2 - t1
t1 = time.time()
c = currFun(G)
if type(c) == types.GeneratorType:
    d = dict(c)
dschult (Member):
Is it worth checking (outside the timed part) that the results are the same? Something like assert d1 == d2.

Schefflera-Arboricola (Member Author):

I checked d1 == d2 for all_pairs_bellman_ford_path before committing, and it was true in all cases. But for betweenness_centrality it was not always true; I had to round all the values. I can add separate tests for all the algorithms, if that seems good to you.
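For float-valued results like betweenness_centrality, a tolerance-based comparison is usually more robust than rounding. A sketch of a hypothetical helper for comparing the parallel and non-parallel result dicts:

```python
import math

def dicts_close(d1, d2, rel_tol=1e-9, abs_tol=1e-12):
    """True if both dicts have the same keys and all values are close."""
    return d1.keys() == d2.keys() and all(
        math.isclose(d1[k], d2[k], rel_tol=rel_tol, abs_tol=abs_tol) for k in d1
    )

# floating-point noise from a different summation order passes the check:
assert dicts_close({"a": 0.1 + 0.2}, {"a": 0.3})
# a genuinely different value fails it:
assert not dicts_close({"a": 0.30001}, {"a": 0.3})
```

Parallel reductions can sum floats in a different order than the serial code, so small discrepancies are expected rather than a bug.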

timing/timing_individual_function.py (outdated review comment, resolved)
@Schefflera-Arboricola (Member Author):

@dschult I have made all the updates; please let me know if this looks good to you. Thank you very much for the review :)

@Schefflera-Arboricola (Member Author):

I tried running the same timing_individual_function.py script again and I got the following heatmap:

[Screenshot 2023-11-04 at 12:37:35 PM]

This is the heatmap currently in the PR:

[Screenshot 2023-11-04 at 12:39:53 PM]

Would you say that's a lot of variation, especially for the 100-node graphs?

Thank you :)

@dschult dschult left a comment

I think this looks good.
There are some big-picture issues we should probably consider at some point -- but not for this PR's intent (addition of a parallel implementation).

Some of the questions involve things like:

  • Should we include the NX docs information in the nx-parallel docstrings? That might lead to one version becoming out-of-date compared to the other. Perhaps we should only put info about the parallel implementation here and refer to the NX docs for info about the original function. If so, where do we draw the line? At the function signature? Do we copy the parameter descriptions?
  • Should we centralize the 'chunk'ing, 'map'ing, and 'reduce'ing? See [WIP]: Refactor-- consolidate and simplify #7

But let's go ahead and merge this and worry about the big picture issues in another PR.
Thanks!


nodes = G.nodes

total_cores = nxp.cpu_count()
dschult (Member):

We should move this to the function signature and match something like what scikit-learn does (n_jobs), but that's not a blocker here.
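A hypothetical sketch of that signature change, following the scikit-learn convention where `n_jobs=-1` means "use all available CPUs" (the function body here is a placeholder, not the real implementation):

```python
import os

def all_pairs_bellman_ford_path(G, weight="weight", n_jobs=-1):
    # scikit-learn-style convention: -1 means use every available core
    total_cores = os.cpu_count() if n_jobs == -1 else n_jobs
    # ... chunk the nodes across `total_cores` workers as before ...
    return total_cores  # placeholder so the sketch is runnable

workers = all_pairs_bellman_ford_path(None, n_jobs=2)
```

Putting the core count in the signature lets callers control parallelism per call instead of relying on a global `nxp.cpu_count()`.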


MridulS commented Dec 5, 2023

Thanks for this, @Schefflera-Arboricola!

There is a bunch of stuff we should revisit, especially in the context of #7, but for now let's merge this in :)

@MridulS MridulS merged commit 362044c into networkx:main Dec 5, 2023
7 checks passed
@jarrodmillman jarrodmillman added this to the 0.1 milestone Dec 5, 2023