first commit, isolates and betweenness (#2)
* first commit, isolates and betweenness

isolates passes all tests
still need to pass a few betweenness tests

* Works with PR #6688, more graph types, parallel implementations of vitality and tournament

- Decided to just make things work with PR #6688 (had all the functions I needed marked with dispatch decorator)
- More graph types and small interface changes
- parallel implementations of closeness_vitality + tournament (I am a bit ahead of schedule)
- Made networkx tests into my own tests for nx_parallel (same directories as in networkx; can be easily run with pytest for CI)
- ended up having to include all the functions, but didn't have to reimplement (see isolates or tournament for example)
- added utils/chunk.py

* Fixed betweenness tests + made betweenness_centrality pass all tests

- Fixed betweenness tests, had some small errors in them
- Minor changes to graph class constructors
- Changed betweenness_centrality implementation, passes all tests

* Changed betweenness

Redid betweenness without convert function

Tried to use __wrapped__ but it only worked for isolates...for consistency I kept everything the same

Errors for using __wrapped__ were because various methods were "not implemented by parallel"

* Parallelized efficiency_measures

Passes all tests

* added originalGraph to parallel classes, added heatmaps + their code

added originalGraph to parallel classes, added heatmaps + their code in the timing folder

WIP for heatmap

* .py add

* fix test build pyproject.toml

* add init files for import -- might be revised

* try changing dir

* try changing dir correctly

* undo dir munging tries

* try again

* try tests

* debug windows ci

* debug windows ci

* now get nx_parallel tests working

* show environment pre-testing

* try pyargs with nx_parallel

* import debug

* print more

* use python -m pytest instead of pytest

* cleanup and check all

* Quick timing documentation update

* style with black and ruff

* set up pre-commit config to match NetworkX

---------

Co-authored-by: Dan Schult <[email protected]>
20kavishs and dschult authored Sep 11, 2023
1 parent 5b796c1 commit 241fbac
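
The "Changed betweenness" notes above mention `__wrapped__`. As a minimal sketch of what that attribute is: `functools.wraps` stores the undecorated function on the wrapper as `__wrapped__`, so a caller can bypass the decorator. The `dispatchable` decorator and `BACKEND_IMPLS` registry below are hypothetical stand-ins, not NetworkX's actual dispatch machinery, and the sketch does not reproduce the errors described in the commit message.

```python
import functools

# Hypothetical registry standing in for a backend; NOT NetworkX's real dispatch code.
BACKEND_IMPLS = {}


def dispatchable(func):
    """Hypothetical decorator: route calls to a registered backend implementation."""

    @functools.wraps(func)  # copies metadata and sets wrapper.__wrapped__ = func
    def wrapper(*args, **kwargs):
        impl = BACKEND_IMPLS.get(func.__name__)
        if impl is None:
            raise NotImplementedError(f"{func.__name__} not implemented by backend")
        return impl(*args, **kwargs)

    return wrapper


@dispatchable
def number_of_isolates(adjacency):
    """Reference implementation: count nodes that have no neighbors."""
    return sum(1 for nbrs in adjacency.values() if not nbrs)


adjacency = {0: [1], 1: [0], 2: []}

# Calling through the wrapper fails here because no backend is registered ...
try:
    number_of_isolates(adjacency)
    dispatched_ok = True
except NotImplementedError:
    dispatched_ok = False

# ... while __wrapped__ reaches the undecorated reference function directly.
reference_result = number_of_isolates.__wrapped__(adjacency)
```

Here `__wrapped__` only bypasses the outermost decorator; any decorated function that the reference implementation calls internally would still go through its own wrapper.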
Showing 33 changed files with 1,921 additions and 94 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/test.yml
@@ -32,9 +32,10 @@ jobs:
         run: |
           conda install -c conda-forge joblib scipy pandas pytest-cov pytest-randomly
           # matplotlib lxml pygraphviz pydot sympy  # Extra networkx deps we don't need yet
-          pip install git+https://github.com/networkx/networkx.git@main --no-deps
-          pip install -e . --no-deps
+          python -m pip install git+https://github.com/networkx/networkx.git@main
+          python -m pip install .
           echo "Done with installing"
       - name: PyTest
         run: |
           NETWORKX_GRAPH_CONVERT=parallel pytest --pyargs networkx
+          python -m pytest --pyargs nx_parallel
1 change: 0 additions & 1 deletion .gitignore
@@ -127,4 +127,3 @@ dmypy.json

# Pyre type checker
.pyre/

31 changes: 31 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,31 @@
# Install pre-commit hooks via
# pre-commit install

repos:
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
  - repo: https://github.com/adamchainz/blacken-docs
    rev: 1.13.0
    hooks:
      - id: blacken-docs
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v2.7.1
    hooks:
      - id: prettier
        files: \.(html|md|toml|yml|yaml)
        args: [--prose-wrap=preserve]
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.0.258
    hooks:
      - id: ruff
        args:
          - --fix
  - repo: local
    hooks:
      - id: pyproject.toml
        name: pyproject.toml
        language: system
        entry: python tools/generate_pyproject.toml.py
        files: "pyproject.toml|requirements/.*\\.txt|tools/.*pyproject.*"
21 changes: 18 additions & 3 deletions README.md
@@ -1,7 +1,7 @@
-NX-Parallel
+nx_parallel
 -----------

-A NetworkX backend plugin which uses dask for parallelization.
+A NetworkX backend plugin which uses joblib and multiprocessing for parallelization.

 ``` python
 In [1]: import networkx as nx; import nx_parallel
@@ -23,4 +23,19 @@ Out[4]:
  8: 0.0,
  9: 0.0}

-```
+```
+
+Currently the following functions have parallelized implementations:
+- centrality
+- betweenness_centrality
+- tournament
+- is_reachable
+- closeness_vitality
+- efficiency_measures
+- local_efficiency
+
+![alt text](timing/heatmap_all_functions.png)
+
+See the ```/timing``` folder for more heatmaps and code for heatmap generation!
+
+
4 changes: 2 additions & 2 deletions nx_parallel/__init__.py
@@ -1,3 +1,3 @@
-from .centrality import *
-from .graph import *
+from .algorithms import *
+from .classes import *
 from .interface import *
8 changes: 8 additions & 0 deletions nx_parallel/algorithms/__init__.py
@@ -0,0 +1,8 @@
# subpackages
from .centrality import *
from .utils import *

# modules
from .efficiency_measures import *
from .isolate import *
from .tournament import *
1 change: 1 addition & 0 deletions nx_parallel/algorithms/centrality/__init__.py
@@ -0,0 +1 @@
from .betweenness import *
121 changes: 121 additions & 0 deletions nx_parallel/algorithms/centrality/betweenness.py
@@ -0,0 +1,121 @@
from joblib import Parallel, delayed, cpu_count
from nx_parallel.algorithms.utils.chunk import chunks
from networkx.utils import py_random_state
from networkx.algorithms.centrality.betweenness import (
    _rescale,
    _single_source_shortest_path_basic,
    _single_source_dijkstra_path_basic,
    _accumulate_endpoints,
    _accumulate_basic,
)

__all__ = ["betweenness_centrality"]


@py_random_state(5)
def betweenness_centrality(
    G, k=None, normalized=True, weight=None, endpoints=False, seed=None
):
    r"""Compute shortest-path betweenness centrality for nodes in parallel.

    Betweenness centrality of a node $v$ is the sum of the
    fraction of all-pairs shortest paths that pass through $v$

    .. math::

       c_B(v) =\sum_{s,t \in V} \frac{\sigma(s, t|v)}{\sigma(s, t)}

    where $V$ is the set of nodes, $\sigma(s, t)$ is the number of
    shortest $(s, t)$-paths, and $\sigma(s, t|v)$ is the number of
    those paths passing through some node $v$ other than $s, t$.
    If $s = t$, $\sigma(s, t) = 1$, and if $v \in \{s, t\}$,
    $\sigma(s, t|v) = 0$ [2]_.

    Parameters
    ----------
    G : graph
        A NetworkX graph.
    k : int, optional (default=None)
        If k is not None use k node samples to estimate betweenness.
        The value of k <= n where n is the number of nodes in the graph.
        Higher values give better approximation.
    normalized : bool, optional
        If True the betweenness values are normalized by `2/((n-1)(n-2))`
        for graphs, and `1/((n-1)(n-2))` for directed graphs where `n`
        is the number of nodes in G.
    weight : None or string, optional (default=None)
        If None, all edge weights are considered equal.
        Otherwise holds the name of the edge attribute used as weight.
        Weights are used to calculate weighted shortest paths, so they are
        interpreted as distances.
    endpoints : bool, optional
        If True include the endpoints in the shortest path counts.
    seed : integer, random_state, or None (default)
        Indicator of random number generation state.
        See :ref:`Randomness<randomness>`.
        Note that this is only used if k is not None.

    Returns
    -------
    nodes : dictionary
        Dictionary of nodes with betweenness centrality as the value.

    Notes
    -----
    This algorithm is a parallelized version of betweenness centrality in NetworkX.
    Nodes are divided into chunks based on the number of available processors,
    and otherwise all calculations are similar.
    """
    if k is None:
        nodes = G.nodes
    else:
        nodes = seed.sample(list(G.nodes), k)
    total_cores = cpu_count()
    num_chunks = max(len(nodes) // total_cores, 1)
    node_chunks = list(chunks(nodes, num_chunks))
    bt_cs = Parallel(n_jobs=total_cores)(
        delayed(betweenness_centrality_node_subset)(
            G,
            chunk,
            weight,
            endpoints,
        )
        for chunk in node_chunks
    )

    # Reducing partial solution
    bt_c = bt_cs[0]
    for bt in bt_cs[1:]:
        for n in bt:
            bt_c[n] += bt[n]

    betweenness = _rescale(
        bt_c,
        len(G),
        normalized=normalized,
        directed=G.is_directed(),
        k=k,
        endpoints=endpoints,
    )
    return betweenness


def betweenness_centrality_node_subset(G, nodes, weight=None, endpoints=False):
    betweenness = dict.fromkeys(G, 0.0)
    for s in nodes:
        # single source shortest paths
        if weight is None:  # use BFS
            S, P, sigma, _ = _single_source_shortest_path_basic(G, s)
        else:  # use Dijkstra's algorithm
            S, P, sigma, _ = _single_source_dijkstra_path_basic(G, s, weight)
        # accumulation
        if endpoints:
            betweenness, delta = _accumulate_endpoints(betweenness, S, P, sigma, s)
        else:
            betweenness, delta = _accumulate_basic(betweenness, S, P, sigma, s)
    return betweenness
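
The chunk-then-reduce scheme in `betweenness_centrality` relies on each source node contributing independently to the totals, so summing per-chunk results gives the same answer as a single pass over all sources. The following is a minimal standard-library sketch of that idea, not the code from this commit: `chunks` is an assumed stand-in for the helper in `utils/chunk.py` (whose source is not shown here), `partial_betweenness` and `merge` are hypothetical names, a thread pool replaces `joblib.Parallel`, and rescaling is omitted.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import islice


def chunks(iterable, n):
    """Yield successive tuples of at most n items (assumed behavior of the helper)."""
    it = iter(iterable)
    while True:
        block = tuple(islice(it, n))
        if not block:
            return
        yield block


def partial_betweenness(adj, sources):
    """Sum unnormalized Brandes dependencies over the given sources (unweighted BFS)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in sources:
        order = []  # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}  # shortest-path predecessors
        sigma = dict.fromkeys(adj, 0.0)  # number of shortest s-v paths
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # back-propagate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def merge(partials):
    """Reduce step: add up the per-chunk dictionaries node by node."""
    total = dict.fromkeys(partials[0], 0.0)
    for part in partials:
        for node, value in part.items():
            total[node] += value
    return total


# Path graph 0 - 1 - 2 - 3 as an adjacency dict.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

whole = partial_betweenness(adj, list(adj))  # single pass over all sources

with ThreadPoolExecutor(max_workers=2) as pool:  # stand-in for joblib.Parallel
    partials = list(pool.map(partial(partial_betweenness, adj), chunks(adj, 2)))
chunked = merge(partials)
# whole == chunked == {0: 0.0, 1: 4.0, 2: 4.0, 3: 0.0}
```

The raw sums count every unordered pair twice on an undirected graph (once from each endpoint as the source), which is why nodes 1 and 2 score 4.0 here before any rescaling.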