Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional dependency GSTools-Core #215

Merged
merged 10 commits into from
Nov 11, 2021
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,15 @@
All notable changes to **GSTools** will be documented in this file.


## [1.3.4] - Pure Pink ?

### Enhancements
- add GStools-Core as optional dependency [#215](https://github.com/GeoStat-Framework/GSTools/pull/215)

### Changes
- remove unnecessary `dim` argument in Cython code [#216](https://github.com/GeoStat-Framework/GSTools/issues/216)


## [1.3.3] - Pure Pink - 2021-08

### Enhancements
Expand Down
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -345,6 +345,7 @@ in memory for immediate 3D plotting in Python.

### Optional

- [GSTools-Core >= 0.2.0](https://github.com/GeoStat-Framework/GSTools-Core)
- [matplotlib](https://matplotlib.org)
- [pyvista](https://docs.pyvista.org/)

Expand Down
30 changes: 30 additions & 0 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,8 @@ To get the latest development version you can install it directly from GitHub:

If something went wrong during installation, try the :code:`-I` `flag from pip <https://pip-python3.readthedocs.io/en/latest/reference/pip_install.html?highlight=i#cmdoption-i>`_.

**Speeding up GSTools by parallelization**

To enable the OpenMP support, you have to provide a C compiler and OpenMP.
Parallel support is controlled by an environment variable ``GSTOOLS_BUILD_PARALLEL``,
that can be ``0`` or ``1`` (interpreted as ``0`` if not present).
Expand All @@ -89,6 +91,33 @@ For the development version, you can do almost the same:
export GSTOOLS_BUILD_PARALLEL=1
pip install git+git://github.com/GeoStat-Framework/GSTools.git@main

**Using experimental GSTools-Core for even more speed**

You can install the optional dependency `GSTools-Core <https://github.com/GeoStat-Framework/GSTools-Core>`_,
which is a re-implementation of the main algorithms used in GSTools. The new
package uses the language Rust and it should be faster (in some cases by orders
of magnitude), safer, and it will potentially completely replace the current
standard implementation in Cython. Once the package GSTools-Core is available
on your machine, it will be used by default. In case you want to switch back to
the Cython implementation, you can set :code:`gstools.config.USE_RUST=False` in
your code. This also works at runtime. You can install the optional dependency
e.g. by

.. code-block:: none

pip install gstools[rust]

or by manually installing the package

.. code-block:: none

pip install gstools-core

GSTools-Core will automatically use all your cores in parallel, without having
LSchueler marked this conversation as resolved.
Show resolved Hide resolved
to use OpenMP or a local C compiler.
In case you want to restrict the number of threads used, you can set the
environment variable ``RAYON_NUM_THREADS`` to the desired amount.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a little function in gstools.config to set this environment variable?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One argument against that is that this variable will affect all Rust software using Rayon, not just GSTools-Core's backend.

If we do provide a programmatic interface, we should probably make the Core initialize a dedicated thread pool and make this function affect only this thread pool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamreichold raises a fair point there. But that env. variable would only exist during the shell session from which GSTools is run (I don't have enough experience with Windows to know how it handles env. variables).
Did I understand it correctly, that the number of threads used by Rayon can only be set once at the start and doing it a second time results in an error? I'm not if this is worth the effort.
Another alternative would be to capture the env. variable at the start of GSTools and set it back to that value when exiting, but that doesn't sound very elegant.
I personally would just keep the hint in the readme for the moment. But if you have stronger opinions than I do, I'd be happy to implement your preferred solution.

Copy link

@adamreichold adamreichold Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I understand it correctly, that the number of threads used by Rayon can only be set once at the start and doing it a second time results in an error? I'm not if this is worth the effort.

Setting the environment variable will affect Rayon's so-called global thread pool which is initialized once, either on-demand when the first Rayon task is spawned or explicitly be the application. Hence, changing the environment variable after a first computation using Rayon was run has no effect.

This why I suggested making the GSTools-Core module store an Option<ThreadPool> which would be affected (meaning re-initialized using a given number of threads) by the function from gstools.config. If the option is set, that thread pool is used via ThreadPool::install, otherwise GSTools-Core falls back to the global thread pool.

I personally would just keep the hint in the readme for the moment.

Personally, I would say that online control over parallelism is a somewhat orthogonal issue and would best be served via a follow-up PR / issue.



Citation
========
Expand Down Expand Up @@ -387,6 +416,7 @@ Requirements
Optional
--------

- `GSTools-Core >= 0.2.0 <https://github.com/GeoStat-Framework/GSTools-Core>`_
- `matplotlib <https://matplotlib.org>`_
- `pyvista <https://docs.pyvista.org>`_

Expand Down
1 change: 1 addition & 0 deletions gstools/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,7 @@
"""
# Hooray!
from gstools import (
config,
covmodel,
field,
krige,
Expand Down
14 changes: 14 additions & 0 deletions gstools/config.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# -*- coding: utf-8 -*-
"""
GStools subpackage providing global variables.

.. currentmodule:: gstools.config

"""
# pylint: disable=W0611
try:
LSchueler marked this conversation as resolved.
Show resolved Hide resolved
import gstools_core

USE_RUST = True # pragma: no cover
except ImportError:
USE_RUST = False
11 changes: 9 additions & 2 deletions gstools/field/generator.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,16 +10,23 @@
RandMeth
IncomprRandMeth
"""
# pylint: disable=C0103, W0222
# pylint: disable=C0103, W0222, C0412
import warnings
from copy import deepcopy as dcp

import numpy as np

from gstools import config
from gstools.covmodel.base import CovModel
from gstools.field.summator import summate, summate_incompr
from gstools.random.rng import RNG

if config.USE_RUST:
# pylint: disable=E0401
from gstools_core import summate, summate_incompr # pragma: no cover
else:
# pylint: disable=C0412
from gstools.field.summator import summate, summate_incompr
LSchueler marked this conversation as resolved.
Show resolved Hide resolved

__all__ = ["RandMeth", "IncomprRandMeth"]


Expand Down
19 changes: 14 additions & 5 deletions gstools/krige/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,23 +9,32 @@
.. autosummary::
Krige
"""
# pylint: disable=C0103, W0221, E1102, R0201
# pylint: disable=C0103, W0221, E1102, R0201, C0412
import collections

import numpy as np
import scipy.linalg as spl
from scipy.spatial.distance import cdist

from gstools import config
from gstools.field.base import Field
from gstools.krige.krigesum import (
calc_field_krige,
calc_field_krige_and_variance,
)
from gstools.krige.tools import get_drift_functions, set_condition
from gstools.tools.geometric import rotated_main_axes
from gstools.tools.misc import eval_func
from gstools.variogram import vario_estimate

if config.USE_RUST:
# pylint: disable=E0401
from gstools_core import ( # pragma: no cover
calc_field_krige,
calc_field_krige_and_variance,
)
else:
from gstools.krige.krigesum import (
calc_field_krige,
calc_field_krige_and_variance,
)
LSchueler marked this conversation as resolved.
Show resolved Hide resolved

__all__ = ["Krige"]


Expand Down
4 changes: 2 additions & 2 deletions gstools/variogram/estimator.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -178,7 +178,6 @@ cdef _normalization_func_vec choose_estimator_normalization_vec(str estimator_ty


def directional(
const int dim,
const double[:,:] f,
const double[:] bin_edges,
const double[:,:] pos,
Expand All @@ -203,6 +202,7 @@ def directional(
choose_estimator_normalization_vec(estimator_type)
)

cdef int dim = pos.shape[0]
cdef int d_max = direction.shape[0]
cdef int i_max = bin_edges.shape[0] - 1
cdef int j_max = pos.shape[1] - 1
Expand Down Expand Up @@ -237,13 +237,13 @@ def directional(
return np.asarray(variogram), np.asarray(counts)

def unstructured(
const int dim,
const double[:,:] f,
const double[:] bin_edges,
const double[:,:] pos,
str estimator_type='m',
str distance_type='e',
):
cdef int dim = pos.shape[0]
cdef _dist_func distance

if distance_type == 'e':
Expand Down
40 changes: 30 additions & 10 deletions gstools/variogram/variogram.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,10 @@
vario_estimate
vario_estimate_axis
"""
# pylint: disable=C0412
import numpy as np

from gstools import config
from gstools.normalizer.tools import remove_trend_norm_mean
from gstools.tools.geometric import (
ang2dir,
Expand All @@ -20,12 +22,29 @@
generate_grid,
)
from gstools.variogram.binning import standard_bins
from gstools.variogram.estimator import (
directional,
ma_structured,
structured,
unstructured,
)

if config.USE_RUST:
# pylint: disable=E0401
from gstools_core import (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be possible to merge those or this is a formatter issue?

Copy link
Member Author

@LSchueler LSchueler Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... that was isort's idea. It's about as ugly as it gets, right?! :-D And black and isort can't even agree on where to put the # pragma: no cover.
rustfmt is so nice :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about setting # pragma: no cover behind if config.USE_RUST: once?
Also the # pylint: disable=C0412 is redundant (see line 13) and we could also just add E0401 in line 13.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I was a bit annoyed yesterday by the combination of a false positive coming from pylint and then the isort and black peskiness. I imnproved the situation, but it seems that isort does not like grouped renaming of imports.
I don't want to disable the pylint error E0401 globally, as import errors can be quite a severe problem.

variogram_directional as directional, # pragma: no cover
)
from gstools_core import (
variogram_ma_structured as ma_structured, # pragma: no cover
)
from gstools_core import (
variogram_structured as structured, # pragma: no cover
)
from gstools_core import (
variogram_unstructured as unstructured, # pragma: no cover
)
else:
# pylint: disable=C0412
from gstools.variogram.estimator import (
directional,
ma_structured,
structured,
unstructured,
)

__all__ = [
"vario_estimate",
Expand Down Expand Up @@ -59,7 +78,8 @@ def _separate_dirs_test(direction, angles_tol):
for j in range(i + 1, direction.shape[0]):
s_prod = np.minimum(np.abs(np.dot(direction[i], direction[j])), 1)
separate_dirs &= np.arccos(s_prod) >= 2 * angles_tol
return separate_dirs
# gstools-core doesn't like the type `numpy.bool_`
return bool(separate_dirs)


def vario_estimate(
Expand Down Expand Up @@ -341,7 +361,6 @@ def vario_estimate(
# "h"aversine or "e"uclidean distance type
distance_type = "h" if latlon else "e"
estimates, counts = unstructured(
dim,
field,
bin_edges,
pos,
Expand All @@ -350,7 +369,6 @@ def vario_estimate(
)
else:
estimates, counts = directional(
dim,
field,
bin_edges,
pos,
Expand Down Expand Up @@ -441,7 +459,9 @@ def vario_estimate_axis(
field = np.ma.array(field, ndmin=1, dtype=np.double)
if missing:
field.mask = np.logical_or(field.mask, missing_mask)
mask = np.asarray(np.ma.getmaskarray(field), dtype=np.int32)
mask = np.ma.getmaskarray(field)
if not config.USE_RUST:
mask = np.asarray(mask, dtype=np.int32)
else:
field = np.array(field, ndmin=1, dtype=np.double, copy=False)
missing_mask = None # free space
Expand Down
2 changes: 2 additions & 0 deletions setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,8 @@ doc =
plotting =
matplotlib>=3,<4
pyvista>=0.29,<1
rust =
gstools_core>=0.2.0,<1
test =
coverage[toml]>=5.2.1,<6
pytest>=6.0,<7
Expand Down