
Ordinary Kriging: help please? #36

Closed
banesullivan opened this issue Oct 22, 2019 · 21 comments
Labels: help wanted (Extra attention is needed), question (Further information is requested)

Comments

@banesullivan
Collaborator

banesullivan commented Oct 22, 2019

So I'm trying to get up and running with this software, and I'm finding it rather difficult to produce even a quick-and-dirty ordinary kriging result. I learned geostats/kriging using SGeMS, and I'm realizing much of what I learned is difficult to transfer here (so far)... I'm curious whether someone could help me get some quick/dirty results with a dataset that I've used in SGeMS before.

The Example

I believe my Gaussian model is probably the problem:

import pyvista as pv
from gstools import Gaussian, krige

# Load the data
probes = pv.read("probes.vtp")
model_space = pv.read("model_space.vtr")

model = Gaussian(dim=3, var=10, len_scale=500)
cond_pos = probes.points.T  # GSTools convention: one row per dimension
# Conditional values - what I want to interpolate
field = probes["temperature (C)"]
# Create the kriging model
krig = krige.Ordinary(model, cond_pos=cond_pos, cond_val=field)

krige_field, krige_var = krig(model_space.points.T)

model_space["krige_field"] = krige_field
model_space["krige_var"] = krige_var

model_space.contour().plot(scalars="krige_field")

[image: kriging result contour plot]

I think I am running into trouble with the len_scale option in the Gaussian model. GSTools enforces an upper bound of 1000 on it, but for this dataset I may need a higher len_scale?

If you all were to use this given dataset to produce an example showcasing what GSTools can do, what would that be? This is all in an effort to help me figure out how PyVista might interface with GSTools.

Data

I have sparse, subsurface temperature probe data and want to krige a volume of these data.

data.zip

"Truth"

I'm working towards results that look more like this (produced in SGeMS):

[screenshot: SGeMS kriging result]

@LSchueler
Member

Without having had a more in-depth look at your problem yet: if the length-scale bound is the problem, you can do the following, for example:

model = Gaussian(dim=3, var=10)
model.set_arg_bounds(len_scale=[0, 10000])
model.len_scale = 2000

@LSchueler
Member

If you pull the newest version of GSTools, the boundaries are now much more generous ;-)

@banesullivan
Collaborator Author

banesullivan commented Oct 23, 2019

Okay, awesome! I pulled the latest changes and increasing the length scale is helping me approach a better solution.

Do you all have any examples, tips, or tricks for finding a good Gaussian model?

@MuellerSeb MuellerSeb added help wanted Extra attention is needed question Further information is requested labels Oct 23, 2019
@javoha

javoha commented Oct 29, 2019

Hello,

let's see if I can make a suggestion (please correct me if I got anything wrong):

The len_scale of GSTools corresponds to the range describing the spatial correlation of the data when performing variogram analysis. There are basically two ways to obtain a range for your model:

  1. Expert knowledge: you have some idea of the length scale of spatial correlation in your data set that you use as a range. In this case, using the same range (len_scale) that you used for the SGeMS model should give you the same result. Just be aware that some software packages use the "effective range" (for the Gaussian model this is sqrt(3)*range, I think).

  2. You perform a variogram analysis (basically a data-driven approach to obtain a range). If I am correct, GSTools offers this: https://geostat-framework.readthedocs.io/projects/gstools/en/latest/tutorial_03_vario.html
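To illustrate the effective-range caveat from point 1, here is a minimal stdlib sketch (the 2000 m value is made up, and the sqrt(3) relation for the Gaussian model is the hedged one mentioned above):

```python
import math

# Hypothetical effective range reported by SGeMS for a Gaussian variogram
effective_range = 2000.0

# If effective range = sqrt(3) * len_scale holds for the Gaussian model,
# the equivalent GSTools len_scale would be:
len_scale = effective_range / math.sqrt(3)
print(round(len_scale, 1))  # 1154.7
```

So a range taken straight from another package can be off by a constant factor if the two tools define it differently.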

Apart from that, you might have anisotropies in your model: in your example it looks like you have strong continuity in the x and y directions (horizontal) and variation in the z direction, corresponding to the direction of sedimentation. Also, in the literature (e.g. Webster and Oliver (2008)), the Gaussian model is discouraged for approaching the origin with zero gradient. You might want to consider a different variogram model.
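One common way such anisotropy is handled is geometric anisotropy: each axis is rescaled by its own length scale before the lag distance is computed, so a single isotropic variogram applies in the rescaled space. A minimal sketch with made-up length scales (not values fitted to this dataset):

```python
import math

# Per-axis length scales (x, y, z); values invented for illustration,
# with much longer correlation horizontally than vertically
len_scale = (5000.0, 5000.0, 850.0)

# A sample separation vector between two points
sep = (1000.0, 0.0, 170.0)

# Dimensionless lag: each component is rescaled by its axis length scale,
# so one isotropic variogram can be evaluated on the result
h = math.sqrt(sum((d / l) ** 2 for d, l in zip(sep, len_scale)))
print(round(h, 3))  # 0.283
```

In GSTools this corresponds to passing a list for len_scale, as done later in this thread.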

I hope this is helpful!

@LSchueler
Member

Hi @banesullivan,

did the reply from @javoha (thanks for that!) help? Could you reproduce the "truth" from SGeMS?

@banesullivan
Collaborator Author

Thanks for the tips @javoha!

I haven't had time to jump back into this... I will get to it next week-ish and report back!

You might want to consider a different variogram model.

@javoha, Are there any other options for models when kriging with GSTools?

@LSchueler, feel free to close this issue as it's not really an issue/bug with the software but a support topic

@MuellerSeb
Member

@banesullivan : you can use all provided models for kriging: provided-covariance-models
And of course self-defined ones. ;-)

@MuellerSeb
Member

Closing now. But thanks for the input ;-)

@banesullivan
Collaborator Author

banesullivan commented Dec 9, 2019

So I'm trying this again, but with another dataset that is a bit cleaner. Data attached; it has a permissive license if you want to use it in an example in your docs - just let me know.

I've compiled with OpenMP support per #31 (comment), but as soon as I start the kriging (on either a structured or unstructured mesh), the memory consumption blows up, with over 100 GB used on swap, so I kill the task. This amount of memory usage seems ridiculous to me...

@MuellerSeb, @LSchueler, or @javoha: Can you all help in some capacity to make this example run?

Or could you put some 3D examples in the docs besides just a random field? I can't get anything to work in 3D with this library yet.

Data files: Archive.zip

import numpy as np
import matplotlib.pyplot as plt

import pyvista as pv

from gstools import (Exponential, Gaussian, krige,
                     vario_estimate_unstructured, Stable)

grid = pv.read("grid.vtr")
assay = pv.read("assay.vtp")

cond_pos = assay.points.T  # GSTools convention
# Conditional values - what I want to interpolate
field = assay["CU_pct"]


model = Exponential(dim=3, var=2, len_scale=10)
# Estimate the variogram of the field with 40 bins
bins = np.arange(40)
bin_center, gamma = vario_estimate_unstructured(cond_pos, field, bins)
# Fit the variogram with a stable model (no nugget fitted)
fit_model = Stable(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
# Output
ax = fit_model.plot(x_max=40)
ax.plot(bin_center, gamma)
print(fit_model)

[image: fitted variogram, 40 bins]

Not too shabby of a variogram, so I should be able to at least produce some results with this....

# Create the kriging model
krig = krige.Ordinary(model, 
                      cond_pos=cond_pos, 
                      cond_val=field)

And this is where the memory usage blows up

krige_field, krige_var = krig((grid.x, grid.y, grid.z), mesh_type="structured")

grid["CU_pct"] = krige_field
grid["krige_var"] = krige_var

@banesullivan
Collaborator Author

Or as unstructured:

krige_field, krige_var = krig(grid.points.T, mesh_type="unstructured")

grid["CU_pct"] = krige_field
grid["krige_var"] = krige_var

@banesullivan
Collaborator Author

And...

model = Exponential(dim=3, var=2, len_scale=10)
# Estimate the variogram of the field with 100 bins
bins = np.arange(100)
bin_center, gamma = vario_estimate_unstructured(cond_pos, field, bins)
# Fit the variogram with a stable model (no nugget fitted)
fit_model = Stable(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
# Output
ax = fit_model.plot(x_max=100)
ax.plot(bin_center, gamma)
print(fit_model)

[image: fitted variogram, 100 bins]

@MuellerSeb
Member

MuellerSeb commented Dec 10, 2019

Good heavens!
I think the problem is here:

This function creates a matrix containing all pairwise distances between the conditional points and the given field points. In your case, that's a matrix with 1,000,000 × 8583 entries... quite huge.
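A quick back-of-the-envelope check of those numbers (assuming 8-byte floats) shows why swap fills up:

```python
n_grid = 1_000_000   # points in the output mesh
n_cond = 8583        # conditional (assay) points
bytes_per_entry = 8  # float64

# Size of the full pairwise distance matrix
total_gib = n_grid * n_cond * bytes_per_entry / 2**30
print(round(total_gib, 1))  # 63.9
```

So the distance matrix alone needs roughly 64 GiB before any kriging weights are even computed.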

We need to split this into chunks, I guess. Therefore we need to introduce a controlling parameter like chunk_size, which could be None by default, or something like 1e6.

ATM the output mesh just has too many points. As a workaround, maybe you could chunk it manually by cutting the x, y, z positions. Sorry for this inconvenience.
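Manual chunking here just means slicing the output positions into blocks and kriging each block separately. For a 100-point axis split into 5 chunks (the numbers assumed in the loop suggested in the next comment), the slice bounds work out to:

```python
n_points = 100  # points along one grid axis (assumed for illustration)
n_chunks = 5
step = n_points // n_chunks

# (start, stop) index pairs for slicing one axis into equal chunks
chunk_slices = [(i * step, (i + 1) * step) for i in range(n_chunks)]
print(chunk_slices)  # [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
```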

@MuellerSeb
Member

Maybe try something like this:

# Assumes each grid axis has 100 points, processed in 5 chunks of 20
krige_field = np.empty((len(grid.x), len(grid.y), len(grid.z)), dtype=float)
krige_var = np.empty((len(grid.x), len(grid.y), len(grid.z)), dtype=float)

for i in range(5):
    for j in range(5):
        for k in range(5):
            (
                krige_field[i*20:(i+1)*20, j*20:(j+1)*20, k*20:(k+1)*20],
                krige_var[i*20:(i+1)*20, j*20:(j+1)*20, k*20:(k+1)*20],
            ) = krig(
                (
                    grid.x[i*20:(i+1)*20],
                    grid.y[j*20:(j+1)*20],
                    grid.z[k*20:(k+1)*20],
                ),
                mesh_type="structured",
            )
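A rough check (again assuming 8-byte floats) of why this helps: each 20x20x20 block pairs only 8000 output points with the 8583 conditional points at a time:

```python
chunk_points = 20 ** 3  # one 20x20x20 block of the structured grid
n_cond = 8583           # conditional (assay) points
bytes_per_entry = 8     # float64

# Distance-matrix size for a single chunk
chunk_gib = chunk_points * n_cond * bytes_per_entry / 2**30
print(round(chunk_gib, 2))  # 0.51
```

About half a GiB per chunk, instead of tens of GiB for the whole mesh at once.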

@MuellerSeb MuellerSeb reopened this Dec 10, 2019
@banesullivan
Collaborator Author

Huh, okay, this makes sense now: I see why the memory usage was blowing up, and it makes me feel a bit better knowing it wasn't just a user issue.

With that suggested code, it runs without massive memory consumption, but it's taking ages to execute... (no result as of yet)

Could you implement that behind the scenes? And possibly parallelize it, or find another way to speed this up?

@LSchueler
Member

Yeah, that's indeed not a user issue; it's an inherent problem with Python.
To speed things up, NumPy and SciPy often rely on underlying C code, which is compiled to machine code and is thus really fast. But the problem is that these C functions have to be fed all the data at once by the Python code.
That's the reason why MuellerSeb's workaround helps with the memory consumption.

We'll discuss some options.

And thank you very much for providing the data!

@MuellerSeb
Member

@banesullivan : I started to re-implement the kriging routines. You can have a look at the feature branch: https://github.com/GeoStat-Framework/GSTools/tree/krige_update

There you now have the option to define a chunk size when calling the kriging routine. The routines have also been optimized, so the kriging system is only set up once, on creation.

Maybe you can play around with that and give some feedback ;-)

Cheers, Sebastian

@banesullivan
Collaborator Author

Ah awesome, @MuellerSeb! I will have to try that out when I have a chance... been a bit focused on a specific project lately.

FYI, I managed to get pretty good results with the original data from the top post (see #36 (comment)) with the following Exponential variogram:

model = Exponential(dim=3,  # 3D
                    var=9e3,  # Sill
                    len_scale=[5000, 5000, 850])  # Anisotropic (x, y, z)

[screenshot: kriging result with the Exponential model]

I think it would be pretty cool to include this example in the docs somewhere... also, it would be cool to switch the examples over to sphinx-gallery, but in order to do that with PyVista, you would have to leave readthedocs.org for GitHub Pages.


Also, since this is the original topic of this issue: I cannot get a decent variogram fit for that dataset. The following doesn't seem to work:

_, bins = np.histogram(field, 50)
bin_center, gamma = vario_estimate_unstructured(
    cond_pos,
    field,
    bins,
)


fit_model = Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)

from gstools.covmodel.plot import plot_variogram

plt.plot(bin_center, gamma)
plot_variogram(fit_model, x_max=bins[-1], ax=plt.gca())
plt.show()

[image: poor variogram fit]

any tips?

@MuellerSeb
Member

@banesullivan : That kriging result looks good indeed!
Could you provide a full working script to reproduce the temperature image? We are aiming at a better structure for the examples folder, so this could be a nice addition to the collection.

Addressing the second problem:
Why are you using the bins from the histogram? Those bins are set up to properly bin the field values, but the variogram bins are for the pairwise distances of the field points. I think you are mixing things up there.

We are working on a new variogram estimation routine at the moment, where binning could be done automatically. So this should be more convenient in the future.
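To make the distinction concrete: variogram bins are edges over pairwise separation distances, not over field values. A minimal sketch of building such edges (the maximum lag is an assumed value, not derived from this dataset):

```python
# Variogram bins partition pairwise separation distances, not field values.
# Build edges up to a maximum lag distance (value assumed for illustration):
max_lag = 12300.0
n_bins = 50
width = max_lag / n_bins

distance_bin_edges = [i * width for i in range(n_bins + 1)]
print(distance_bin_edges[:3])  # [0.0, 246.0, 492.0]
```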

@banesullivan
Collaborator Author

banesullivan commented Jan 10, 2020

Aha! I totally misinterpreted the purpose of the bins. Much better now:

Here is the data: probes.vtp.zip

Here is the full script:

import numpy as np
from gstools import Exponential, krige, vario_estimate_unstructured
import matplotlib.pyplot as plt
import pyvista as pv

# Load the data
probes = pv.read("probes.vtp")

cond_pos = probes.points.T # gstools convention
# conditional values - what i want to interpolate
field = probes['temperature (C)']

bins = np.linspace(0, 12300, 1000)
bin_center, gamma = vario_estimate_unstructured(
    cond_pos,
    field,
    bins,
    )


fit_model = Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)

from gstools.covmodel.plot import plot_variogram

plt.plot(bin_center, gamma)
plot_variogram(fit_model, x_max=bins[-1], ax=plt.gca())
plt.show()

[image: fitted variogram]

# Create the kriging grid
grid = pv.UniformGrid()
grid.origin = (329700, 4252600, -2700)
grid.spacing = (250, 250, 50)
grid.dimensions = (60, 75, 100)

# Create the kriging model
krig = krige.Ordinary(fit_model, cond_pos=cond_pos, cond_val=field)

r = grid.cast_to_rectilinear_grid()  # Use a structured grid for better performance
krige_field, krige_var = krig((r.x, r.y, r.z), mesh_type="structured")

grid["temperature (C)"] = krige_field.ravel(order="F")
grid["krige_var"] = krige_var.ravel(order="F")

dargs = dict(cmap="coolwarm", clim=[0,300], scalars="temperature (C)")

p = pv.Plotter()
p.add_volume(grid, opacity="sigmoid_8", **dargs)
p.add_mesh(probes, **dargs)
p.show()

[image: volume rendering of the kriged temperature field]


There's a lot of other data for this project with some isosurfaces of "true" temperature values. Here's a GIF with other stuff I didn't include here.

[GIF: additional project data and temperature isosurfaces]

@MuellerSeb
Member

@banesullivan : Cool! Thanks for the scripts... stunning!
#55 gives a proposal for automated binning. Maybe you can try it and compare the results.

@MuellerSeb
Member

@banesullivan : Automatic binning was implemented with #131, and PyVista is now supported with #59.
