
Ordinary Kriging: help please? #36

Closed
banesullivan opened this issue Oct 22, 2019 · 21 comments
Labels: help wanted (Extra attention is needed), question (Further information is requested)

Comments

@banesullivan
Collaborator

banesullivan commented Oct 22, 2019

So I'm trying to get up and running with this software, and I'm finding it rather difficult to produce even a quick-and-dirty ordinary kriging result. I learned geostats/kriging using SGeMS, and I'm realizing much of what I learned is difficult to transfer here (so far)... I'm curious whether someone could help me get some quick/dirty results with a dataset that I've used in SGeMS before.

The Example

I believe my Gaussian model is probably the problem:

import pyvista as pv
from gstools import Gaussian, krige

# Load the data
probes = pv.read("probes.vtp")
model_space = pv.read("model_space.vtr")

model = Gaussian(dim=3, var=10, len_scale=500)
cond_pos = probes.points.T  # GSTools convention: one row per dimension
# Conditional values - what I want to interpolate
field = probes["temperature (C)"]
# Create the kriging model
krig = krige.Ordinary(model, cond_pos=cond_pos, cond_val=field)

krige_field, krige_var = krig(model_space.points.T)

model_space["krige_field"] = krige_field
model_space["krige_var"] = krige_var

model_space.contour().plot(scalars="krige_field")

[image: kriging result contour plot]

I think I am running into trouble with the len_scale option in the Gaussian model. GSTools enforces an upper bound of 1000 on it, but for this dataset I may need a higher len_scale?

If you all were to use this given dataset to produce an example showcasing what GSTools can do, what would that be? This is all in an effort to help me figure out how PyVista might interface with GSTools.

Data

I have sparse, subsurface temperature probe data and want to krige a volume of these data.

data.zip

"Truth"

I'm working towards results that look more like this (produced in SGeMS):

[screenshot: SGeMS kriging result]

@LSchueler
Member

Without having had a more in-depth look at your problem yet: if the length-scale bound is the problem, you can do the following, for example:

model = Gaussian(dim=3, var=10)
model.set_arg_bounds(len_scale=[0, 10000])
model.len_scale = 2000

@LSchueler
Member

If you pull the newest version of GSTools, the boundaries are now much more generous ;-)

@banesullivan
Collaborator Author

banesullivan commented Oct 23, 2019

Okay, awesome! I pulled the latest changes and increasing the length scale is helping me approach a better solution.

Do you all have any examples, tips, or tricks for finding a good Gaussian model?

@MuellerSeb MuellerSeb added help wanted Extra attention is needed question Further information is requested labels Oct 23, 2019
@javoha

javoha commented Oct 29, 2019

Hello,

let's see if I can make a suggestion (please correct me if I got anything wrong):

The len_scale of GSTools corresponds to the range describing the spatial correlation of the data when performing variogram analysis. There are basically two ways to obtain a range for your model:

  1. Expert knowledge: you have some idea of the length scale of spatial correlation in your data set that you use as a range. In this case, using the same range (len_scale) that you used for the SGeMS model should give you the same result. Just be aware that some software packages use the "effective range" (for the Gaussian model this is sqrt(3)*range, I think).

  2. You perform a variogram analysis (basically a data-driven approach to obtain a range). If I am correct, GSTools offers this: https://geostat-framework.readthedocs.io/projects/gstools/en/latest/tutorial_03_vario.html
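To illustrate the effective-range caveat from point 1, here is a minimal stdlib sketch (the 2000 m value is made up, and the sqrt(3) relation for the Gaussian model is the hedged one mentioned above):

```python
import math

# Hypothetical effective range reported by SGeMS for a Gaussian variogram
effective_range = 2000.0

# If effective range = sqrt(3) * len_scale holds for the Gaussian model,
# the equivalent GSTools len_scale would be:
len_scale = effective_range / math.sqrt(3)
print(round(len_scale, 1))  # 1154.7
```

So a range taken straight from another package can be off by a constant factor if the two tools define it differently.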

Apart from that, you might have anisotropies in your model: in your example it looks like you have strong continuity in the x and y directions (horizontal) and variation in the z direction, corresponding to the direction of sedimentation. Also, in the literature (e.g. Webster and Oliver (2008)), the Gaussian model is discouraged for approaching the origin with zero gradient. You might want to consider a different variogram model.
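One common way such anisotropy is handled is geometric anisotropy: each axis is rescaled by its own length scale before the lag distance is computed, so a single isotropic variogram applies in the rescaled space. A minimal sketch with made-up length scales (not values fitted to this dataset):

```python
import math

# Per-axis length scales (x, y, z); values invented for illustration,
# with much longer correlation horizontally than vertically
len_scale = (5000.0, 5000.0, 850.0)

# A sample separation vector between two points
sep = (1000.0, 0.0, 170.0)

# Dimensionless lag: each component is rescaled by its axis length scale,
# so one isotropic variogram can be evaluated on the result
h = math.sqrt(sum((d / l) ** 2 for d, l in zip(sep, len_scale)))
print(round(h, 3))  # 0.283
```

In GSTools this corresponds to passing a list for len_scale, as done later in this thread.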

I hope this is helpful!

@LSchueler
Member

Hi @banesullivan,

did the reply from @javoha (thanks for that!) help? Could you reproduce the "truth" from SGeMS?

@banesullivan
Collaborator Author

Thanks for the tips @javoha!

I haven't had time to jump back into this... I will get to it next week-ish and report back!

You might want to consider a different variogram model.

@javoha, Are there any other options for models when kriging with GSTools?

@LSchueler, feel free to close this issue as it's not really an issue/bug with the software but a support topic

@MuellerSeb
Member

@banesullivan : you can use all provided models for kriging: provided-covariance-models
And of course self-defined ones. ;-)

@MuellerSeb
Member

Closing now. But thanks for the input ;-)

@banesullivan
Collaborator Author

banesullivan commented Dec 9, 2019

So I'm trying this again, but with another dataset that is a bit cleaner. Data attached; it has a permissive license if you want to use it in an example in your docs - just let me know.

I've compiled with OpenMP support per #31 (comment), but as soon as I start the kriging (on either a structured or unstructured mesh), the memory consumption blows up, with over 100 GB used on swap, so I kill the task. This amount of memory usage seems ridiculous to me...

@MuellerSeb, @LSchueler, or @javoha: Can you all help in some capacity to make this example run?

Or could you put some 3D examples in the docs besides just a random field? I can't get anything to work in 3D with this library yet.

Data files: Archive.zip

import numpy as np
import matplotlib.pyplot as plt

import pyvista as pv

from gstools import (Exponential, Gaussian, krige,
                     vario_estimate_unstructured, Stable)

grid = pv.read("grid.vtr")
assay = pv.read("assay.vtp")

cond_pos = assay.points.T  # GSTools convention
# Conditional values - what I want to interpolate
field = assay["CU_pct"]


model = Exponential(dim=3, var=2, len_scale=10)
# Estimate the variogram of the field with 40 bins
bins = np.arange(40)
bin_center, gamma = vario_estimate_unstructured(cond_pos, field, bins)
# Fit the variogram with a stable model (no nugget fitted)
fit_model = Stable(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
# Output
ax = fit_model.plot(x_max=40)
ax.plot(bin_center, gamma)
print(fit_model)

[image: fitted variogram, 40 bins]

Not too shabby of a variogram, so I should be able to at least produce some results with this....

# Create the kriging model
krig = krige.Ordinary(model, 
                      cond_pos=cond_pos, 
                      cond_val=field)

And this is where the memory usage blows up

krige_field, krige_var = krig((grid.x, grid.y, grid.z), mesh_type="structured")

grid["CU_pct"] = krige_field
grid["krige_var"] = krige_var

@banesullivan
Collaborator Author

Or as unstructured:

krige_field, krige_var = krig(grid.points.T, mesh_type="unstructured")

grid["CU_pct"] = krige_field
grid["krige_var"] = krige_var

@banesullivan
Collaborator Author

And...

model = Exponential(dim=3, var=2, len_scale=10)
# Estimate the variogram of the field with 100 bins
bins = np.arange(100)
bin_center, gamma = vario_estimate_unstructured(cond_pos, field, bins)
# Fit the variogram with a stable model (no nugget fitted)
fit_model = Stable(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
# Output
ax = fit_model.plot(x_max=100)
ax.plot(bin_center, gamma)
print(fit_model)

[image: fitted variogram, 100 bins]

@MuellerSeb
Member

MuellerSeb commented Dec 10, 2019

Good heavens!
I think the problem is here:

This function creates a matrix containing all pairwise distances between the conditional points and the given field points. In your case, that's a matrix with 1,000,000 × 8583 entries... quite huge.
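A quick back-of-the-envelope check of those numbers (assuming 8-byte floats) shows why swap fills up:

```python
n_grid = 1_000_000   # points in the output mesh
n_cond = 8583        # conditional (assay) points
bytes_per_entry = 8  # float64

# Size of the full pairwise distance matrix
total_gib = n_grid * n_cond * bytes_per_entry / 2**30
print(round(total_gib, 1))  # 63.9
```

So the distance matrix alone needs roughly 64 GiB before any kriging weights are even computed.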

We need to split this into chunks, I guess. Therefore we need to introduce a controlling parameter like chunk_size, which could be None by default, or something like 1e6.

ATM the output mesh just has too many points. As a workaround, maybe you could chunk it manually by cutting the x, y, z positions. Sorry for this inconvenience.
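Manual chunking here just means slicing the output positions into blocks and kriging each block separately. For a 100-point axis split into 5 chunks (the numbers assumed in the loop suggested in the next comment), the slice bounds work out to:

```python
n_points = 100  # points along one grid axis (assumed for illustration)
n_chunks = 5
step = n_points // n_chunks

# (start, stop) index pairs for slicing one axis into equal chunks
chunk_slices = [(i * step, (i + 1) * step) for i in range(n_chunks)]
print(chunk_slices)  # [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
```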

@MuellerSeb
Member

Maybe try something like this:

# Assumes each grid axis has 100 points, processed in 5 chunks of 20
krige_field = np.empty((len(grid.x), len(grid.y), len(grid.z)), dtype=float)
krige_var = np.empty((len(grid.x), len(grid.y), len(grid.z)), dtype=float)

for i in range(5):
    for j in range(5):
        for k in range(5):
            (
                krige_field[i*20:(i+1)*20, j*20:(j+1)*20, k*20:(k+1)*20],
                krige_var[i*20:(i+1)*20, j*20:(j+1)*20, k*20:(k+1)*20],
            ) = krig(
                (
                    grid.x[i*20:(i+1)*20],
                    grid.y[j*20:(j+1)*20],
                    grid.z[k*20:(k+1)*20],
                ),
                mesh_type="structured",
            )
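A rough check (again assuming 8-byte floats) of why this helps: each 20x20x20 block pairs only 8000 output points with the 8583 conditional points at a time:

```python
chunk_points = 20 ** 3  # one 20x20x20 block of the structured grid
n_cond = 8583           # conditional (assay) points
bytes_per_entry = 8     # float64

# Distance-matrix size for a single chunk
chunk_gib = chunk_points * n_cond * bytes_per_entry / 2**30
print(round(chunk_gib, 2))  # 0.51
```

About half a GiB per chunk, instead of tens of GiB for the whole mesh at once.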

@MuellerSeb MuellerSeb reopened this Dec 10, 2019
@banesullivan
Collaborator Author

Huh, okay, this makes sense now: I see why the memory usage was blowing up, and it makes me feel a bit better knowing it wasn't just a user issue.

With that suggested code, it runs without massive memory consumption, but it's taking ages to execute... (no result as of yet)

Could you implement that behind the scenes? And possibly parallelize it, or find another way to speed this up?

@LSchueler
Member

Yeah, that's indeed not a user issue; it's an inherent problem with Python.
To speed things up, NumPy and SciPy often rely on underlying C code, which is compiled to machine code and is thus really fast. But the problem is that these C functions have to be fed all the data at once by the Python code.
That's the reason why MuellerSeb's workaround helps with the memory consumption.

We'll discuss some options.

And thank you very much for providing the data!

@MuellerSeb
Member

@banesullivan : I started to re-implement the kriging routines. You can have a look at the feature branch: https://github.com/GeoStat-Framework/GSTools/tree/krige_update

There you now have the option to define a chunk size when calling the kriging routine. The routines have also been optimized, so the kriging system is only set up once, on creation.

Maybe you can play around with that and give some feedback ;-)

Cheers, Sebastian

@banesullivan
Collaborator Author

Ah awesome, @MuellerSeb! I will have to try that out when I have a chance... been a bit focused on a specific project lately.

FYI, I managed to get pretty good results with the original data from the top post (see #36 (comment)) with the following Exponential variogram:

model = Exponential(dim=3,  # 3D
                    var=9e3,  # Sill
                    len_scale=[5000, 5000, 850])  # Anisotropic (x, y, z)

[screenshot: kriging result with the Exponential model]

I think it would be pretty cool to include this example in the docs somewhere... also, it would be cool to switch the examples over to sphinx-gallery, but in order to do that with PyVista, you would have to leave readthedocs.org for GitHub Pages.


Also, since this is the original topic of this issue: I cannot get a decent variogram fit for that dataset. The following doesn't seem to work:

_, bins = np.histogram(field, 50)
bin_center, gamma = vario_estimate_unstructured(
    cond_pos,
    field,
    bins,
)


fit_model = Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)

from gstools.covmodel.plot import plot_variogram

plt.plot(bin_center, gamma)
plot_variogram(fit_model, x_max=bins[-1], ax=plt.gca())
plt.show()

[image: poor variogram fit]

any tips?

@MuellerSeb
Member

@banesullivan : That kriging result looks good indeed!
Could you provide a full working script to reproduce the temperature image? We are aiming at a better structure for the examples folder, so this could be a nice addition to the collection.

Addressing the second problem:
Why are you using the bins from the histogram? Those bins are set up to properly bin the field values, but the variogram bins are for the pairwise distances of the field points. I think you are mixing things up there.

We are working on a new variogram estimation routine at the moment, where binning could be done automatically. So this should be more convenient in the future.
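To make the distinction concrete: variogram bins are edges over pairwise separation distances, not over field values. A minimal sketch of building such edges (the maximum lag is an assumed value, not derived from this dataset):

```python
# Variogram bins partition pairwise separation distances, not field values.
# Build edges up to a maximum lag distance (value assumed for illustration):
max_lag = 12300.0
n_bins = 50
width = max_lag / n_bins

distance_bin_edges = [i * width for i in range(n_bins + 1)]
print(distance_bin_edges[:3])  # [0.0, 246.0, 492.0]
```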

@banesullivan
Collaborator Author

banesullivan commented Jan 10, 2020

Aha! I totally misinterpreted the purpose of the bins. Much better now:

Here is the data: probes.vtp.zip

Here is the full script:

import numpy as np
from gstools import Exponential, krige, vario_estimate_unstructured
import matplotlib.pyplot as plt
import pyvista as pv

# Load the data
probes = pv.read("probes.vtp")

cond_pos = probes.points.T # gstools convention
# conditional values - what i want to interpolate
field = probes['temperature (C)']

bins = np.linspace(0, 12300, 1000)
bin_center, gamma = vario_estimate_unstructured(
    cond_pos,
    field,
    bins,
    )


fit_model = Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)

from gstools.covmodel.plot import plot_variogram

plt.plot(bin_center, gamma)
plot_variogram(fit_model, x_max=bins[-1], ax=plt.gca())
plt.show()

[image: fitted variogram]

# Create the kriging grid
grid = pv.UniformGrid()
grid.origin = (329700, 4252600, -2700)
grid.spacing = (250, 250, 50)
grid.dimensions = (60, 75, 100)

# Create the kriging model
krig = krige.Ordinary(fit_model, cond_pos=cond_pos, cond_val=field)

r = grid.cast_to_rectilinear_grid()  # Use a structured grid for better performance
krige_field, krige_var = krig((r.x, r.y, r.z), mesh_type="structured")

grid["temperature (C)"] = krige_field.ravel(order="F")
grid["krige_var"] = krige_var.ravel(order="F")

dargs = dict(cmap="coolwarm", clim=[0,300], scalars="temperature (C)")

p = pv.Plotter()
p.add_volume(grid, opacity="sigmoid_8", **dargs)
p.add_mesh(probes, **dargs)
p.show()

[image: volume rendering of the kriged temperature field]


There's a lot of other data for this project with some isosurfaces of "true" temperature values. Here's a GIF with other stuff I didn't include here.

[GIF: additional project data and temperature isosurfaces]

@MuellerSeb
Member

@banesullivan : Cool! Thanks for the scripts... stunning!
#55 gives a proposal for automated binning. Maybe you can try it and compare the results.

@MuellerSeb
Member

@banesullivan : Automatic binning was implemented with #131, and PyVista is now supported with #59.
