[Enhancement] Making scikit-gstat and gstools work well together #92

Closed
mmaelicke opened this issue May 10, 2020 · 3 comments

@mmaelicke
Member

I'll open an issue here, since I still have no access to the project board. This is a discussion issue for the GS-Framework v2.

To make gstools and scikit-gstat work well together, I see a few avenues we could take:

  1. We can use scikit-learn's Estimator classes and Pipeline as the interface.
  2. skgstat.Variogram could export a fitted CovModel, which would partly contradict [Refactor] prefere "cor" to specify userdefined CovModel #90
  3. A combination of 1 and 2

A main challenge I see is that skgstat.Variogram already fits the model to the empirical variogram on instantiation. I cannot see how an object in that state can be transformed in a concise and clean way into a CovModel, which can be just the model without anything fitted, right @MuellerSeb?

1. using sklearn

This is my favorite path as of now. We can either implement the estimator interface directly in the existing classes, or provide a helper class that inherits from Estimator and wraps the existing classes.
For scikit-gstat, direct inheritance would break my data flow, so a major version bump would be necessary.
We would also have to define what the result or outcome of these classes is. In the case of Kriging: the PyKrige classes should stick to Estimators or Predictors as well; they would need the variogram estimation result for configuration and, e.g., a meshgrid or something else to predict on (in the predict method).
The main advantage I see is that GS-Framework could easily be used in data science workflows, where sklearn is really common. GS-Framework would also gain clean and clear interfaces between classes, and skgstat.Variogram could be used instead of CovModel and vice versa. In this case, scikit-gstat can focus more on fitting and variogram analysis, while CovModel with the Cython backbone would be the performant, fast big brother for production.
Just suggestions.
A possible result could be the variogram parameters along with the fitted model as a callable. That should be enough for Kriging.
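
To make the idea of "parameters plus a fitted callable" a bit more concrete, here is a minimal sketch of what such a helper could look like. Everything in it is an assumption for illustration: the class name `VariogramEstimatorSketch`, the crude grid-search fit, and the fixed zero nugget are made up, and the class only mimics the scikit-learn fit/predict convention instead of inheriting from the real base classes, so that the snippet stays self-contained.

```python
def spherical(h, effective_range, sill, nugget=0.0):
    """Spherical variogram model; constant at nugget + sill beyond the range."""
    if h >= effective_range:
        return nugget + sill
    ratio = h / effective_range
    return nugget + sill * (1.5 * ratio - 0.5 * ratio ** 3)


class VariogramEstimatorSketch:
    """Hypothetical helper; mimics fit/predict without importing sklearn."""

    def __init__(self, n_candidates=50):
        # Hyperparameters are stored unchanged in __init__ (sklearn convention).
        self.n_candidates = n_candidates

    def get_params(self, deep=True):
        return {"n_candidates": self.n_candidates}

    def fit(self, lags, gamma):
        """Estimate sill and range from an empirical variogram (nugget fixed at 0)."""
        sill = max(gamma)  # crude plateau estimate, an assumption of this sketch
        max_lag = max(lags)
        best_range, best_sse = None, float("inf")
        # Grid search over candidate ranges, keeping the least-squares optimum.
        for i in range(1, self.n_candidates + 1):
            candidate = max_lag * i / self.n_candidates
            sse = sum(
                (spherical(h, candidate, sill) - g) ** 2
                for h, g in zip(lags, gamma)
            )
            if sse < best_sse:
                best_range, best_sse = candidate, sse
        # Trailing underscores mark fitted attributes (sklearn convention).
        self.params_ = {"effective_range": best_range, "sill": sill, "nugget": 0.0}
        self.model_ = lambda h: spherical(h, best_range, sill)
        return self

    def predict(self, distances):
        """Evaluate the fitted model at the given separating distances."""
        return [self.model_(h) for h in distances]


# Usage: fit to a synthetic empirical variogram and inspect the result.
lags = [float(h) for h in range(1, 16)]
gamma = [spherical(h, 10.0, 2.0) for h in lags]
estimator = VariogramEstimatorSketch().fit(lags, gamma)
print(estimator.params_)
```

A Kriging class could then take `params_` for its configuration and call `model_` directly, which is roughly the hand-over described above.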

2. Direct interface

I only have this in the list because it is way easier to implement and would not impact the whole framework the way 1 would.
At the moment I have an experimental feature that does exactly this: https://github.com/mmaelicke/scikit-gstat/blob/master/skgstat/interfaces/gstools.py

It is not working correctly at the moment and I might remove it directly again if we go for another avenue.
The advantage of this implementation would be that we can keep both packages in their current logic while still offering a way to map between both to the users, which will definitely be appreciated.

The main challenge from my current point of view is: once you instantiate a skgstat.Variogram, it will run through the fitting procedure. That means it is per se a theoretical function fitted to empirical values. It is meant to be used as an analysis tool, because you can change all parameters at runtime, immediately yielding the new fit. gstools.CovModel is, as far as I understand it, different. You can create the model and use it, and variogram fitting is just one thing that you might or might not do to it.
Hence, the only thing that makes sense is that CovModel indicates if it was fitted or not (which it might well do already). The interface paths would be

  • skgstat.Variogram --> gstools.FittedCovModel or
  • skgstat.Variogram <--> gstools.FittedCovModel if you see benefits here.
    The only option to export the default CovModel would be to use it in Variogram.model somehow, but here it might be way easier not to allow that and instead make sure that all theoretical model functions are available for both packages. Like a Gaussian class that can return the Variogram.model and the CovModel.cor.
    At the end of the day I think only the fitted versions would be helpful to interface, as for unfitted Models the user can simply provide the data to both classes, that's not really a hassle.
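
The "Gaussian class that can return both" idea could look roughly like the sketch below. The class name `GaussianSketch` and its attribute names are hypothetical and not part of either package. It also uses the plain textbook form exp(-(h/len_scale)^2) without any rescaling, which is an assumption: each package applies its own scaling convention for the range, so a real shared class would have to reconcile those.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianSketch:
    """Hypothetical shared model definition; not part of either package."""

    len_scale: float
    var: float = 1.0     # partial sill (variance without the nugget)
    nugget: float = 0.0

    def cor(self, h):
        """Normalized correlation: 1 at h = 0, decaying towards 0."""
        return math.exp(-((h / self.len_scale) ** 2))

    def variogram(self, h):
        """Semivariance derived from the same correlation function."""
        return self.nugget + self.var * (1.0 - self.cor(h))


# Usage: one definition serves both views of the model.
model = GaussianSketch(len_scale=2.0, var=3.0, nugget=0.5)
print(model.cor(1.0), model.variogram(1.0))
```

Deriving the variogram from `cor` keeps the two functions consistent by construction, so neither package would need to re-implement the other's formula.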

3. Combination

Here, we would go for 1 but additionally implement a low-level export functionality, like Variogram.make_CovModel and CovModel_makeVariogram or something similar.
The main advantage here would be that while VariogramEstimator could well return the result needed for Kriging, I am not sure if it could be used for Field generation etc. And why would it?

In any case, the sklearn pathways (1, 3) would also have a very personal advantage. Implementing the interface via sklearn is not too complicated when using a helper function (although it has downsides), and I could do that in the near future. Then, future developments in one package would not necessarily have to be reflected in the other. Everyone could develop at their own pace.

For scikit-gstat I already played around and I have a working interface. It is far from unveiling all the power in sklearn and a bit clumsy, but it's working:
https://github.com/mmaelicke/scikit-gstat/blob/master/skgstat/interfaces/variogram_estimator.py

Open for discussion!

@mmaelicke mmaelicke added the enhancement New feature or request label May 10, 2020
@MuellerSeb MuellerSeb added this to the 2.0 milestone Aug 18, 2020
@MuellerSeb
Member

Since GSTools 1.3 will cover all variogram/covariance models provided by scikit-gstat, we should add a to_gstools method to the Variogram class in scikit-gstat. This should be a simple mapping of the describe dict output to CovModel instances.
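
As a rough sketch of what such a mapping could involve, the function below turns a description dict into a model class name plus keyword arguments. Several things here are assumptions and not confirmed by this thread: the dict keys (`model`, `effective_range`, `sill`, `nugget`), the reading of `sill` as the partial sill, and the function name itself. The effective range is passed through as `len_scale`, and any model-specific rescaling factor has to be supplied by the caller, because the correct values are not given here. The snippet deliberately does not import gstools or scikit-gstat.

```python
# Lookup from lowercase model names to CovModel class names (as strings).
# Assumption: only a few common models are listed for illustration.
MODEL_NAMES = {
    "gaussian": "Gaussian",
    "exponential": "Exponential",
    "spherical": "Spherical",
}


def describe_to_covmodel_kwargs(description, dim, rescale=None):
    """Map a variogram description dict to (class_name, kwargs).

    `description` is assumed to hold the keys 'model', 'effective_range',
    'sill' and 'nugget'; `dim` must be given explicitly because the
    description is not assumed to carry the data dimension.
    """
    try:
        class_name = MODEL_NAMES[description["model"].lower()]
    except KeyError:
        raise ValueError(f"no mapping for model {description['model']!r}")

    kwargs = {
        "dim": dim,
        "var": description["sill"],  # assumed to be the partial sill
        "len_scale": description["effective_range"],
        "nugget": description["nugget"],
    }
    if rescale is not None:
        # Caller-supplied factor to reconcile the range conventions.
        kwargs["rescale"] = rescale
    return class_name, kwargs


# Usage with a made-up description dict.
example = {"model": "gaussian", "effective_range": 12.0, "sill": 2.5, "nugget": 0.1}
print(describe_to_covmodel_kwargs(example, dim=2))
```

Returning the class name and kwargs separately keeps the mapping testable without either package installed; an actual to_gstools method would instantiate the model at the end.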

@MuellerSeb
Member

Questions are:

  • how does scikit-gstat represent anisotropy and rotation?
  • how to get the dimension of the underlying data?
  • how to treat the directional variogram?

@MuellerSeb
Member

With the new tutorial section in the scikit-gstat documentation, we made a big step forward: https://mmaelicke.github.io/scikit-gstat/tutorials/06_gstools.html

@MuellerSeb MuellerSeb unpinned this issue Jun 3, 2021
@MuellerSeb MuellerSeb modified the milestones: 2.0, 1.3 Jun 4, 2021