Vision #9

Open
4 tasks
MuellerSeb opened this issue Oct 25, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@MuellerSeb
Member

I really like the pace of this project, and I am really thankful that @LSchueler started it and that @adamreichold jumped in and has already put so much work into speeding it up. I haven't had the time to take a deeper look yet, but I have already dug through some Rust tutorials so I can catch up some day 😉

At some point I'd like to discuss the aim and vision of this package. I could imagine this repository becoming the common part of PyKrige and GSTools for the upcoming version 2 of both packages, to increase interoperability.
We already created this project to track this: https://github.com/orgs/GeoStat-Framework/projects/1

We already have some routines implemented here that can be used by both projects (kriging summation and variogram estimation).
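As a rough illustration of what "variogram estimation" means here, the following is a minimal sketch of Matheron's classical estimator, γ(h_k) = 1/(2N_k) Σ (z_i − z_j)², summed over the N_k point pairs whose distance falls into bin k. It assumes 1D positions and user-supplied bin edges; the function name `variogram_matheron` and its signature are hypothetical and not GSTools-Core's actual API, which handles n-dimensional data and parallelism.

```rust
/// Hypothetical sketch: Matheron's estimator for 1D positions.
/// `bin_edges` must be sorted ascending; a pair with distance d falls into
/// bin k if bin_edges[k] <= d < bin_edges[k + 1]. Empty bins yield NaN.
fn variogram_matheron(pos: &[f64], field: &[f64], bin_edges: &[f64]) -> Vec<f64> {
    assert_eq!(pos.len(), field.len(), "one field value per position");
    assert!(bin_edges.len() >= 2, "need at least one bin");
    let n_bins = bin_edges.len() - 1;
    let mut sums = vec![0.0_f64; n_bins];
    let mut counts = vec![0_usize; n_bins];

    // Visit every unordered pair (i, j) exactly once.
    for i in 0..pos.len() {
        for j in (i + 1)..pos.len() {
            let dist = (pos[i] - pos[j]).abs();
            // A linear search over the bins keeps the sketch simple.
            if let Some(k) = (0..n_bins).find(|&k| bin_edges[k] <= dist && dist < bin_edges[k + 1]) {
                let diff = field[i] - field[j];
                sums[k] += diff * diff;
                counts[k] += 1;
            }
        }
    }

    // gamma(h_k) = (sum of squared differences) / (2 * number of pairs)
    sums.iter()
        .zip(&counts)
        .map(|(&s, &c)| if c > 0 { s / (2.0 * c as f64) } else { f64::NAN })
        .collect()
}
```

The double loop is O(n²) in the number of points, which is exactly the kind of hot path that benefits from being moved out of Python.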

What we would need is:

  • representation of covariance/variogram models (including the Yadrenko variant) and implementation of all models of GSTools
  • implementation of the geometric operations of GSTools ((an-)isometrize, (an-)isotropify, (de-)rotate) (link)
  • local kriging (moving window, n-nearest neighbors and fixed radius)
  • NetCDF interface (re-implementation of https://git.ufz.de/chs/progs/edk_nc)
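To make the first two items a bit more concrete, here is a minimal sketch, assuming a trait-based design, of how covariance models and the relation γ(h) = C(0) − C(h) might be represented in Rust, plus a 2D helper that maps an anisotropic, rotated lag into isotropic coordinates. The names `CovModel`, `Gaussian`, `Exponential` and `isometrize_2d` are hypothetical; they are neither the GSTools API nor existing GSTools-Core code, and GSTools' actual conventions (nugget, anisotropy ratios, 3D rotation angles) may differ.

```rust
use std::f64::consts::PI;

/// Hypothetical trait for a stationary, isotropic covariance model.
/// `cor` is the normalized correlation in the dimensionless distance r = h / len_scale.
trait CovModel {
    fn var(&self) -> f64;
    fn len_scale(&self) -> f64;
    fn cor(&self, r: f64) -> f64;

    /// C(h) = var * cor(h / len_scale)
    fn covariance(&self, h: f64) -> f64 {
        self.var() * self.cor(h / self.len_scale())
    }

    /// gamma(h) = C(0) - C(h) for a stationary model without nugget.
    fn variogram(&self, h: f64) -> f64 {
        self.covariance(0.0) - self.covariance(h)
    }
}

struct Gaussian {
    var: f64,
    len_scale: f64,
}

struct Exponential {
    var: f64,
    len_scale: f64,
}

impl CovModel for Gaussian {
    fn var(&self) -> f64 {
        self.var
    }
    fn len_scale(&self) -> f64 {
        self.len_scale
    }
    // Assumed convention: the pi/4 factor makes the integral scale equal len_scale.
    fn cor(&self, r: f64) -> f64 {
        (-PI / 4.0 * r * r).exp()
    }
}

impl CovModel for Exponential {
    fn var(&self) -> f64 {
        self.var
    }
    fn len_scale(&self) -> f64 {
        self.len_scale
    }
    fn cor(&self, r: f64) -> f64 {
        (-r).exp()
    }
}

/// Hypothetical 2D helper: rotate a lag (dx, dy) by -angle into the model's
/// main axes, then stretch the minor-axis component by 1 / anis, so the
/// resulting lag can be fed to an isotropic model.
fn isometrize_2d(dx: f64, dy: f64, angle: f64, anis: f64) -> (f64, f64) {
    let (s, c) = angle.sin_cos();
    (c * dx + s * dy, (-s * dx + c * dy) / anis)
}
```

With default trait methods, each new model only has to supply its correlation function, which would make porting a large model collection one model at a time fairly mechanical.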

One problem I see at the moment is the limited set of "special" functions available in Rust that are needed for some covariance models. Some libraries are already available, but they offer only a limited set of functions (we need an overview of what is needed):

This package could then also become a geostatistical package for Rust (ATM I only see friedrich), with Python bindings for GSTools and PyKrige as described above. This would be awesome! 🎉

@LSchueler @adamreichold what do you think?

Cheers, Sebastian

@MuellerSeb MuellerSeb added the enhancement New feature or request label Oct 25, 2021
@MuellerSeb MuellerSeb pinned this issue Oct 25, 2021
@adamreichold
Contributor

netcdf interface

Are data exchange formats really something the core code needs to know about? What problems are there with passing NumPy arrays around without indicating whether they were loaded from or will be stored into NetCDF files?

One problem I see at the moment is the limit set of available "special" functions in rust

There are bindings for the GNU Scientific Library which includes quite a few of those: https://www.gnu.org/software/gsl/doc/html/specfunc.html The worst case here would probably be that additional bindings need to be written.

@LSchueler
Member

LSchueler commented Oct 25, 2021

Thanks for compiling this. I think this is a really cool project.
Regarding the special functions, the worst case scenario (which wouldn't be too bad) is to use C or C++ libraries for the more exotic ones, which could be replaced one by one when Rust implementations become available.
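For the simpler functions, a pure-Rust stopgap can be quite small. As an illustration only, here is a sketch of the Gamma function (which appears, for example, in the normalization of the Matérn model; its Bessel K_ν factor is considerably harder and not shown) using the Lanczos approximation with the commonly published g = 7, n = 9 coefficients and the reflection formula for x < 0.5. It is not taken from GSL or any existing crate, and it does not handle the poles at non-positive integers gracefully (it returns infinity or a huge value there).

```rust
use std::f64::consts::PI;

// Widely published Lanczos coefficients for g = 7, n = 9.
const LANCZOS_G: f64 = 7.0;
const LANCZOS_COEF: [f64; 9] = [
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
];

/// Gamma function via the Lanczos approximation (sketch, double precision).
fn gamma(x: f64) -> f64 {
    if x < 0.5 {
        // Reflection formula: Gamma(x) * Gamma(1 - x) = pi / sin(pi * x)
        PI / ((PI * x).sin() * gamma(1.0 - x))
    } else {
        let x = x - 1.0;
        let t = x + LANCZOS_G + 0.5;
        let mut a = LANCZOS_COEF[0];
        for (i, &c) in LANCZOS_COEF.iter().enumerate().skip(1) {
            a += c / (x + i as f64);
        }
        (2.0 * PI).sqrt() * t.powf(x + 0.5) * (-t).exp() * a
    }
}
```

Functions like this could be swapped out for a maintained crate or C bindings later without touching the model code that calls them.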

@adamreichold This would be much more than just the core computations, but rather a complete geostatistical Rust library. There are some applications where huge data sets (TB or at least 100s of GB) need to be processed. In these cases it is necessary to flush the data to disk every once in a while. Therefore the NetCDF interface.

@LSchueler
Member

As a first step, I would suggest calling GSTools-Core from GSTools to replace the Cython code. This would include cleaning up and reworking the deployment of GSTools. I think the experience we gather in that process will help later on, and I think we are nearly there.
What do you think?

@adamreichold
Contributor

I think the experience we gather in that process will help later on. And I think we are nearly there.

I think an incremental approach is almost always preferable, especially since it avoids chasing a constantly moving feature set and lets us port/optimize exactly those parts that benefit enough to justify the effort.

In these cases it is necessary to flush the data to disk every once in a while. Therefore the NetCDF interface.

So this is about incremental processing which cannot be expressed as "read a chunk of the data into an array; process that array; write the results out; repeat"?
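The "read a chunk; process it; write the results out; repeat" pattern can be sketched with nothing but `std::io`. This is a minimal sketch assuming the data arrive as a raw stream of little-endian `f64` values; it involves no NetCDF at all, and the function name `process_in_chunks` and the processing closure are hypothetical. The point is that only one chunk is held in memory at a time, while the format-specific reading and writing stays behind the generic `Read`/`Write` boundary.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Hypothetical sketch: stream little-endian f64 values from `reader` to
/// `writer` in chunks of `chunk_len`, applying `f` to each chunk in memory.
/// Returns the total number of values processed.
fn process_in_chunks<R, W, F>(mut reader: R, mut writer: W, chunk_len: usize, f: F) -> io::Result<usize>
where
    R: Read,
    W: Write,
    F: Fn(&mut [f64]),
{
    assert!(chunk_len > 0, "chunk_len must be positive");
    let mut bytes = vec![0_u8; chunk_len * 8];
    let mut values: Vec<f64> = Vec::with_capacity(chunk_len);
    let mut total = 0;

    loop {
        // 1. Read: fill the byte buffer until it is full or the input ends.
        let mut filled = 0;
        while filled < bytes.len() {
            match reader.read(&mut bytes[filled..]) {
                Ok(0) => break, // end of input
                Ok(n) => filled += n,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        if filled == 0 {
            break;
        }
        if filled % 8 != 0 {
            return Err(io::Error::new(ErrorKind::UnexpectedEof, "input ends mid-value"));
        }

        // 2. Decode the chunk into an in-memory array and process it.
        values.clear();
        values.extend(bytes[..filled].chunks_exact(8).map(|b| {
            let mut arr = [0_u8; 8];
            arr.copy_from_slice(b);
            f64::from_le_bytes(arr)
        }));
        f(values.as_mut_slice());

        // 3. Write the results out before reading the next chunk.
        for v in &values {
            writer.write_all(&v.to_le_bytes())?;
        }
        total += values.len();

        // A short read means the input is exhausted.
        if filled < bytes.len() {
            break;
        }
    }
    writer.flush()?;
    Ok(total)
}
```

Whether this covers the large-data use case depends on whether each chunk can be processed independently; operations that need neighboring data (e.g. local kriging near chunk borders) would need overlapping chunks or a different scheme.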

@adamreichold
Contributor

Is there a branch/pull request somewhere which prototypes integration of this crate/package into one of the target Python packages?

@LSchueler
Member

I'm currently working on that. I'll post a link here as soon as it's pushed.

@LSchueler
Member

LSchueler commented Oct 27, 2021

So this is about incremental processing which cannot be expressed as "read a chunk of the data into an array; process that array; write the results out; repeat"?

I'm not familiar with this specific application. @MuellerSeb, can you tell us something about the problems you faced there?

@LSchueler
Member

LSchueler commented Oct 27, 2021

The GSTools-Core - GSTools integration is being prepared in this branch. It's getting exciting!
This is the relevant PR.

@MuellerSeb
Member Author

Required special functions

Additionally needed for the spectral densities or similar

So it seems that all required special functions are available.

@MuellerSeb
Member Author

Projects
None yet
Development

No branches or pull requests

3 participants