Automatic binning #55
@MuellerSeb, what is the exponent in the distance norm supposed to be?

It should be constantly 2, since it is the Euclidean norm. I just mixed it up with the d-norm. The comment was updated.
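As a quick illustration (my addition, not from the original thread): the Euclidean norm is the p-norm with p = 2, and this exponent stays 2 regardless of the spatial dimension d:

```python
import numpy as np

h = np.array([3.0, 4.0])  # a 2D lag vector
# Euclidean norm == p-norm with p = 2; the exponent does not
# depend on the spatial dimension d
print(np.linalg.norm(h))              # 5.0
print(np.sum(np.abs(h) ** 2) ** 0.5)  # 5.0, same thing spelled out
```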
@MuellerSeb, I did this for the example on #36 and it gave really great results. The variogram doesn't look all that pretty, but the results were compelling. I will make a PR with this now.
Thanks for your PR. @LSchueler is working on this at the moment, so I have closed your PR in favor of the re-written routines. But it is good to hear that the scheme was working for you! That means we are on the right track.
@LSchueler: It would be nice to get some information on the variance within each bin during the variogram estimation. One could use a sequential update during the estimation, so we don't consume memory:

$$\bar{x}_{i+1} = \bar{x}_i + \frac{x_{i+1} - \bar{x}_i}{i+1}, \qquad \sigma^2_{i+1} = \sigma^2_i + \frac{(x_{i+1} - \bar{x}_i)(x_{i+1} - \bar{x}_{i+1}) - \sigma^2_i}{i+1}$$

where $\bar{x}_i$ is the running mean and $\sigma^2_i$ the running (population) variance after $i$ values. We could later use this variance as a target function to estimate anisotropy and rotation by minimizing it. Or we could use this as weights during the fitting of a variogram function.

Then we could get the following values for each bin: the variogram value itself (the mean), its variance, and the number of point pairs (a sketch of this follows below).
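A minimal Python sketch of such a sequential (Welford-style) per-bin update; the function name and the bin lookup are illustrative, not the actual Cython routines:

```python
import numpy as np

def sequential_bin_stats(dists, values, bin_edges):
    """One-pass count/mean/variance per distance bin (illustrative)."""
    n_bins = len(bin_edges) - 1
    count = np.zeros(n_bins, dtype=int)
    mean = np.zeros(n_bins)
    var = np.zeros(n_bins)
    for d, v in zip(dists, values):
        b = np.searchsorted(bin_edges, d, side="right") - 1
        if b < 0 or b >= n_bins:
            continue  # pair distance outside the binning range
        count[b] += 1
        delta = v - mean[b]
        mean[b] += delta / count[b]  # sequential mean update
        # sequential (population) variance update
        var[b] += (delta * (v - mean[b]) - var[b]) / count[b]
    return count, mean, var
```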
Hi everyone, EDIT: Sorry about that!
Regarding your idea here: I think it's a good one, and I tried something similar before (the rotation and stretching part, not the minimizing-variance part). What I failed to accomplish (in my quick tests without investing much time) was to properly de-stretch the data, since one would need to stretch based on variance. My problem was that very anisotropic and very sparse data (well drilling data) led to bad estimation results. I did not invest a lot of time in it, though.

Regarding the rotation: Generating a test model:

```python
from scipy.spatial.transform import Rotation as R
import numpy as np
import gstools as gs
import matplotlib.pyplot as plt

x = y = np.arange(100)
model = gs.Exponential(dim=2, var=1, len_scale=[12.0, 3.0], angles=np.pi / 8)
srf = gs.SRF(model, seed=20170519)
srf.structured([x, y])
srf.plot()
val = srf.field.copy()
gridx, gridy = srf.pos
```

Selecting a random unstructured subset of the data:

```python
X, Y = np.meshgrid(gridx, gridy)
idx = np.random.choice(X.size, 5000)
x = X.flatten()[idx]
y = Y.flatten()[idx]
v = val.flatten()[idx]
```

And rotating it:

```python
r = R.from_euler('z', 180 - 67, degrees=True)
xyz = np.stack((x, y, np.zeros_like(x))).T
xyzr = np.apply_along_axis(r.apply, 1, xyz)
f, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].scatter(xyz[:, 0], xyz[:, 1], c=v, marker='.')
axes[1].scatter(xyzr[:, 0], xyzr[:, 1], c=v, marker='.')
```

I found that for my examples the speed of the rotation operation was by far fast enough.
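For reference, a rough sketch of the de-stretching step discussed above; the rotation angle and anisotropy ratio are assumed to be known here (estimating them from sparse data is exactly the hard part):

```python
import numpy as np

angle = np.pi / 8  # assumed known orientation of the major axis
anis = 3.0 / 12.0  # assumed known anisotropy ratio (minor / major len_scale)

# rotate the major axis onto the x-axis ...
rot = np.array([[np.cos(-angle), -np.sin(-angle)],
                [np.sin(-angle),  np.cos(-angle)]])
xy_rot = rot @ np.stack((x, y))  # x, y: the unstructured subset from above
# ... then stretch the minor axis to undo the anisotropy
xy_iso = xy_rot.copy()
xy_iso[1] /= anis
```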
I just did a quick test and found that there is a slight error between the formulas you proposed and the results of `np.mean` / `np.var`:

```python
import numpy as np

for n in [100, 1000, 10000, 100000]:
    x = np.random.randn(n, 1) * 3 + 1
    ninv = 1.0 / float(n)
    print((n, ninv, np.mean(x)))
    # sequential update, initialized with the first sample
    x_var_i = 0
    x_dash_i = x[0]
    for i in range(1, n):
        x_dash_i_new = x_dash_i + 1 / i * (x[i] - x_dash_i)
        x_var_i = x_var_i + 1 / i * ((x[i] - x_dash_i) * (x[i] - x_dash_i_new) - x_var_i)
        x_dash_i = x_dash_i_new
    print('mean: np.mean={}, x_dash_n={}, diff={}'.format(np.mean(x), x_dash_i[0], np.abs(np.mean(x) - x_dash_i)[0]))
    print(' var: np.var ={}, x_var_i ={}, diff={}'.format(np.var(x), x_var_i[0], np.abs(np.var(x) - x_var_i)[0]))
    print('')
```

This yields small but non-zero differences between the sequential estimates and the NumPy results.
@TobiasGlaubach: Indeed, they should match. The advantage of the sequential formulation is that you can update the current value for the variance when new data comes in. Since the Cython routines loop over all pairs of points and then decide which bin to put each combination into, it is more convenient to use a sequential update of the variance value, so we don't have to store all values to calculate the variance afterwards (memory efficiency).

I guess the difference in the variance formulation comes from the use of population vs. sample variance: https://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance where the only difference is the divisor: $\frac{1}{n}$ for the population variance vs. $\frac{1}{n-1}$ for the sample variance. For increasing $n$, this difference vanishes. Aaaand in your code sample, you are starting at `1` in your loop.

Regarding your de-stretching question, GSTools provides routines to de-stretch and re-rotate fields, given here: `gstools/field/tools.py`, lines 114 and 136 (commit 2c26d74).
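The divisor difference is directly visible in NumPy via the `ddof` parameter, e.g.:

```python
import numpy as np

x = np.random.randn(1000)
pop_var = np.var(x)          # divisor n (ddof=0), population variance
smp_var = np.var(x, ddof=1)  # divisor n - 1, sample variance
print(pop_var, smp_var, smp_var - pop_var)  # difference vanishes as n grows
```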
Thanks for looking into this. Actually, the starting at 1 is no problem if one sets:

```python
x_var_i = 0
x_dash_i = x[0]
```

I am currently working on `estimate.pyx` for the directional variogram estimation in #83. I can implement the variance and mean estimation there as well, so it can be used for other things like the mentioned anisotropy and orientation fitting.
Closed by #131
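For readers landing here later, a usage sketch of the automatic binning as it appears in recent GSTools releases (assuming the current `gs.vario_estimate` API; details may have changed since this issue):

```python
import numpy as np
import gstools as gs

# synthetic unstructured data
model = gs.Exponential(dim=2, var=1, len_scale=8)
srf = gs.SRF(model, seed=20170519)
pos = np.random.random((2, 500)) * 100  # 500 random points in [0, 100]^2
field = srf(pos)

# omitting bin_edges triggers the automatic binning
bin_center, gamma = gs.vario_estimate(pos, field)
```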
This is just a note, so we don't forget and can talk about it:

My proposal for default binning:

- number of bins calculated by either:
  - (a) Sturges rule: $n_{\mathrm{bins}} = \lceil \log_2 n \rceil + 1$
  - (b) Rice rule: $n_{\mathrm{bins}} = \lceil 2\sqrt[3]{n} \rceil$
- maximal bin edge as a third of the box diameter: $r_{\max} = \mathrm{diam}/3$
- resulting bin sizes with quadratic growth, i.e. bin edges $b_k = r_{\max}(k/n_{\mathrm{bins}})^2$ for $k = 0, \ldots, n_{\mathrm{bins}}$

Example: 30 points with diam = 900 give $r_{\max} = 300$ and, by Sturges rule, $\lceil \log_2 30 \rceil + 1 = 6$ bins (see the sketch below).
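A small sketch of these rules; the quadratic-growth form of the bin edges is my reading of the proposal above, not a confirmed formula:

```python
import numpy as np

def proposed_bin_edges(n_points, diam, rule="sturges"):
    """Bin edges following the proposed default binning (illustrative)."""
    if rule == "sturges":
        n_bins = int(np.ceil(np.log2(n_points))) + 1
    else:  # "rice"
        n_bins = int(np.ceil(2 * n_points ** (1 / 3)))
    max_edge = diam / 3.0  # a third of the box diameter
    k = np.arange(n_bins + 1)
    return max_edge * (k / n_bins) ** 2  # quadratically growing edges (assumed)

# the example from above: 30 points, diam = 900 -> 6 bins up to 300
print(proposed_bin_edges(30, 900))
# ≈ [0, 8.3, 33.3, 75, 133.3, 208.3, 300]
```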