Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation -- standardization example: NMAD used instead of STD #381

Closed
MatteaE opened this issue Jun 29, 2023 · 6 comments · Fixed by #390
Closed

Documentation -- standardization example: NMAD used instead of STD #381

MatteaE opened this issue Jun 29, 2023 · 6 comments · Fixed by #390
Labels
priority Needs to be fixed rapidly

Comments

@MatteaE
Copy link
Contributor

MatteaE commented Jun 29, 2023

At plot_standardization.py:101, it is claimed that "We perform a scale-correction for the standardization, to ensure that the standard deviation of the data is exactly 1." (and at line 141: "With standardized input, the variogram should converge towards one.").

But actually the code uses xdem.spatialstats.nmad() to compute the rescaling factor, and the empirical variogram later converges to 1.48 instead of 1 (see attachment).

xdem 0.0.10

variogram_empirical

@rhugonnet
Copy link
Member

rhugonnet commented Jun 29, 2023

@MatteaE Are you using different data than the example? The variogram looks OK at https://xdem.readthedocs.io/en/latest/advanced_examples/plot_standardization.html right after the standardization (which doesn't seem to affect the estimate: "Standard deviation before scale-correction: 1.0; Standard deviation after scale-correction: 1.0").

I realize that right now the default variogram estimator in sample_empirical_variogram is not "dowd" (which is the one to use to match the NMAD), while it is in wrapper functions estimate_model_spatial_correlation and infer_spatial_correlation_from_stable... Maybe something to change and which could explain your variogram (by default your variogram will be estimated using the "matheron" estimator).

@MatteaE
Copy link
Contributor Author

MatteaE commented Jun 30, 2023

@rhugonnet Yes, I am using a dh grid from SPOT and NASADEM.
Thanks for the suggestion about Dowd's estimator - I have now tried passing estimator = "dowd" to function sample_empirical_variogram, now the variogram converges to... 2.0! (See attachment).
Interestingly, if I run the unmodified example plot_standardization.py I also get a variogram which converges to 1.0, but if I pass estimator = "dowd" to sample_empirical_variogram (plot_standardization.py:127) then the variogram again converges to about 2.0
variogram_empirical_splitscale

@rhugonnet
Copy link
Member

rhugonnet commented Jun 30, 2023

OK perfect!
Yes, the factor of 2 is due to how Dowd's variogram is defined in SciKit-GStat (it needs to be divided by 2 to be compared to a NMAD). This is inconsistent with the fact that Matheron or Cressie's estimators are directly comparable to a STD, and should be fixed.

It's on my list of things to modify in SciKit-GStat, I'll open a PR there during the summer! 😉

To summarize, to-do-list for closing this issue:

  • Use Dowd's variogram consistently with the NMAD in the uncertainty tools,
  • Open a PR to SciKit-GStat to fix the 2 scaling factor in Dowd's estimator,
  • Clarify the section of the documentation on robust estimators,
  • Put a link to that section in the "Standardization" gallery example.

Anything else I missed @MatteaE?
Also note that we're going to rework the structure of spatialstats.py soon: #378

@MatteaE
Copy link
Contributor Author

MatteaE commented Jul 1, 2023

Thanks a lot for the detailed explanation @rhugonnet! Just one last question then - right now, do I need to do the scaling by 2 manually to later use the variogram parameters and de-standardize the integrated uncertainty? If yes, where? (I use functions fit_sum_model_variogram, number_effective_samples and neff_circular_approx_numerical)

@rhugonnet
Copy link
Member

@MatteaE Good question! You don't need to scale 🙂.
The function number_effective_samples considers that the sum of partial sills of the variogram adds up to 100% of the total sill (https://github.com/GlacioHack/xdem/blob/main/xdem/spatialstats.py#L2007), so the total sill of the variogram does not matter.
(the assumption behind is that all the total average variance is observed in the variogram sampling; which is largely true for DEMs as long as you have enough stable terrain samples)

@rhugonnet rhugonnet added the priority Needs to be fixed rapidly label Aug 1, 2023
@rhugonnet
Copy link
Member

Scale factor of Dowd's estimator fixed in scikit-gstat: mmaelicke/scikit-gstat#158
Will try to fix the rest linked to this issue today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority Needs to be fixed rapidly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants