Feature: helper functions to handle missing observations #79

Open
jmp75 opened this issue Jan 6, 2018 · 5 comments

@jmp75
Contributor

jmp75 commented Jan 6, 2018

It seems the objective functions under spotpy.objectivefunctions do not handle missing values (NaN) in observations out of the box. In effect, this currently leaves algorithms spinning their wheels on nonsense fitness values.

  1. there should at least be a helper function to censor modelled and corresponding observed data points out of the numpy arrays,
  2. existing objective functions could be made to censor missing data points by default,
  3. or there should be facilities to pipeline array preprocessing into objective functions. This could get wider in scope.

Item (1) is a given; 2 and 3 are for discussion.
I started drafting something in a fork, but before investing substantial time on 2 and 3 I would like a discussion.
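
A minimal sketch of the helper described in item (1), assuming NumPy float arrays of equal length. The name `drop_missing_pairs` is hypothetical and not part of spotpy; the fork mentioned above may handle this differently.

```python
import numpy as np


def drop_missing_pairs(evaluation, simulation):
    """Remove every pair in which either value is NaN.

    Hypothetical helper for illustration, not a spotpy function.
    """
    evaluation = np.asarray(evaluation, dtype=float)
    simulation = np.asarray(simulation, dtype=float)
    if evaluation.shape != simulation.shape:
        raise ValueError("evaluation and simulation must have the same shape")
    # Keep only the positions where neither series is NaN.
    keep = ~(np.isnan(evaluation) | np.isnan(simulation))
    return evaluation[keep], simulation[keep]


obs = [1.0, np.nan, 3.0, 4.0]
sim = [1.1, 2.0, np.nan, 3.9]
obs_clean, sim_clean = drop_missing_pairs(obs, sim)
```

The cleaned arrays stay aligned pair by pair, so they can be passed to any objective function that expects two arrays of equal length.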

@kbstn
Contributor

kbstn commented Jan 9, 2018

Hi,
I had the same issue here. My idea was to make the objective functions able to take pandas DataFrames.
Until now they require numpy arrays.

If they could take a pd.DataFrame, we could take advantage of having an index.

With an index we could:

  • bring the simulation and evaluation lists to the same length and the same index
  • drop indices where the evaluation contains NaN while keeping the index ( df_sim = dfsim[dfsim.index.isin(dfev.index)])
  • access them like arrays (df_sim.values) and use them for the objective functions

This month I don't have time to work on this issue; I just wanted to share my ideas.
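
The index-based steps above could look roughly like this with pandas. This is only a sketch: the series names `ev` and `sim` and the daily dates are made up for illustration, and it assumes every evaluation date also occurs in the simulation.

```python
import numpy as np
import pandas as pd

# Illustrative data: the simulation covers five days, the evaluation four,
# and one observation is missing.
sim = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2018-01-01", periods=5, freq="D"),
)
ev = pd.Series(
    [1.1, np.nan, 2.9, 4.2],
    index=pd.date_range("2018-01-02", periods=4, freq="D"),
)

# Drop missing observations; the index records which dates remain.
ev_valid = ev.dropna()

# Keep only simulated values whose index also occurs in the evaluation.
sim_valid = sim[sim.index.isin(ev_valid.index)]

# Plain arrays of equal length, ready to pass to an objective function.
ev_values = ev_valid.values
sim_values = sim_valid.values
```

If the evaluation could contain dates that the simulation lacks, both series would need to be restricted to the intersection of their indices before taking `.values`.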

@thouska
Owner

thouska commented Jan 9, 2018

Thanks for your ideas. I think it is a very good idea to have a helper function in spotpy.objectivefunctions. I like the way the different NaNs are masked and removed in the fork of @jmp75. I think that if we build on this, we could enable point 2, i.e. exclude missing observation data points by default. This could be activated if the given simulation and evaluation lists do not have the same length (this is checked for every objective function in line 17).
It would be good if we found a way that does not rely on pandas, in order to keep the dependencies as low as possible. However, pandas support would be nice to have.
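
One way point 2 might look without pandas is an objective function that masks NaN pairs before computing the score. The sketch below is an assumption for illustration, not spotpy's implementation; the function name `rmse_skip_nan` and the `skip_nan` keyword are hypothetical.

```python
import numpy as np


def rmse_skip_nan(evaluation, simulation, skip_nan=True):
    """Root mean squared error that ignores NaN pairs by default.

    Hypothetical sketch, not the spotpy RMSE.
    """
    evaluation = np.asarray(evaluation, dtype=float)
    simulation = np.asarray(simulation, dtype=float)
    if evaluation.shape != simulation.shape:
        raise ValueError("evaluation and simulation must have the same shape")
    if skip_nan:
        # Valid positions are those where both values are present.
        valid = ~np.isnan(evaluation) & ~np.isnan(simulation)
        evaluation, simulation = evaluation[valid], simulation[valid]
    if evaluation.size == 0:
        # Nothing left to compare, so no meaningful score exists.
        return float("nan")
    return float(np.sqrt(np.mean((evaluation - simulation) ** 2)))


score = rmse_skip_nan([1.0, np.nan, 3.0], [2.0, 5.0, 5.0])
unfiltered = rmse_skip_nan([1.0, np.nan], [1.0, 2.0], skip_nan=False)
```

With filtering switched off, a single NaN propagates into the result, which is the behaviour the issue describes.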

@philippkraft
Collaborator

I've used another way to handle this in the cmf 1d example. This approach needs numpy arrays but not pandas, which is a pain to keep as a dependency.

@juancastilla
Contributor

juancastilla commented Mar 6, 2018

I faced this issue while calibrating a groundwater model (MODFLOW) that may or may not converge depending on the parameters sampled by Spotpy. Whenever the model does not converge for a specific parameter set, a simple if statement I added (if simulation == NaN) returns "9999" or anything else that produces a ridiculously low likelihood. This has solved the NaN issue for me, and I assume it has the added benefit of telling the sampler to steer away from regions of the parameter space where the model does not converge.

Pandas support would certainly make these issues easier to deal with and provide flexibility with plotting and managing the massive output files :)
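
The guard described above might be sketched as follows. A comparison such as `x == NaN` is always false, so the check here uses `np.isnan`. The function name, the penalty of 9999 and the assumption that the sampler minimizes RMSE are illustrative choices, not taken from the original model setup.

```python
import numpy as np

PENALTY = 9999.0  # illustrative value; it must count as "bad" for the sampler


def rmse_with_penalty(evaluation, simulation, penalty=PENALTY):
    """Return RMSE, or a fixed penalty if the model run produced NaN.

    Hypothetical sketch that assumes the objective is minimized.
    """
    evaluation = np.asarray(evaluation, dtype=float)
    simulation = np.asarray(simulation, dtype=float)
    # `simulation == np.nan` is always False, so test with np.isnan instead.
    if np.isnan(simulation).any():
        return penalty
    return float(np.sqrt(np.mean((evaluation - simulation) ** 2)))


converged = rmse_with_penalty([1.0, 2.0], [1.0, 4.0])
failed = rmse_with_penalty([1.0, 2.0], [np.nan, np.nan])
```

For a sampler that maximizes its objective, the penalty would need the opposite sign or a very small likelihood value instead.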

@huard

huard commented Dec 17, 2018

+1 for automated censoring of nans.

bees4ever added a commit to bees4ever/spotpy that referenced this issue Jan 29, 2019
* master: (22 commits)
  Added missing lines to allow for starting dream proposal vectors
  Update Version number upload to pypi
  Fix bug under mpi use
  Updates version number corresponds to upload on pypi
  Removed test for <Python3.6 due to deprectaed numpy version
  remove parameter interaction test for python 2
  Adopt test scripts to changes in examples
  Removed - sign from Rosenbrock example objectfivefunction
  Slight changes in sceua sampler and added corresponding tutorial
  Update _algorithm.py
  Work with None instead of np.NAN as this was not recognized
  Update __init__.py
  Update Version number, uploaded new pypi version
  Renamed keyword for saving switch
  Removes unfinished model runs from output file
  Enable automatic nan filtering for RMSE thouska#79
  Added comment
  Further version compability test
  Force pytest_cov down to v2.6
  Force decrease version of pytest_cov as v2.6.1 is deprecated
  ...

# Conflicts:
#	.travis.yml
6 participants