Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new data encapsulation #65

Closed
wants to merge 50 commits into from
Closed

Add new data encapsulation #65

wants to merge 50 commits into from

Conversation

LSchueler
Copy link
Member

@LSchueler LSchueler commented Jan 15, 2020

2 new classes are added in order to separate data and methods more
cleanly. This is very much WIP, but I'd like to get a discussion
started before I put in too much work.
The FieldData class contains information about the mesh and it
holds a dict containing the individual field values, which are
defined on that mesh, e.g. "krige" and "cond".

I'd like to more clearly differentiate between a class instance of Data and the actual field values contained in that instance, i.e. the numpy array. Any ideas?
For a better discussion, I used several different ways to access both, like

  • field["srf"]
  • field.values
  • field.field

Questions

  • Should the class Field be renamed to FieldMethods?
  • If a field is overwritten, with, say a new mesh_type or new pos, should only the field values be deleted or everything, including mean, ...?
  • What should the __call__ methods return?
    • the dictionary containing the field names and the Data instances
    • the Data instance of the default field
    • the values of the default field
  • Should the default_field in the Data class always be called "field", or should the name be more descriptive, like "srf" or "krige"?

TODO

  • cleanup
  • overload the __setattr__ method in order to assign fields to each other
  • create a better separation between the Data instances and the actual field values
  • check for attributes being set to one of the Field children, which now belong to Data
  • documentation
  • get all the tests to run

2 new classes are added in order to separate data and methods more
cleanly. This is very much WIP, but I'd like to get a discussion
started before I put in too much work.
The `FieldData` class contains information about the mesh and it
holds a dict containing the individual field values, which are
defined on that mesh, e.g. "krige" and "cond".
@LSchueler LSchueler requested a review from MuellerSeb January 15, 2020 18:10
@LSchueler LSchueler self-assigned this Jan 15, 2020
@LSchueler LSchueler added enhancement New feature or request help wanted Extra attention is needed Refactoring Code-Refactoring needed here labels Jan 15, 2020
@LSchueler LSchueler added this to the 1.2 milestone Jan 24, 2020
@MuellerSeb
Copy link
Member

I love the approach of being able to access data by using the index operator.
To be backward-compatible we should provide the "field" attribute, which should point to a "default-field". The rest can change.

One issue I have with the approach is, that it is almost a re-implementation of a mesh-container like in meshio:
https://github.com/nschloe/meshio/blob/792dfdd6296a8858e365a005a87afa605834ce66/meshio/_mesh.py#L9

In pyevtk, the export routine "pointsToVTK" also creates a mesh, where the "cells" are simple vertices:
https://github.com/paulo-herrera/PyEVTK/blob/41d7e14e9b92909088b3e789632b1fd518b60d22/src/hl.py#L221

So in general, I would follow the definition of a VTK file, where we have:

  • points (pos in our case),
  • cells (vertices for unstructured, hypercubes for structured),
  • point-data (scalar or vector [or tensor in VTK]) -> we are interested in this
  • field-data (maybe here we can store sth like the default field name)
  • cell-data (i dont think we need this)
  • all *-data as dict{str: np.ndarray / single value}

The Data could also be dropped. Since the value_type could be determined from the data shape and the mesh_type. Also the mean should only be connected with the current Field class.
For the class-names I would propose:

  • Data - drop
  • FieldData -> Mesh (with point_data and field_data {"default_field": "field"})
  • Field -> GSField (since it's the main actor in "GS"Tools) with:
    • "mean" as a field_data entry in the Mesh and a mean attribute pointing there
    • field attribute pointing to the point_data with the name of the default_field (from field_data)
  • value_type -> function determining the value_type of a numpy array (taking mesh_type and pos.shape)
  • within the current Field.mesh method, we could then also overwrite the underlying mesh with the given one.
  • in every call routine, we could leave the "pos" parameter optional to use the already present underlying mesh if nothing was given
  • if a new pos vector is given, the underlying point_data is erased

Then we could easy export meshes with meshio (at least in the unstructured case). Or we could totally move to pyvista and use it for IO and plotting, since there is no option to export a structured. (or we could just covert it to unstructured when exporting)

I guess @banesullivan would be happy about that, since we come closer to the VTK representation of a mesh.

These were my two cents ;-)

Copy link
Member

@MuellerSeb MuellerSeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the other comment.



class Field:
class Data:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be dropped. (See comment)

self.value_type = value_type


class FieldData:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be called a Mesh class.

raise ValueError("Unknown 'mesh_type': {}".format(mesh_type))


class Field(FieldData):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be the GSField class, since it is our working horse.

@MuellerSeb MuellerSeb mentioned this pull request Jan 28, 2020
3 tasks
@LSchueler
Copy link
Member Author

Thanks for your pretty thought out input!

I dig the name Mesh.

Are you suggesting to use nested dicts for storing something like the variance of the kriged field, like

m = Mesh()
m.add_field("krige_field", ...)
m.field_data["krige_field"]["var"] = 3.14

with the key of the field_data being the same as the name of the point_data? The alternative would be to keep the extra Data class, which we could of course rename to FieldData.

@LSchueler
Copy link
Member Author

As we don't have many variables belonging to auxiliary fields, we should simply add them to the Mesh class like var_krige.

@MuellerSeb
Copy link
Member

When a field is called with a new pos tuple, the mesh should be resetted.
For unstructured meshes, an append option could be interesting, for situations, where you want to iteratively generate new points without resetting the mesh.

@LSchueler
Copy link
Member Author

I've changed the SRF-class and especially the call method to being more mesh-centric. In this concept, it doesn't make a whole lot of sense to give the position-tuple everytime an SRF is to be calculated. The pos-tuple belongs to the mesh.

To not break the backwards compatibility, I've included warnings.

@LSchueler
Copy link
Member Author

@MuellerSeb Please explain to me how TestKrige.test_extdrift works! You are using a structured grid, but you are generating srf's on unstructured grids. That's the only test, which does not work on this branch.

@LSchueler
Copy link
Member Author

The way the conditioning functions get the raw_field is pretty ugly at the moment. But they are so tightly coupled to the SRF class, would it make sense to rework these functions and bring them into the SRF class?

@MuellerSeb
Copy link
Member

@MuellerSeb Please explain to me how TestKrige.test_extdrift works! You are using a structured grid, but you are generating srf's on unstructured grids. That's the only test, which does not work on this branch.

I'll have a look at it.

@MuellerSeb
Copy link
Member

@LSchueler : Found the problem. it is in the Mesh._check_point_data routine. The input pos tuple does not consist of flat arrays in this example. The Mesh class doesn't include any conversion of the pos-tuple like it was done before in the SRF class.

In the specific case, the x component of pos has a shape of (51, 61) and the field has a shape of 3111 (=51*61). in the checking routine this shape is compared to only the first entry of the shape of x (51) and there the error is thrown, that there is a shape missmatch.

I think, the pos tuple components should be flattened when the mesh is created (and converted to numpy arrays with floats).

@MuellerSeb
Copy link
Member

MuellerSeb commented Apr 29, 2020

Checklist:

  • add dim input and attribute to mesh class
  • pos tuple needs to be converted in __init__ (depending on dim as an array shaped (dim, pos_count) )
  • add axis input and attribute to mesh for sturctured meshes (pos then generated from axis on call)
  • add a convert method to convert structured to unstructured meshes
  • add a reset method to mesh, which takes the same arguments as init (init should call that)
  • any Field calls where pos is given should reset the unerlying mesh with the reset method (depending on the mesh_type, pos should be interpreted as pos or axis like it was before) )
  • SRF may always use the unstructured mesh internally (structured will be only used for variogram)... not sure about that one

@LSchueler
Copy link
Member Author

Thanks for compiling our chat from yesterday!

@MuellerSeb
Copy link
Member

Do we need another chat on that? Would be lovely to see this getting merged! :-)

@LSchueler
Copy link
Member Author

We have created such spaghetti code...

@MuellerSeb
Copy link
Member

We have created such spaghetti code...

Oh no... now I feel guilty!

@MuellerSeb MuellerSeb modified the milestones: 1.3, 2.0 Aug 18, 2020
@MuellerSeb
Copy link
Member

Let's focus on GSTools v2 with this one.

@MuellerSeb MuellerSeb linked an issue Aug 18, 2020 that may be closed by this pull request
@MuellerSeb
Copy link
Member

Some new thoughts on this:

  • actually I am quite fine with the way it is now: SRF/Krige/Field hold the information for the last generated field (pos + values)
  • with the mesh class, we could provide a container living outside all these Field classes
  • when evaluating on a mesh class, we can use the already present method: Field.mesh (link
  • then we can provide simpler interfaces to meshio, ogs5py, pyvista, vtk, and so on
  • code refactoring is also easier, since we don't have to care about multiple fields stored somewhere buried deep beneath __get_items__ magic functions

@LSchueler
Copy link
Member Author

I'll close this PR for now, as GSTools has meanwhile gone into a different direction.

@LSchueler LSchueler closed this Jan 26, 2021
@LSchueler LSchueler deleted the variogram_update branch August 10, 2021 07:30
@MuellerSeb MuellerSeb removed this from the 2.0 milestone Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed Refactoring Code-Refactoring needed here
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A new tutorial for Field.mesh()
3 participants