Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Need to clarify if required means a required column OR also required valid data within column #70

Open
e-lo opened this issue Jun 13, 2023 · 3 comments

Comments

@e-lo
Copy link
Contributor

e-lo commented Jun 13, 2023

As an implementer of GMNS, I'd like to understand if the required constraint applies to values or columns (or both)

Frictionless spec is really just checking for the column presence and allows for missing values.
If we don't want missing values, then we need to assert pattern or enum or other constraints.

e-lo added a commit to e-lo/GMNSpy that referenced this issue Jun 14, 2023
Provides the functionality to validate against a remote schema hosted on github by github reference (e.g. master HEAD, version tag, etc.). It also pulls out a "spec" as a class object to make it easier to pass around and abstract the differences between getting a spec from Github vs Local files.

This PR also:

- formats code per pre-commit
- refactors apply_schema_to_df into smaller functions
- updates errors and warning to be more helpful (pointing to files, etc)
- adds tests for field constraints in constraint_tests.py
- adds assert statements to basic_tests.py

Closes #25 
Closes #26

NOTE: Tests failing due to zephyr-data-specs/GMNS#70
@ianberg-volpe
Copy link
Collaborator

Our intent was to use the term required to require column presence and prohibit missing values. I think this lines up with the definition used by Frictionless Table Schema's constraints:

Property Type Applies to Description
required boolean All Indicates whether this field cannot be null. If required is false (the default), then null is allowed. See the section on missingValues for how, in the physical representation of the data, strings can represent null values.

Is the issue actually in the frictionless python package's implementation of the term?

@ianberg-volpe
Copy link
Collaborator

@dtemkin-volpe flagging for your work with frictionless. I know the python package has been updated since I made this comment last year, has it changed what it means by "required"?

@dtemkin-volpe
Copy link
Collaborator

From what I can tell, "required" just means that the field can't be null, and "missingValues" is an array of values that when processed by the frictionless python package, equate to null. For example, we list an empty string as a "missingValue" in the node table, and you can see in the cambridge_intersection.sqlite file that these are represented as null values and not as empty strings. So, if required is true and "" is in missingValues, then "" can't be a possible value in that entry (or at the very least, it'll raise an error when we try to create an SQLite database).

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants