Do not assume that an entity with type File is a data entity #62

Open
simleo opened this issue Jan 17, 2025 · 1 comment

simleo commented Jan 17, 2025

We currently have this check for entities whose type is usually used for data entities:

       [   MUST 13.1    ]  Data Entity MUST be directly referenced:            
                           Check if the Data Entity                            
                           is linked, either                                   
                           directly of inderectly,                             
                           to the Root Data Entity                             
                           using the hasPart (as                               
                           defined in schema.org)                              
                           property"                                           

That is, we assume that entities of certain types are data entities even when they are not listed in the root data entity's hasPart, and we then require that they be listed there. This is causing problems in nextflow-io/nf-prov#39 (implementation of WRROC for Nextflow); see in particular nextflow-io/nf-prov#39 (comment).
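
To make the behaviour described above concrete, here is a minimal sketch of this kind of type-based check. It is not the validator's actual implementation: the function name, the set of types treated as "data entity" types, and the sample graph (including the `#param-file` entity) are all assumptions made for illustration.

```python
# Sketch only: flags entities whose @type suggests a data entity
# but which are not reachable from the root via hasPart.

# Assumption: the types the heuristic treats as "data entity" types.
DATA_ENTITY_TYPES = {"File", "Dataset"}


def as_list(value):
    """Normalize a JSON-LD value (missing, single, or list) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def unreferenced_data_entities(graph, root_id="./"):
    """Return the @id of every entity with a data-entity type that is not
    linked, directly or indirectly, to the root through hasPart."""
    by_id = {entity["@id"]: entity for entity in graph}

    # Walk hasPart transitively, starting from the root data entity.
    reachable, stack = set(), [root_id]
    while stack:
        current = by_id.get(stack.pop(), {})
        for part in as_list(current.get("hasPart")):
            part_id = part["@id"] if isinstance(part, dict) else part
            if part_id not in reachable:
                reachable.add(part_id)
                stack.append(part_id)

    flagged = []
    for entity in graph:
        if entity["@id"] == root_id:
            continue
        types = set(as_list(entity.get("@type")))
        # The questionable step: the type alone decides that this is a data entity.
        if types & DATA_ENTITY_TYPES and entity["@id"] not in reachable:
            flagged.append(entity["@id"])
    return flagged


# Hypothetical crate: "#param-file" has type File but is not meant to be
# part of the crate payload, so it is absent from every hasPart list.
graph = [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "data/"}]},
    {"@id": "data/", "@type": "Dataset", "hasPart": [{"@id": "data/a.txt"}]},
    {"@id": "data/a.txt", "@type": "File"},
    {"@id": "#param-file", "@type": "File"},
]

flagged = unreferenced_data_entities(graph)
print(flagged)
```

In this sketch `#param-file` is the only entity flagged, purely because of its type, which is the kind of false positive this issue is about.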

@kikkomep

The specifications state that:

Data Entities representing files must have “File” as the value for @type. “File” is an RO-Crate alias for http://schema.org/MediaObject.

However, in the comments you mentioned:

… entities of type File are not necessarily Data Entities.

If both statements hold true, the File type can be used to denote both a File Data Entity and an entity that is not a Data Entity, making it impossible to uniquely represent a File Data Entity and distinguish it from generic File entities.

This lack of precise terminology for data entities complicates the validation of other requirements, such as MUST 13.1 mentioned above, which is based on the following statement in the specification:

When files and folders are represented as Data Entities in the RO-Crate JSON-LD, they must be linked, either directly or indirectly, to the Root Data Entity using the hasPart property.

Without a clear and unambiguous way to represent a Data Entity, it becomes impossible to automatically verify that all data entities are referenced from the Root Data Entity.

The assumption underlying the check you mentioned in the issue is simply a way to mitigate this ambiguity and make the specification requirement automatically verifiable.

The only action we can take to address the issue you’ve raised is to disregard this (and potentially other) unverifiable requirement(s) until more precise terminology is introduced to accurately represent File Data Entities and distinguish them from generic File entities.
