-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggestions to improve created APIs #53
Comments
Regarding first point, i think there is value in having a single predictable ID that reflects the location in the archive file. However, sure, if that's in If i look at the electrides dataset in the client (https://optimadeclient.materialscloud.io/) , on one hand i agree that it would look better if the I could think of other cases where the folder stucture is good to keep as well. Let's say there are structures in two subfolders of a hypothetical entry:
What if the two cif file lists are non-overlapping (bit of an edge case, i agree)? then the ids would only contain the filename, but i think the full path would be preferable. Or some machine-learning dataset:
Probably good to keep the folder structure here as well. |
This is what I've done in #55 for now; we can still discuss whether it is worth merging though. The other motivation here is that right now the property files also have to include the full file path, which ends up looking pretty weird (and its backwards, you have to know how the archive will be laid out once recording the properties...)
Yep, agreed. My suggested implementation in #55 would generate IDs as |
Thanks @ml-evs, I think these changes are good. Fully agree that it's not nice if the property files need to include the full path including the name of the archive/compressed file. Feel free to finalize this implementation and notify me when i could give it a test run. |
I've just been trying to make use of my own dataset through the API here, and have collected a few comments:
id
field. At the moment we use precisely the relative filepath that the structure came from. I wonder if a better default would be attemptingfilepath.split("/")[-1].split(".")[0]
(i.e., take just the filename). This is not guaranteed to be unique; rather than having some awkward config for specifying this, we could attempt to generate the unique set of IDs from first the filenames themselves, then the first folder up, then second folder up etc. and only fallback to the full file path when required. Theimmutable_id
field (which we dont use atm) can be set by the relative filepath still. I will try experimenting with this...last_modified
to be the datetime of extractionThe text was updated successfully, but these errors were encountered: