-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add weekly hash check to NUTS file and XLSX to YAML utility function #40
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One general thought from my side. Now that you save a copy of the NUTS file as part of this repository, you don't really need the yaml files anymore. I'd suggest to do away with the yaml files and just read the Excel file directly. This way you also save the additional step of converting the Excel to yaml in case there is any update. And you exclude the error case where the Excel file and the yaml files are out of sync.
@danielhuppmann, what are your thoughts on the matter?
In addition, one comment below in line.
First question: is the performance difference between an xlsx spreadsheet versus the yaml files noticeable? Second question: if Eurostat again changes the file (hosted on the same url), we will only be notified that the hashes don't match, without any guidance on the actual change... Having the xlsx-to-yaml utility can be a useful way to find the change and/or check whether the relevant for us. |
Technically, yaml should be faster than xlsx. However, in practice, reading data is orders of magnitude slower so the difference does not matter.
Ok, if Eurostat does not provide a changelog then I can see the value of comparing yaml files. We should still be careful though that the Excel file and the yaml files match. Otherwise we might get a wrong warning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to be merged from my side.
The only thing that might be nice to add is a short markdown or rst file on how to update the NUTS regions in case the file changes again. It might seem straightforward now but in a year or two we are guaranteed to forget about it.
YAML files are also comparable in GitHub changelogs, whereas XLSX files are not, so it allows tracking changes to the regions. |
Closes #38
Adds a weekly GH action to compare the hash of a local NUTS file with the EUROSTAT website version
Adds a XLSX to YAML utility function for use when updating the local file to latest version