Improve and simplify reading and writing of tools on the example of print_array_elements #683

Merged
66 commits merged on Dec 1, 2023

GernotMaier
Contributor

@GernotMaier GernotMaier commented Nov 16, 2023

Partly addresses #643 and is a step towards a more unified approach to the input/output of tools. This PR deals mostly with unified output.

Functionality is added and tested mostly with the print_array_elements.py application, partly also with derive_mirror_rnda.py (plus minor changes to other applications to allow integration tests to run).
The goal is to first get input/output/metadata/validation working for one simple application and then propagate this functionality to all other applications.

Conceptually new: introduced simtools/constants.py to hold "hardwired" values like the paths to the schema files. Not sure if this is the best solution; the internet says it is acceptable.
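
A minimal sketch of the common Python pattern for such a module: upper-case, module-level names, optionally annotated with typing.Final so that type checkers flag reassignment (Final is not enforced at runtime). SCHEMA_URL is the name used in the simtools code discussed in the review below, but its value here, the METADATA_SCHEMA constant, and the metadata_schema_url helper are hypothetical placeholders.

```python
# Sketch of a constants.py-style module. All values are hypothetical
# placeholders, not the real simtools settings.
from typing import Final

# Base URL under which schema files are published (placeholder value).
SCHEMA_URL: Final[str] = "https://example.org/simtools/schemas/"

# File name of the metadata schema (hypothetical constant name).
METADATA_SCHEMA: Final[str] = "metadata.schema.yml"


def metadata_schema_url() -> str:
    """Build the full metadata schema URL from the module-level constants."""
    return f"{SCHEMA_URL}{METADATA_SCHEMA}"


if __name__ == "__main__":
    print(metadata_schema_url())
```

Keeping such values in one module means a changed schema location has to be edited in a single place only.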

Changes:

  • use MetadataCollector and ModelDataWriter for reading/writing of metadata and data
  • improved metadata collection (many more fields are non-null now)
  • introduction of CONTEXT:ASSOCIATED_DATA field to list origin / input data
  • simplified the interface to ModelDataWriter by adding a dump function. The following lines are all that is needed to write consistent data and metadata:
    import simtools.data_model.model_data_writer as writer
    from simtools.data_model.metadata_collector import MetadataCollector
    writer.ModelDataWriter.dump(
        args_dict=args_dict,
        metadata=MetadataCollector(args_dict=args_dict).top_level_meta,
        product_data=data_table,
    )
  • this replaced the method used to write the output table in array_layout:export_telescope_list (as used in print_array_elements)
  • use input and input_meta as command line parameters for input data/metadata, at least for the three applications mentioned above (this needs more thought, but I think our command line parameters are somewhat inconsistent across applications)
  • IMPORTANT: replaced the command line parameter use_plain_output_path by use_simtools_output_path, reversing the previous logic. By default, output is now written directly into the output path (./simtools-output/ if not set), without the additional directories for tool, date, etc. (this still has to be propagated to the other applications)
  • read the name of the schema file for data validation from the metadata (PRODUCT:DATA:MODEL:URL) in metadata_collector:get_data_model_schema
  • simplified some logging messages
  • the widely used method collect_data_from_yaml_or_dict now downloads the file to a temporary location and returns a dict if the file name is a URL
  • some adaptations to the metadata example files in tests/resources (e.g., corrected schema variable URLs)
  • added validation of data before writing it to disk: when ModelDataWriter.dump is called with validate_schema_file set, this schema file is used to validate the output and, if necessary, to convert it to the units listed in the schema file.
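
The reversed flag logic of use_simtools_output_path can be sketched with argparse. This is an illustration, not the simtools implementation: the --output_path option name, the resolve_output_dir helper, and the <tool>/<date> sub-directory layout used when the flag is set are assumptions; only the flag name and the ./simtools-output/ default come from the list above.

```python
import argparse
from datetime import date
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Output path sketch")
    # Hypothetical option name; falls back to ./simtools-output/ when not given.
    parser.add_argument("--output_path", type=Path, default=Path("./simtools-output"))
    # store_true: False unless given, so the plain output path is the default.
    parser.add_argument("--use_simtools_output_path", action="store_true")
    return parser


def resolve_output_dir(args: argparse.Namespace, tool_name: str, today: date) -> Path:
    """Return the directory the output files are written to."""
    if not args.use_simtools_output_path:
        # New default: write directly into the output path.
        return args.output_path
    # Opt-in: add tool and date sub-directories (assumed layout).
    return args.output_path / tool_name / today.isoformat()
```

With action="store_true" the plain layout needs no flag at all, which is what reversing the old use_plain_output_path logic amounts to.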
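
The URL handling described for collect_data_from_yaml_or_dict could look roughly like the following stdlib-only sketch. The names collect_data_from_file_or_url and is_url and the loader parameter are hypothetical; json.load stands in for a YAML loader (e.g. yaml.safe_load from PyYAML) so that the sketch has no third-party dependency. Removing the temporary file after loading is also an assumption.

```python
import json
import os
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path


def is_url(name) -> bool:
    """Treat names with a known URL scheme as URLs (file:// included, e.g. for tests)."""
    return urllib.parse.urlparse(str(name)).scheme in ("http", "https", "ftp", "file")


def collect_data_from_file_or_url(name, loader=json.load) -> dict:
    """Load a dict from a local file; download it to a temporary file first if name is a URL."""
    if not is_url(name):
        with open(name, encoding="utf-8") as stream:
            return loader(stream)
    suffix = Path(urllib.parse.urlparse(name).path).suffix
    fd, tmp_name = tempfile.mkstemp(suffix=suffix)
    try:
        # Open the temporary file first so its descriptor is closed even if the download fails.
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(name) as response:
            shutil.copyfileobj(response, tmp)
        with open(tmp_name, encoding="utf-8") as stream:
            return loader(stream)
    finally:
        os.remove(tmp_name)
```

A YAML file could then be read with collect_data_from_file_or_url(url, loader=yaml.safe_load), assuming PyYAML is installed.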
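
The validate-then-convert step before writing can be illustrated with a toy example. This is not the simtools validate_data implementation: the schema layout (a dict mapping column names to a required unit), the validate_and_convert name, the position_x column, and the small length-unit table are hypothetical stand-ins; in practice a unit library such as astropy.units would do the conversion.

```python
# Conversion factors to metres for a few length units (toy stand-in for a unit library).
TO_METRE = {"m": 1.0, "cm": 0.01, "km": 1000.0}


def validate_and_convert(columns: dict, schema: dict) -> dict:
    """Check that all schema columns exist and convert their values to the schema units.

    columns: {name: (list_of_values, unit)}; schema: {name: {"unit": unit}}.
    Returns a new dict in the same format. Columns not listed in the schema are dropped.
    """
    converted = {}
    for name, spec in schema.items():
        if name not in columns:
            raise ValueError(f"Missing required column: {name}")
        values, unit = columns[name]
        target = spec["unit"]
        if unit not in TO_METRE or target not in TO_METRE:
            raise ValueError(f"Unsupported unit for column {name}: {unit} -> {target}")
        # Scale factor from the input unit to the schema unit.
        factor = TO_METRE[unit] / TO_METRE[target]
        converted[name] = ([value * factor for value in values], target)
    return converted
```

Validation fails loudly on missing columns or unknown units, so invalid data is never written to disk.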

@GernotMaier GernotMaier self-assigned this Nov 16, 2023
@GernotMaier GernotMaier changed the base branch from main to metadata_refactoring November 16, 2023 14:22

codecov bot commented Nov 16, 2023

Codecov Report

Attention: 18 lines in your changes are missing coverage. Please review.

Comparison is base (b169b3b) 81.93% compared to head (83a0356) 81.84%.
Report is 2 commits behind head on main.

Files                                       Patch %   Lines
simtools/data_model/model_data_writer.py    33.33%    10 Missing ⚠️
simtools/data_model/validate_data.py        87.09%    4 Missing ⚠️
simtools/utils/general.py                   89.65%    3 Missing ⚠️
simtools/layout/array_layout.py             66.66%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #683      +/-   ##
==========================================
- Coverage   81.93%   81.84%   -0.09%     
==========================================
  Files          40       41       +1     
  Lines        6128     6203      +75     
==========================================
+ Hits         5021     5077      +56     
- Misses       1107     1126      +19     
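
The coverage figures follow from the diff table as coverage = hits / lines. The small check below reproduces the reported values, assuming Codecov truncates (rather than rounds) to two decimals; that assumption is inferred from these numbers, not taken from the Codecov documentation.

```python
def coverage_percent(hits: int, lines: int) -> str:
    """Coverage as a percentage string truncated to two decimals (integer arithmetic, no float rounding)."""
    basis_points = hits * 10000 // lines  # hundredths of a percent
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


# Numbers from the coverage diff above.
base = {"lines": 6128, "hits": 5021, "misses": 1107}
head = {"lines": 6203, "hits": 5077, "misses": 1126}

for name, report in (("main", base), ("#683", head)):
    # Every line is either a hit or a miss.
    assert report["hits"] + report["misses"] == report["lines"]
    print(name, coverage_percent(report["hits"], report["lines"]))
```

Rounding instead, e.g. round(100 * 5021 / 6128, 2), gives 81.94 rather than the reported 81.93, which is why truncation is assumed here.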

@GernotMaier GernotMaier changed the base branch from metadata_refactoring to main November 16, 2023 16:48
@GernotMaier GernotMaier marked this pull request as ready for review November 21, 2023 14:22
Contributor

@VictorBarbosaMartins VictorBarbosaMartins left a comment

Hi @GernotMaier. Sorry for the delay; this was a relatively long one and it took me some time to understand the code. Here are a couple of comments; I hope they help improve the code.
Thank you!

simtools/applications/make_regular_arrays.py (resolved)
simtools/applications/make_regular_arrays.py (resolved)
simtools/applications/print_array_elements.py (resolved)
simtools/applications/print_array_elements.py (resolved)
tests/unit_tests/data_model/test_metadata_model.py (outdated, resolved)
tests/unit_tests/data_model/test_metadata_model.py (outdated, resolved)
- layout.export_telescope_list(crs_name="corsika")
+ _table = layout.export_telescope_list_table(crs_name="corsika")
+ _export_file = tmp_test_directory / "test_layout.ecsv"
+ _table.write(_export_file, format="ascii.ecsv", overwrite=True)
Contributor

In order to test whether the file was written, I suggest opening it and checking whether the table is identical to the one that was written.
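
A minimal, stdlib-only sketch of the suggested check: write a table, read it back, and compare. Plain CSV (the csv module) stands in for ECSV here so the sketch runs without astropy; with astropy the file would be read back with Table.read(path, format="ascii.ecsv") instead. The helper names, column names, and values are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_table(rows: list, path: Path) -> None:
    """Write a list of dicts (one per row) as CSV."""
    with open(path, "w", newline="", encoding="utf-8") as stream:
        writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def read_table(path: Path) -> list:
    """Read the CSV back; csv returns every value as a string."""
    with open(path, newline="", encoding="utf-8") as stream:
        return list(csv.DictReader(stream))


def test_table_round_trip():
    # Hypothetical rows; values are strings so they compare equal after reading.
    rows = [
        {"telescope_name": "T1", "position_x": "10.0"},
        {"telescope_name": "T2", "position_x": "-5.5"},
    ]
    with tempfile.TemporaryDirectory() as tmp_dir:
        export_file = Path(tmp_dir) / "test_layout.csv"
        write_table(rows, export_file)
        assert export_file.exists()
        assert read_table(export_file) == rows
```

Comparing the read-back content, rather than only checking that the file exists, also catches serialization errors.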

Contributor Author

I think this is an integration test we need to do (and I have it on my list to implement).

Contributor

Still on this PR or in another one?

Contributor Author

No, we need to work on integration tests during our maintenance period (remember that we don't have good tests).

tests/unit_tests/utils/test_general.py (resolved)
Contributor

@VictorBarbosaMartins VictorBarbosaMartins left a comment

Thanks for considering the comments. I think we are good to go now if the tests pass.

Contributor

@orelgueta orelgueta left a comment

A few minor comments. I did not go through everything very carefully, since @VictorBarbosaMartins is doing the actual review.


if self.data_model_name:
    self._logger.debug(f"Schema file from data model name: {self.data_model_name}")
    return simtools.constants.SCHEMA_URL + self.data_model_name + ".schema.yml"
Contributor

This can be shortened by changing the import to "from simtools import constants" or by importing it under a short name. Also, please use an f-string instead of combining strings like that.

Contributor Author

I consciously want to keep "simtools.constants", as modules called constants are quite common. Writing it like this improves readability (one does not have to scroll up to see which import it is).

What is the advantage of f-strings over "+"?

Contributor

OK, no problem to keep simtools.constants then, makes sense.

f-strings are significantly faster than adding strings, and using them is also a matter of consistency, since I switched everything to f-strings a while back.

Contributor Author

OK, changed.
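
Taken together, the schema lookup described for metadata_collector:get_data_model_schema (URL from PRODUCT:DATA:MODEL:URL in the metadata, otherwise built from the data model name) and the f-string requested in this thread might look like the following standalone sketch. It is not the actual method: the nested-dict layout of the metadata, the order of the two lookups, the helper names, the array_coordinates model name, and the SCHEMA_URL value are assumptions.

```python
from typing import Optional

# Placeholder base URL; in simtools the value lives in simtools.constants.
SCHEMA_URL = "https://example.org/simtools/schemas/"


def get_nested(data: dict, key_path: str, separator: str = ":"):
    """Follow a path such as 'PRODUCT:DATA:MODEL:URL' through nested dicts; return None if absent."""
    for key in key_path.split(separator):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def get_data_model_schema(metadata: dict, data_model_name: Optional[str] = None) -> Optional[str]:
    """Return the schema file URL: metadata entry first (assumed order), then the data model name."""
    url = get_nested(metadata, "PRODUCT:DATA:MODEL:URL")
    if url:
        return url
    if data_model_name:
        # f-string instead of SCHEMA_URL + data_model_name + ".schema.yml"
        return f"{SCHEMA_URL}{data_model_name}.schema.yml"
    return None
```

Both spellings build the same string; the speed difference mentioned above can be measured locally with the timeit module.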

simtools/data_model/metadata_collector.py (resolved)
simtools/data_model/metadata_model.py (resolved)
simtools/schemas/metadata.schema.yml (resolved)
simtools/utils/general.py (resolved)
simtools/utils/general.py (resolved)
@orelgueta
Contributor

BTW, note that some new lines are not covered by tests (Codacy complains).

@GernotMaier
Contributor Author

@VictorBarbosaMartins, @orelgueta: thanks for looking into the code. Will merge after the tests pass.

@GernotMaier GernotMaier merged commit 97c22a5 into main Dec 1, 2023
7 of 8 checks passed
@GernotMaier GernotMaier deleted the derive_array_elements_output branch December 1, 2023 12:00

3 participants