
Add uc 08 derive simulation production configuration parameters #1098

Open
wants to merge 7 commits into base: main

Conversation

@tobiaskleiner (Collaborator) commented Aug 1, 2024

This PR adds the functionality defined in UC8 with the following four applications:

  • simtools-production-calculate-resource-estimates
    -- calculates compute and storage resources --

  • simtools-production-generate-grid
    -- generates a grid of simulation points --

  • simtools-production-generate-simulation-config
    -- generates simulation parameters for a specific grid point --

  • simtools-production-scale-events
    -- metric evaluation and statistical error calculations --

Modules are stored in production_configuration:

  • calculate_statistical_errors_grid_point.py
  • derive_computing_resources.py
  • generate_production_grid.py
  • generate_simulation_config.py
  • interpolation_handler.py


@GernotMaier (Contributor)

I would like to start a discussion on how best to organize the code. At this point most of it is in the applications, but we want to move it into modules.

I suggest starting a module to put those classes in. What would be a good name?

  • simtools.production_tools (don't like tools, seems to cover everything)
  • simtools.production_configuration
  • simtools.simulation_production
  • ...something shorter...?

This module would include all classes currently defined in

  • simtools/applications/derive_resources.py (which will be renamed to simtools/applications/derive_computing_resources.py)
  • simtools/applications/generate_grid.py (which will be renamed to simtools/applications/generate_production_grid.py)
  • simtools/applications/generate_simulation_config.py
  • simtools/utils/calculate_statistical_errors_grid_point.py (maybe we can find a shorter name)

@orelgueta , @tobiaskleiner : please comment.

@orelgueta (Contributor)

Not much for me to comment. This is the first time I see that those modules are in applications and I definitely agree they should go into simtools instead. In terms of which name to put them under, I would vote for simtools.production_configuration.
However, I would also check if we can integrate these classes with the current modules we have (at least partly).

@tobiaskleiner (Collaborator, Author)

I agree with the simtools.production_configuration suggestion, and with moving some parts into visualization or other modules.


@tobiaskleiner tobiaskleiner marked this pull request as ready for review September 19, 2024 15:53



# Determine the effective area threshold (50% of max effective area)
max_efficiency = np.max(efficiencies)
threshold_efficiency = 0.1 * max_efficiency
Contributor

The docstring says "exceeds 50%". Do you actually use 10%?

@GernotMaier (Contributor)

@tobiaskleiner - before I start the review, could you make sure that the integration tests run successfully?

@GernotMaier (Contributor)

Running the example production_generate_simulation_config gives:

 python simtools/applications/production_generate_simulation_config.py  --azimuth 60.0 --elevation 45.0 --nsb 0.3 --data_level "A" --science_case "high_precision"  --file_path tests/resources/production_dl2_fits/prod6_LaPalma-20deg_gamma_cone.N.Am-4LSTs09MSTs_ID0_reduced.fits --file_type "On-source" --metrics_file  tests/resources/production_simulation_config_metrics.yaml --site North
Effective Area Error (avg): 0.000, Reference: 0.020
Signal Efficiency Error: 0.020, Reference: 0.020
INFO::calculate_statistical_errors_grid_point(l278)::calculate_error_energy_estimate_bdt_reg_tree::Calculating Energy Resolution Error
Energy Estimate Error: 0.184, Reference: 0.050
Gamma-Ray PSF Error: 0.010, Reference: 0.010
Image Template Methods Error: 0.050, Reference: 0.030
error_eff_area {'relative_errors': array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.01799919e-08,
       2.21965306e-08, 3.88179482e-08, 5.76378038e-08, 7.53500614e-08,
       8.98535771e-08, 1.20081083e-07, 1.46732027e-07, 1.66036331e-07,
       2.16994290e-07, 2.42594940e-07, 2.92140457e-07, 3.35214284e-07,
       3.58574316e-07, 4.90549293e-07, 4.77865852e-07, 6.57241537e-07,
       6.62752408e-07, 6.89972899e-07, 1.03031658e-06, 1.26143022e-06,
       1.53338834e-06, 1.47603219e-06, 1.79685160e-06, 2.19451145e-06,
       1.94774912e-06, 3.34303448e-06, 4.51782360e-06, 3.89789862e-06,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00])}
INFO::production_generate_simulation_config(l180)::main::Simulation parameters: {'core_scatter_area': np.float64(200070.0), 'viewcone': np.float64(1053.0), 'number_of_events': 11000004994}
INFO::production_generate_simulation_config(l183)::main::Simulation parameters saved to: /workdir/external/simtools/simtools-output/production_generate_simulation_config/configured_simulation_params.json

Is a viewcone of 1053 reasonable? I assume it is in degrees.

Here and throughout the added code: there are no units anywhere. I assume that this output is in the units expected by CORSIKA, but I am not entirely sure. The code also relies on certain units for the values read from the DL2 file, and I think at least there the units are given in the header or table columns. I suggest using explicit units (as we do not know whether DL2 files change in the future).

@GernotMaier (Contributor)

I am also surprised how small the errors on the effective areas are (same for the values in tests/unit_tests/production_configuration/test_generate_simulation_config.py).

@GernotMaier (Contributor)

First: great work @tobiaskleiner! This adds an almost complete framework for the configuration.

This PR is unusually big, with several independent applications added, each including important functionality. Too late to split it up, but we need to review it in pieces (I will send out a review for the first application soon).

So far I have looked only at production_generate_simulation_config, which determines the configuration parameters for a single grid point.

Quite a few of the statistics questions are open, including the details of the metrics. I would suggest that you add a small discussion note on this topic to the implementation gitlab (UC8) directory, addressing the following questions:

  • what metrics do we want to consider and how are they calculated
  • how do we determine the number of required events from the metrics
  • how do you plan to calculate the scatter area and viewcone

I think it is easier to discuss the methodological approach there rather than as part of the code review. Maybe in the future we should discuss the methods before the implementation starts.

Additionally, we should discuss the concepts and impact of data levels and science cases in the implementation gitlab. I think the science cases are not documented anywhere? What are the assumptions / motivation?

@GernotMaier (Contributor) left a comment

This is the first part of my review, concentrating on simtools-production-generate-simulation-config and the code called from it. Further review will come, but I think it is more efficient to resolve the issues related to one application first (plus agree on the methods, see my comment in the PR) and then go to the next one.

I approve of the overall structure and approach; this is good.

Todos:

  • need to agree on the statistical methods and metrics.
  • decide what to do with units (at this point there are no units)
  • decide how to document the methods
  • ...

"error_gamma_ray_psf": 0.01,
"error_image_template_methods": 0.03,}
"""
if file_path and os.path.exists(file_path):
Contributor

Replace by general.collect_data_from_file_or_dict. This is more flexible, as it also allows loading a file from a URL, in JSON format, etc. (and it is used commonly throughout the code)

Collaborator Author

changed


Example:

metrics = {
Contributor

I would prefer to have them called uncertainty_eff_area, as these are not errors. Errors typically arise from measurements, not from metrics.

Collaborator Author

changed

else:
serializable_config[key] = value

logger.info(f"Simulation parameters: {serializable_config}")
Contributor

Simulation parameters or Simulation configuration? My understanding is that it is configuration.
(sorry for the language remarks..)

Collaborator Author

yes, changed

#!/usr/bin/python3

r"""
Configure and run a simulation based on command-line arguments.
Contributor

Does this application actually run simulations? I don't see it; it seems to be configuration only (I think this is what we want).

Collaborator Author

changed

generate simulation parameters for a specific grid point in a statistical error
evaluation setup. The class considers various parameters, such as azimuth,
elevation, and night sky background, to compute core scatter areas, viewcones,
and the required number of simulated events.
Contributor

replace error by uncertainty.

Collaborator Author

changed

Contributor

Files are small. Could be smaller by simply 'gzipping' them (not important, but easy to do)

OUTPUT_PATH: simtools-output
OUTPUT_FILE: "configured_simulation_params.json"

INTEGRATION_TESTS:
Contributor

Suggest to add an integration test which compares the output of this application run with an expected output. See this example

"""
return self.evaluator.data["viewcone"]

def calculate_required_events(self) -> int:
Contributor

Do we have a unit test for this function?

Contributor

Suggest to open an issue to add in a follow-up pull request:

  • a description of the assumptions, default values, statistical methods.
  • metric documentation
  • science cases documentation

Collaborator Author

Added here #1233


sim_events_data = hdul["SIMULATED EVENTS"].data # pylint: disable=E1101
bin_edges_low = sim_events_data["MC_ENERG_LO"]
bin_edges_high = sim_events_data["MC_ENERG_HI"]
Contributor

Is the assumption that we want to pass the metric over the energy range given by the DL2 file good? Would it make sense to define a valid energy range for a given metric? e.g. generate a simulation production configuration with a 0.1% statistical uncertainty in the 30 GeV to 300 TeV range (although we simulated from 10 GeV to 500 TeV?)

Collaborator Author

Already added a validity range with units to each metric in the yaml files.


@tobiaskleiner (Collaborator, Author)

@GernotMaier thanks for the review. I went through the comments and addressed most of them. A few more points need discussion/implementation, see #1233, #1227, #1219. I will let you know when I have fixed the unit tests for another review.


@tobiaskleiner (Collaborator, Author)

@GernotMaier thanks again for the review of the first part of the PR. I went through your comments and fixed the unit/integration tests. I factored out the event scaling logic and added a file for helper functions in the production configuration folder. We could also move the DL2 reading part there at a later stage.
Let me know if the changes make sense; if so, you could start reviewing the other parts of the PR.


@GernotMaier (Contributor) left a comment

A couple of more comments on the production_generate_simulation_config.

I will talk to you directly about a couple of points.

The data level for the simulation (e.g., 'A', 'B', 'C').
science_case (str, required)
The science case for the simulation.
file_path (str, required)
Contributor

Suggest to change the file_path doc string to

Path to file with MC events at CTAO DL2 data level. Used for statistical uncertainty evaluation.

elevation (float, required)
Elevation angle in degrees.
nsb (float, required)
Night sky background value.
Contributor

I think the unit for the NSB is "Hz" (but please check)

Collaborator Author

Not 1/(sr ns cm**2)?

Contributor

Please find out. I know that in some places we use Hz (which requires knowledge of the pixel FoV)

config.parser.add_argument(
"--data_level", type=str, required=True, help="Data level (e.g., 'A', 'B', 'C')."
)
config.parser.add_argument(
Contributor

Agree, added this to the list of discussions.

"--science_case", type=str, required=True, help="Science case for the simulation."
)
config.parser.add_argument(
"--file_path", type=str, required=True, help="Path to the dl2_mc_events_file FITS file."
Contributor

See comment above.

Path to MC event file in DL2 format

Collaborator Author

Adding some comments here, but the changes are done in the separate part1 PR.
Changed the comment to your suggestion.

"--metrics_file",
required=True,
type=str,
help="Path to YAML file containing metrics and required precision as values (required).",
Contributor

I think you can remove the (required) from the comment (it is the only parameter with this added, although others are required).

Collaborator Author

Done

max_error : float
Maximum relative error.
"""
if "relative_errors" in self.metric_results["error_eff_area"]:
Contributor

Trying to understand a case where relative_errors is not in metric_results but error_eff_area is filled. If I understand correctly, both variables are always filled in calculate_metrics?

Collaborator Author

No, currently this depends on the production_simulation_config_metrics file and which metrics are given there (i.e. which metric computation is required).

)
valid = (simulated_event_counts > 0 * u.ct) & (triggered_event_counts > 0 * u.ct)

uncertainties = np.zeros_like(triggered_event_counts) * u.ct**-0.5
Contributor

Can you explain the '-0.5'?

Collaborator Author

Good point, this was wrongly implemented; the relative errors should be dimensionless. Previously, when keeping the units, the relative error came out with this dimension.


return efficiencies, relative_errors

def calculate_energy_threshold(self):
Contributor

Suggest replacing the hardwired 10% with a variable (with a default of 10%)

Collaborator Author

added
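The suggested change can be sketched like this (function and parameter names are illustrative; the actual method lives in the class under review):

```python
# Sketch: the hardwired 0.1 threshold becomes a parameter with a 10% default,
# which also resolves the 50%-vs-10% docstring mismatch noted earlier.
import numpy as np

def energy_threshold_index(efficiencies, threshold_fraction=0.1):
    """Return the first bin whose efficiency exceeds a fraction of the maximum,
    or -1 if no bin does."""
    efficiencies = np.asarray(efficiencies)
    threshold_efficiency = threshold_fraction * np.max(efficiencies)
    above = np.nonzero(efficiencies > threshold_efficiency)[0]
    return int(above[0]) if above.size else -1
```

Keeping the fraction as a keyword argument lets the docstring state the default once, so documentation and code cannot drift apart silently.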

bin_edges = np.concatenate([bin_edges_low, [bin_edges_high[-1]]])
return np.unique(bin_edges)

def compute_histogram(self, event_energies_reco, bin_edges):
Contributor

Suggest 'compute_triggered_event_histogram' to make the purpose of this function clearer.

Collaborator Author

done

Parameters
----------
event_energies_reco : array
Array of energies of the observed events.
Contributor

Array of reconstructed energy per event

Collaborator Author

done


Passed

Analysis Details

0 Issues

  • Bug 0 Bugs
  • Vulnerability 0 Vulnerabilities
  • Code Smell 0 Code Smells

Coverage and Duplications

  • Coverage 82.20% Coverage (92.80% Estimated after merge)
  • Duplications 0.00% Duplicated Code (0.00% Estimated after merge)

Project ID: gammasim_simtools_AY_ssha9WiFxsX-2oy_w

View in SonarQube
