
Add uc 08 derive simulation production configuration parameters #1098

Open
wants to merge 7 commits into base: main

Conversation

@tobiaskleiner (Collaborator) commented Aug 1, 2024

This PR adds the functionality defined in UC8 with the following four applications:

  • simtools-production-calculate-resource-estimates
    -- calculates compute and storage resources --

  • simtools-production-generate-grid
    -- generates a grid of simulation points --

  • simtools-production-generate-simulation-config
    -- generates simulation parameters for a specific grid point --

  • simtools-production-scale-events
    -- metric evaluation and statistical error calculations --

Modules are stored in production_configuration:

  • calculate_statistical_errors_grid_point.py
  • derive_computing_resources.py
  • generate_production_grid.py
  • generate_simulation_config.py
  • interpolation_handler.py


@GernotMaier (Contributor)

I would like to start a discussion on how best to organize the code. At this point most of it is in the applications, but we want to move it into modules.

I suggest starting a module to put those classes in. What would be a good name?

  • simtools.production_tools (don't like tools, seems to cover everything)
  • simtools.production_configuration
  • simtools.simulation_production
  • ...something shorter...?

This module would include all classes currently defined in

  • simtools/applications/derive_resources.py (which will be renamed to simtools/applications/derive_computing_resources.py)
  • simtools/applications/generate_grid.py (which will be renamed to simtools/applications/generate_production_grid.py)
  • simtools/applications/generate_simulation_config.py
  • simtools/utils/calculate_statistical_errors_grid_point.py (maybe we can find a shorter name)

@orelgueta , @tobiaskleiner : please comment.

@orelgueta (Contributor)

Not much for me to comment. This is the first time I see that those modules are in applications and I definitely agree they should go into simtools instead. In terms of which name to put them under, I would vote for simtools.production_configuration.
However, I would also check if we can integrate these classes with the current modules we have (at least partly).

@tobiaskleiner (Collaborator, Author)

I agree with the simtools.production_configuration suggestion, and with moving some parts into visualization or other modules.


@tobiaskleiner tobiaskleiner marked this pull request as ready for review September 19, 2024 15:53



# Determine the effective area threshold (50% of max effective area)
max_efficiency = np.max(efficiencies)
threshold_efficiency = 0.1 * max_efficiency
Contributor

The docstring says "exceeds 50%". Do you actually use 10%?

@GernotMaier (Contributor)

@tobiaskleiner - before I start the review, could you make sure that the integration tests run successfully?

@GernotMaier (Contributor)

Running the example production_generate_simulation_config gives:

 python simtools/applications/production_generate_simulation_config.py  --azimuth 60.0 --elevation 45.0 --nsb 0.3 --data_level "A" --science_case "high_precision"  --file_path tests/resources/production_dl2_fits/prod6_LaPalma-20deg_gamma_cone.N.Am-4LSTs09MSTs_ID0_reduced.fits --file_type "On-source" --metrics_file  tests/resources/production_simulation_config_metrics.yaml --site North
Effective Area Error (avg): 0.000, Reference: 0.020
Signal Efficiency Error: 0.020, Reference: 0.020
INFO::calculate_statistical_errors_grid_point(l278)::calculate_error_energy_estimate_bdt_reg_tree::Calculating Energy Resolution Error
Energy Estimate Error: 0.184, Reference: 0.050
Gamma-Ray PSF Error: 0.010, Reference: 0.010
Image Template Methods Error: 0.050, Reference: 0.030
error_eff_area {'relative_errors': array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.01799919e-08,
       2.21965306e-08, 3.88179482e-08, 5.76378038e-08, 7.53500614e-08,
       8.98535771e-08, 1.20081083e-07, 1.46732027e-07, 1.66036331e-07,
       2.16994290e-07, 2.42594940e-07, 2.92140457e-07, 3.35214284e-07,
       3.58574316e-07, 4.90549293e-07, 4.77865852e-07, 6.57241537e-07,
       6.62752408e-07, 6.89972899e-07, 1.03031658e-06, 1.26143022e-06,
       1.53338834e-06, 1.47603219e-06, 1.79685160e-06, 2.19451145e-06,
       1.94774912e-06, 3.34303448e-06, 4.51782360e-06, 3.89789862e-06,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00])}
INFO::production_generate_simulation_config(l180)::main::Simulation parameters: {'core_scatter_area': np.float64(200070.0), 'viewcone': np.float64(1053.0), 'number_of_events': 11000004994}
INFO::production_generate_simulation_config(l183)::main::Simulation parameters saved to: /workdir/external/simtools/simtools-output/production_generate_simulation_config/configured_simulation_params.json

Is a viewcone of 1053 reasonable? I assume it is in degrees.

Here and throughout the added code: there are no units anywhere. I assume that this output is in the units expected by CORSIKA, but I am not entirely sure. The code also relies on certain units for the values read from the DL2 file, and I think at least there the units are given in the header or table columns. I suggest using explicit units (as we do not know whether DL2 files change in the future).

@GernotMaier (Contributor)

I am also surprised how small the errors on the effective areas are (same for the values in tests/unit_tests/production_configuration/test_generate_simulation_config.py).

@GernotMaier (Contributor)

First: great work @tobiaskleiner! This adds an almost complete framework for the configuration.

This PR is unusually big, with several independent applications added, each including important functionality. Too late to split it up, but we need to review it in pieces (I will send out a review for the first application soon).

So far I have looked only at production_generate_simulation_config, which determines the configuration parameters for a single grid point.

Quite a few of the statistics questions are open, including the details of the metrics. I would suggest that you add a small discussion note on this topic to the implementation gitlab (UC8) directory, addressing the following questions:

  • what metrics do we want to consider and how are they calculated
  • how do we determine the number of required events from the metrics
  • how do you plan to calculate the scatter area and viewcone

I think it is easier to discuss the methodological approach there rather than as part of the code review. Maybe in the future we should discuss the methods before the implementation starts.

Additionally, we should discuss the concepts and impact of data levels and science cases in the implementation gitlab. I think the science cases are not documented anywhere? What are the assumptions / motivation?

@GernotMaier (Contributor) left a comment

This is the first part of my review, concentrating on simtools-production-generate-simulation-config and the code called from it. Further review will come, but I think it is more efficient to resolve the issues related to one application first (plus agree on the methods, see my comment in the PR) and then go to the next one.

I approve of the overall structure and approach; this is good.

Todos:

  • need to agree on the statistical methods and metrics.
  • decide what to do with units (at this point there are no units)
  • decide how to document the methods
  • ...

"error_gamma_ray_psf": 0.01,
"error_image_template_methods": 0.03,}
"""
if file_path and os.path.exists(file_path):
Contributor

Replace by general.collect_data_from_file_or_dict. This is more flexible, as it also allows loading a file from a URL, in JSON format, etc. (and it is used commonly throughout the code)

Collaborator Author

changed


Example:

metrics = {
Contributor

I would prefer to have them called uncertainty_eff_area, as these are not errors. Errors typically arise from measurements, not from metrics.

Collaborator Author

changed

else:
serializable_config[key] = value

logger.info(f"Simulation parameters: {serializable_config}")
Contributor

Simulation parameters or Simulation configuration? My understanding is that it is configuration.
(sorry for the language remarks..)

Collaborator Author

yes, changed

#!/usr/bin/python3

r"""
Configure and run a simulation based on command-line arguments.
Contributor

Does this application actually run simulations? I don't see it; it seems to be configuration only (I think this is what we want).

Collaborator Author

changed

generate simulation parameters for a specific grid point in a statistical error
evaluation setup. The class considers various parameters, such as azimuth,
elevation, and night sky background, to compute core scatter areas, viewcones,
and the required number of simulated events.
Contributor

replace error by uncertainty.

Collaborator Author

changed

Contributor

Files are small. Could be smaller by simply 'gzipping' them (not important, but easy to do)

OUTPUT_PATH: simtools-output
OUTPUT_FILE: "configured_simulation_params.json"

INTEGRATION_TESTS:
Contributor

Suggest to add an integration test which compares the output of this application run with an expected output. See this example

"""
return self.evaluator.data["viewcone"]

def calculate_required_events(self) -> int:
Contributor

Do we have a unit test for this function?

Contributor

Suggest to open an issue to add in a follow-up pull request:

  • a description of the assumptions, default values, statistical methods.
  • metric documentation
  • science cases documentation

Collaborator Author

Added here #1233


sim_events_data = hdul["SIMULATED EVENTS"].data # pylint: disable=E1101
bin_edges_low = sim_events_data["MC_ENERG_LO"]
bin_edges_high = sim_events_data["MC_ENERG_HI"]
Contributor

Is the assumption that we want to pass the metric over the energy range given by the DL2 file good? Would it make sense to define a valid energy range for a given metric? e.g. generate a simulation production configuration with a 0.1% statistical uncertainty in the 30 GeV to 300 TeV range (although we simulated from 10 GeV to 500 TeV?)

Collaborator Author

Already added a validity range with units to each metric in the yaml files.


@tobiaskleiner (Collaborator, Author)

@GernotMaier thanks for the review. I went through the comments and addressed most of them. A few more points need discussion/implementation, see #1233, #1227, #1219. I will let you know when I have fixed the unit tests for another review.


@tobiaskleiner (Collaborator, Author)

@GernotMaier thanks again for the review of the first part of the PR. I went through your comments and fixed the unit/integration tests. I factored out the event scaling logic and added a file for helper functions in the production configuration folder. We could also move the DL2 reading part there at a later stage.
Let me know if the changes make sense; if so, you could start reviewing the other parts of the PR.


@GernotMaier (Contributor) left a comment

A couple of more comments on the production_generate_simulation_config.

I will talk to you directly about a couple of points.

The data level for the simulation (e.g., 'A', 'B', 'C').
science_case (str, required)
The science case for the simulation.
file_path (str, required)
Contributor

Suggest to change the file_path doc string to

Path to file with MC events at CTAO DL2 data level. Used for statistical uncertainty evaluation.

elevation (float, required)
Elevation angle in degrees.
nsb (float, required)
Night sky background value.
Contributor

I think the unit for the NSB is "Hz" (but please check)

Collaborator Author

Not 1/(sr ns cm**2)?

Contributor

Please find out. I know that in some places we use Hz (which requires knowledge of the pixel FoV)

config.parser.add_argument(
"--data_level", type=str, required=True, help="Data level (e.g., 'A', 'B', 'C')."
)
config.parser.add_argument(
Contributor

Agree, added this to the list of discussions.

"--science_case", type=str, required=True, help="Science case for the simulation."
)
config.parser.add_argument(
"--file_path", type=str, required=True, help="Path to the dl2_mc_events_file FITS file."
Contributor

See comment above.

Path to MC event file in DL2 format

Collaborator Author

Adding some comments here, but the changes are done in the separate part1 PR.
Changed the comment to your suggestion.

"--metrics_file",
required=True,
type=str,
help="Path to YAML file containing metrics and required precision as values (required).",
Contributor

I think you can remove the (required) from the comment (it is the only parameter with this added, although others are required).

Collaborator Author

Done

max_error : float
Maximum relative error.
"""
if "relative_errors" in self.metric_results["error_eff_area"]:
Contributor

Trying to understand a case where relative_errors is not in metric_results but error_eff_area is filled. If I understand correctly, both variables are always filled in calculate_metrics?

Collaborator Author

No, currently this depends on the production_simulation_config_metrics file and which metrics are given there (i.e. which metric computation is required).

)
valid = (simulated_event_counts > 0 * u.ct) & (triggered_event_counts > 0 * u.ct)

uncertainties = np.zeros_like(triggered_event_counts) * u.ct**-0.5
Contributor

Can you explain the '-0.5'?

Collaborator Author

Good point, this was wrongly implemented; the relative errors should be dimensionless. Previously, when keeping the units, the relative error came out with this dimension.


return efficiencies, relative_errors

def calculate_energy_threshold(self):
Contributor

Suggest replacing the hardwired 10% with a variable (with a default of 10%)

Collaborator Author

added
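The suggested change can be sketched like this (function and parameter names are illustrative; the actual method lives in the class under review):

```python
# Sketch: the hardwired 0.1 threshold becomes a parameter with a 10% default,
# which also resolves the 50%-vs-10% docstring mismatch noted earlier.
import numpy as np

def energy_threshold_index(efficiencies, threshold_fraction=0.1):
    """Return the first bin whose efficiency exceeds a fraction of the maximum,
    or -1 if no bin does."""
    efficiencies = np.asarray(efficiencies)
    threshold_efficiency = threshold_fraction * np.max(efficiencies)
    above = np.nonzero(efficiencies > threshold_efficiency)[0]
    return int(above[0]) if above.size else -1
```

Keeping the fraction as a keyword argument lets the docstring state the default once, so documentation and code cannot drift apart silently.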

bin_edges = np.concatenate([bin_edges_low, [bin_edges_high[-1]]])
return np.unique(bin_edges)

def compute_histogram(self, event_energies_reco, bin_edges):
Contributor

Suggest 'compute_triggered_event_histogram' to make the purpose of this function clearer.

Collaborator Author

done

Parameters
----------
event_energies_reco : array
Array of energies of the observed events.
Contributor

Array of reconstructed energy per event

Collaborator Author

done


Passed

Analysis Details

0 Issues

  • Bug 0 Bugs
  • Vulnerability 0 Vulnerabilities
  • Code Smell 0 Code Smells

Coverage and Duplications

  • Coverage 82.20% Coverage (92.80% Estimated after merge)
  • Duplications 0.00% Duplicated Code (0.00% Estimated after merge)

Project ID: gammasim_simtools_AY_ssha9WiFxsX-2oy_w

View in SonarQube
