-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add uc 08 derive simulation production configuration parameters #1098
base: main
Are you sure you want to change the base?
Add uc 08 derive simulation production configuration parameters #1098
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
Would like start the discussions how to best organize the code. At this point most of it is in the applications, but we want to move it into modules. Suggest to start a module to put those classes in. What would be a good name?
This module would include all classes currently defined in
@orelgueta , @tobiaskleiner : please comment. |
Not much for me to comment. This is the first time I see that those modules are in applications and I definitely agree they should go into simtools instead. In terms of which name to put them under, I would vote for |
I agree with the |
This comment has been minimized.
This comment has been minimized.
2 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
6 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
2 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
||
# Determine the effective area threshold (50% of max effective area) | ||
max_efficiency = np.max(efficiencies) | ||
threshold_efficiency = 0.1 * max_efficiency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring says "exceeds 50%". Do you use actually 10%?
@tobiaskleiner - before I start the review, could you make sure that the integration tests run successfully? |
Running the example production_generate_simulation_config gives:
Is a viewcone of 1053 reasonable? I assume it is in degrees. Here and throughput the added code: there are no units anywhere. I assume that this output is in the units expected by CORSIKA, but I am not entirely sure. The code also relies on certain units of the values read from the DL2 file, and I think at least there the units are given in the header or table columns. I suggest to use units (as we do not know if DL2 files change in future). |
Also surprised how small the error on the effective areas area (same for the values in tests/unit_tests/production_configuration/test_generate_simulation_config.py) |
First: great work @tobiaskleiner! This adds an almost complete framework for the configuration. This PR is unusual big with several independent applications added, all including important . Too late to split it up, but we need to do it in pieces (I will sent a review for the first application out soon). I have looked until now at Quite a few of the statistics questions are open, including the details of the metrics. I would suggest that you add a small discussion note on this topic into the implementation gitlab (UC8) directory addressing the following question:
I think it is easier to discuss the methodical approach there and not as part of the code review. Maybe for future we should discuss the methods before the implementation starts. Additional, we should discuss the concepts / impact of data level and science cases in with implementation gitlab. I think the science cases are not documented anywhere? What are the assumptions / motivation? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first part of my review concentrating on simtools-production-generate-simulation-config
and code called from this class. Further review will come, but I think it is more efficient to resolve for the issues related to one application (plus agree on the methods, see my comment in the PR) and then go to the next one.
Approve the overall structure and approach, this is good.
Todos:
- need to agree on the statistical methods and metrics.
- decide what do do with units (at this point there are no units)
- decide how to document the methods
- ...
"error_gamma_ray_psf": 0.01, | ||
"error_image_template_methods": 0.03,} | ||
""" | ||
if file_path and os.path.exists(file_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace by general.collect_data_from_file_or_dict
. This is more flexible, as it allows also to load a file using an url, json format, etc (and is used commonly throughout the code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
|
||
Example: | ||
|
||
metrics = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer to have them called uncertainty_eff_area
, as these are not errors, Errors are typically due to measurements, not for metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
else: | ||
serializable_config[key] = value | ||
|
||
logger.info(f"Simulation parameters: {serializable_config}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simulation parameters
or Simulation configuration
? My understanding is that it is configuration.
(sorry for the language remarks..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, changed
#!/usr/bin/python3 | ||
|
||
r""" | ||
Configure and run a simulation based on command-line arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this application actually run
simulations? I don't see it, seems like it is configuration only (I think this is what we want)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
generate simulation parameters for a specific grid point in a statistical error | ||
evaluation setup. The class considers various parameters, such as azimuth, | ||
elevation, and night sky background, to compute core scatter areas, viewcones, | ||
and the required number of simulated events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace error by uncertainty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Files are small. Could be smaller by simply 'gzipping' them (not important, but easy to do)
OUTPUT_PATH: simtools-output | ||
OUTPUT_FILE: "configured_simulation_params.json" | ||
|
||
INTEGRATION_TESTS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to add an integration test which compares the output of this application run with an expected output. See this example
""" | ||
return self.evaluator.data["viewcone"] | ||
|
||
def calculate_required_events(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a unit test for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to open an issue to add in a follow-up pull request:
- a description of the assumptions, default values, statistical methods.
- metric documentation
- science cases documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added here #1233
|
||
sim_events_data = hdul["SIMULATED EVENTS"].data # pylint: disable=E1101 | ||
bin_edges_low = sim_events_data["MC_ENERG_LO"] | ||
bin_edges_high = sim_events_data["MC_ENERG_HI"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the assumption that we want to pass the metric over the energy range given by the DL2 file good? Would it make sense to define a valid energy range
for a given metric? e.g. generate a simulation production configuration with a 0.1% statistical uncertainty in the 30 GeV to 300 TeV range (although we simulated from 10 GeV to 500 TeV?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already added a validity range with units to each metric in the yaml files.
This comment has been minimized.
This comment has been minimized.
2 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@GernotMaier thanks for the review. I went through the comments and adressed most of them. Few more points need discussion/implementation see #1233, #1227, #1219. Will let you know when I have fixed the unit tests for another review. |
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
@GernotMaier thanks again for the review of the first part of the PR. Went through your comments and fixed the unit/integration tests. I factored out the event scaling logic and added a file for helper functions in the production configuration folder. In there we could also move the dl2 reading part in a later stage. |
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of more comments on the production_generate_simulation_config
.
I will talk to you directly about a couple of points.
The data level for the simulation (e.g., 'A', 'B', 'C'). | ||
science_case (str, required) | ||
The science case for the simulation. | ||
file_path (str, required) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to change the file_path
doc string to
Path to file with MC events at CTAO DL2 data level. Used for statistical uncertainty evaluation.
elevation (float, required) | ||
Elevation angle in degrees. | ||
nsb (float, required) | ||
Night sky background value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the units for the NSB is "Hz" (but please check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 1/(srnscm**2) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please find it out. I know that in some places we use Hz (which requires the knowledge of the pixel fov)
config.parser.add_argument( | ||
"--data_level", type=str, required=True, help="Data level (e.g., 'A', 'B', 'C')." | ||
) | ||
config.parser.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, added this to the list of discussions.
"--science_case", type=str, required=True, help="Science case for the simulation." | ||
) | ||
config.parser.add_argument( | ||
"--file_path", type=str, required=True, help="Path to the dl2_mc_events_file FITS file." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
Path to MC event file in DL2 format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some comments here, but the changes are done in the separate part1 PR.
Changed the comment to your suggestion.
"--metrics_file", | ||
required=True, | ||
type=str, | ||
help="Path to YAML file containing metrics and required precision as values (required).", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove the (required)
from the comment (it is the only parameter with this added, although others are required).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
max_error : float | ||
Maximum relative error. | ||
""" | ||
if "relative_errors" in self.metric_results["error_eff_area"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand a case where relative_errors
is not in metric_results by error_eff_area
is filled. If I understand it correctly, both variables are always filled in calculate_metrics
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No currently this depends on the production_simulation_config_metrics
file and what metrics are given there (i.e. which metric computation is required).
) | ||
valid = (simulated_event_counts > 0 * u.ct) & (triggered_event_counts > 0 * u.ct) | ||
|
||
uncertainties = np.zeros_like(triggered_event_counts) * u.ct**-0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the '-0.5'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, this was wrongly implemented and the errors should be dimensionless. Previously when keeping the units the rel error turns out with this dimension.
|
||
return efficiencies, relative_errors | ||
|
||
def calculate_energy_threshold(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to replace the hardwired 10% by a variable (which default is 10%)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
bin_edges = np.concatenate([bin_edges_low, [bin_edges_high[-1]]]) | ||
return np.unique(bin_edges) | ||
|
||
def compute_histogram(self, event_energies_reco, bin_edges): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest 'compute_triggered_event_histogram' to make the purpose of this function clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Parameters | ||
---------- | ||
event_energies_reco : array | ||
Array of energies of the observed events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array of reconstructed energy per event
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
This comment has been minimized.
This comment has been minimized.
…UC-08-derive-simulation-production-configuration-parameters
This comment has been minimized.
This comment has been minimized.
…masim/simtools into add-UC-08-derive-simulation-production-configuration-parameters
This comment has been minimized.
This comment has been minimized.
…rameters' of https://github.com/gammasim/simtools into add-UC-08-derive-simulation-production-configuration-parameters
This reverts commit dd2cc81.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…UC-08-derive-simulation-production-configuration-parameters
This comment has been minimized.
This comment has been minimized.
…UC-08-derive-simulation-production-configuration-parameters
Analysis Details0 IssuesCoverage and DuplicationsProject ID: gammasim_simtools_AY_ssha9WiFxsX-2oy_w |
This PR adds the functionality defined in UC8 with the following four applications:
simtools-production-calculate-resource-estimates
-- calculates compute and storage resources --
simtools-production-generate-grid
-- generates a grid of simulation points --
simtools-production-generate-simulation-config
-- generates simulation parameters for a specific grid point --
simtools-production-scale-events
-- metric evaluation and statistical error calculations --
Modules are stored in
production_configuration
:calculate_statistical_errors_grid_point.py
derive_computing_resources.py
generate_production_grid.py
generate_simulation_config.py
interpolation_handler.py