Fix the source of the dt=0 error when using adaptive time step #2103
base: release-v4.6.1
…pted to mitigate this effect. Fixed the source of the dt=0 error in the BC and output time step adjustment. Dividing tmpTimeInterval by 2 left the time at a finer precision than 1/100 sec for odd values of time_to_bc and time_to_output.
The regression test results:
|
To help us review this, can you add more explanation of the fix in the description section? |
Hello. The adaptive time step module operates with a precision of 1/100 seconds, resulting in simulation times with the same precision. The process that determines the dtInterval involves several steps, checking various conditions. One of these conditions is the precision of 1/100 seconds. When adjusting the time step to boundary conditions (BC) and output times, the algorithm divides a temporary time interval (with 1/100 sec precision) by two and assigns it to dtInterval. When this temporary time interval has an odd value, the precision changes to 1/200 sec, and the next simulation time also has a precision of 1/200 sec. In the next step, adjacent to the BC or output time, the algorithm truncates the 1/200 sec precision, setting the simulation time to 1/200 sec before the BC or output time (without the mitigation done in #1398). The following dtInterval, which is 1/200 sec, is then truncated to a precision of 1/100 sec, resulting in a dtInterval of 0. This effect is mitigated in #1398 but does not address the underlying source of the problem. |
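The arithmetic described in the comment above can be traced with exact fractions. This is a minimal Python sketch of the mechanism only, not the Fortran in adapt_timestep_em.F. The function names are hypothetical, and treating the truncation as a plain floor to 1/100 s is an assumption.

```python
from fractions import Fraction
import math

HUNDREDTH = Fraction(1, 100)  # precision the module works in, per the comment above


def truncate_to_hundredths(seconds):
    """Floor a non-negative time to a multiple of 1/100 s (assumed behaviour)."""
    return math.floor(seconds / HUNDREDTH) * HUNDREDTH


def approach_target(time_to_target):
    """Take the two steps leading up to a BC/output time (hypothetical helper).

    Returns (dt1, dt2, leftover), where leftover is the time still
    missing to reach the target after the second step.
    """
    # Between one and two steps away: halve the remaining interval.
    dt1 = time_to_target / 2
    remaining = time_to_target - dt1
    # Adjacent step: the remaining time is truncated to 1/100 s.
    dt2 = truncate_to_hundredths(remaining)
    leftover = remaining - dt2
    return dt1, dt2, leftover


# Odd number of hundredths (1.01 s): halving yields 1/200 s precision.
dt1, dt2, leftover = approach_target(Fraction(101, 100))
print(dt1, dt2, leftover)                # 101/200 1/2 1/200
print(truncate_to_hundredths(leftover))  # 0, i.e. the next dtInterval is zero

# Even number of hundredths (1.02 s): the target is reached exactly.
print(approach_target(Fraction(102, 100)))
```

With an odd interval, the model ends up 1/200 s short of the target, and truncating that remainder to hundredths gives the zero time step. With an even interval the leftover is zero and the problem does not arise.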
@JeroBnd Can you expand IDING SAS? Are you working with Kugler who posted the issue? |
I am not working with Kugler. I am from Córdoba, Argentina. IDING SAS is a startup. We provide services to APRHI (Provincial Administration of Water Resources) for reservoir management. In this context, we operationally run a high-resolution weather forecast ensemble with WRF. I found this bug while debugging a mistake of my own in namelist.input that did not raise a warning. There are several things to do in the adapt_timestep module... |
@JeroBnd Thanks for the info. I tested one of the cases Kugler had problem with, the em_b_wave case, your change didn't help. Did you test that case, using his namelist.input file? |
@weiwangncar, the adapt_timestep module is not prepared to handle this kind of idealized case. The algorithm uses the remaining time until the boundary condition is applied, with a counter that resets when the boundary condition is used. In this idealized case, the counter starts with a value of 10800 (equivalent to 3 hours), but when the supposed boundary condition should occur, it doesn’t, and the counter continues decreasing into negative values, causing the simulation to terminate prematurely. I wrote a small patch to fix this, but the module should be reconsidered. It uses several variables from the namelist.input without properly checking them. |
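The counter behaviour described in the comment above can be illustrated with a toy model. This is a hedged Python sketch, not the module's code: `run_counter` and its parameters are hypothetical, and the fixed, deliberately coarse step of 3600 s is chosen only to keep the output short.

```python
def run_counter(start, dt, interval, bc_arrives, n_steps):
    """Track the time remaining until the next boundary condition (BC).

    Hypothetical model: the counter drops by dt each step and is reset
    only when a BC is actually applied.
    """
    time_to_bc = start
    history = []
    for _ in range(n_steps):
        time_to_bc -= dt
        if time_to_bc <= 0 and bc_arrives:
            time_to_bc += interval  # reset when the BC is used
        history.append(time_to_bc)
    return history


# Real-data style run: a BC arrives every 10800 s, so the counter resets.
with_bc = run_counter(10800, 3600, 10800, bc_arrives=True, n_steps=5)
# Idealized run: no BC is ever applied, so the counter keeps falling.
without_bc = run_counter(10800, 3600, 10800, bc_arrives=False, n_steps=5)
print(with_bc)     # [7200, 3600, 10800, 7200, 3600]
print(without_bc)  # [7200, 3600, 0, -3600, -7200]
```

In the second run the counter passes through zero and goes negative, which matches the situation reported for the idealized case.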
@JeroBnd Did you encounter a problem associated with dt=0? If so, what version of the code is it? |
@weiwangncar I am doing one-way nesting using Ndown and adaptive time step. The boundary conditions (BC) are taken from domain 1 to domain 2 every 30 minutes (1800 seconds). However, when WRF is run for domain 2 with a mistake in the namelist.input file, specifically with the variable interval_seconds set to 3600 seconds, one random run out of the 24-member ensemble usually fails with a CFL error and a segmentation fault, or stops at NOAH MP. For this to happen, two conditions must occur simultaneously: 1 - The time interval between the current simulation time and the next BC time is an odd value (resulting in a running time precision of 1/200 sec). 2 - The BC time does not match the BC time derived from interval_seconds, which in my case happens at times ending in 30 minutes, such as 1:30. When both of these conditions occur, the mitigation algorithm for dt=0 (#1398) sets the dtInterval to match the BC time derived from interval_seconds, generating a time interval of 30 minutes, which crashes the run. |
@brianreen We wonder if you could help review this PR? Thanks. |
Yes, I will help review this PR. |
@JeroBnd Can you address Brian's comment? |
@brianreen, @weiwangncar The reference #154 is incorrect, you are right. It should say #1398, but I actually want to refer to lines 341-354. |
The proposed change appears to make the following changes from the standard version of the code:
A) If between one and two (preliminary) timesteps from the target time (BC or output) then set the timestep to:
B) If less than one (preliminary) timestep from the target time (BC or output) then set the timestep to:
However, while the adaptive timestep code does sometimes round timesteps to hundredths of a second it does not always do so. For example, if two domains are running and adaptation_domain=1, the inner domain timestep will be adjusted so that the next outer domain timestep aligns with an inner domain timestep. In this process there is no guarantee that the inner domain timestep will be representable in hundredths of a second. One can also get timesteps that are not representable in hundredths of a second if the user specifies either the minimum or maximum allowed timesteps using fractions that are not representable in hundredths of a second.
Because timesteps do not have to be representable as hundredths of a second, the model time also does not have to be representable as hundredths of a second. This means that proposed change B is problematic because it can result in the timestep not reaching the target time (BC or output). |
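The representability concern raised in the review can be checked with a small worked example. This is a Python sketch under stated assumptions: the helper names are hypothetical, the 10 s outer step split into 3 inner steps is an invented illustration, and the final truncation to hundredths stands in for a step limited to 1/100 s precision (the exact expressions for A and B are not reproduced in this thread).

```python
from fractions import Fraction
import math

HUNDREDTH = Fraction(1, 100)


def is_hundredths(seconds):
    """True if the time is an exact multiple of 1/100 s."""
    return (seconds / HUNDREDTH).denominator == 1


def inner_step(outer_dt, n_inner_steps):
    """Inner-domain step that lands exactly on the next outer-domain step."""
    return outer_dt / n_inner_steps


outer_dt = Fraction(10)             # hypothetical 10.00 s outer-domain step
inner_dt = inner_step(outer_dt, 3)  # hypothetical split into 3 inner steps
print(inner_dt, is_hundredths(inner_dt))  # 10/3 False

# After two inner steps, 10/3 s remain to the target. A final step
# truncated to hundredths (assumption) falls short of the target.
remaining = outer_dt - 2 * inner_dt
truncated = math.floor(remaining / HUNDREDTH) * HUNDREDTH
print(truncated, remaining - truncated)   # 333/100 1/300
```

The leftover of 1/300 s shows how a step restricted to hundredths can miss the target when the model time itself is not a multiple of 1/100 s.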
@brianreen @weiwangncar |
It does not seem to me that these are sources of numeric error. Lines 166-172 enforce the precision but only for max_increase_factor. In the case where the tentative current timestep is greater than the product of the max_increase_factor (adjusted to a precision of hundredths) times the previous timestep, this code updates the tentative current timestep to this product. If the previous timestep is not representable at that precision, there is no guarantee that the updated current timestep will be representable at that precision. This code does not assume that the previous timestep is representable in hundredths of a second. Lines 174-181 do enforce the precision on the tentative current timestep. However, the minimum and maximum timestep criteria are applied after this, as is the adjustment that ensures outer domain timesteps align with inner domain timesteps, and thus the final current timestep will not necessarily match this precision. Based on this, I am not yet seeing how time will be "lost" as a result of rounding in lines 166-181. Thus far, it seems that the standard code functions correctly as long as interval_seconds is set correctly. |
Fixed the source of the dt=0 error in the BC and output time step adjustment.
TYPE: bug fix
KEYWORDS: time, step, adaptive
SOURCE: Jeronimo Bande (IDING SAS, Córdoba, Argentina)
DESCRIPTION OF CHANGES:
Problem:
When adjusting the time step toward BC and output times, the time step is occasionally set to 0 in error, causing the model to stop.
Solution:
The algorithm to determine the time step near BC and output times is modified to correctly set the time step.
ISSUE: Fixes #1560
LIST OF MODIFIED FILES: /dyn_em/adapt_timestep_em.F
TESTS CONDUCTED:
RELEASE NOTE: Corrected adaptive time step being occasionally set to 0 (which causes the model to stop) on BC and output times.