Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Partition reading empty list for remote connections #475

Merged
merged 2 commits into from
Dec 9, 2022

Conversation

hellkite500
Copy link
Member

Fixes bug with parsing empty remote-connection list when running in parallel.

Additions

  • A ptree constructor for Partitions_Parser (primarily for testing purposes)

Removals

Changes

  • Check type when dealing with parsed list to ensure it is a list and not empty (which turns into a string when parsed via boost::ptree)

Testing

  1. Added unit test to test this condition in remote-connections
  2. Disabled the old parser unit tests and added a simple test case to test
  3. Enabled partition parser testing even if mpi isn't explicity active, since it has not direct MPI dependency

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Target Environment support

  • Linux
  • MacOS

@stcui007
Copy link
Contributor

I have applied the patch provided by this PR and built the ngen executable manually. Performing mpirun with this fix using 3 CPUs and a json file of 3 partitions completed the 720 steps without error. Interestingly mpirun with 4 CPUs appears to complete without error. However, increasing the CPU number to 5 and above produces a Segmentation fault error. Going above 3 CPUs is just for my own curiosity.

Building unit test (test_unit) generates a warning message:

ngen/test/utils/Partition_Test.cpp:13: warning: "NGEN_MPI_ACTIVE" redefined
 #define NGEN_MPI_ACTIVE

Nevertheless, build process completes successfully and running the test passes all checks. Building test_all as a target gives the same behavior. Removing line 13 in Partition_Test.cpp seems to get rid of the warning and unit tests also pass.

Not exactly sure what caused the warning. It may well have come from my build environment or other CMakelist.txt.

Combined with a restriction placed in the partition generator code limiting number of partitions, this should completely resolve the problem in 456. I approve this PR with the option either keep or remove line 13 inPartition_Test.cpp.

@hellkite500
Copy link
Member Author

ngen/test/utils/Partition_Test.cpp:13: warning: "NGEN_MPI_ACTIVE" redefined
#define NGEN_MPI_ACTIVE

If you already have NGEN_MPI_ACTIVE defined in your build, then you will see this error. This is in place to ensure that these tests are run regardless of whether or not MPI is active/available since the tests really don't need any MPI in order to run, so they can run regardless of the configuration.

@mattw-nws mattw-nws merged commit 0d10905 into NOAA-OWP:master Dec 9, 2022
@mattw-nws mattw-nws linked an issue Dec 22, 2022 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error while executing distributed version of ngen
3 participants