OpenACC Merge #4
Open
rsearles35 wants to merge 40 commits into wdj:master from rsearles35:master
…inline a bunch of the function calls in order to begin OpenACC implementation
… face-initializations). It is slow right now due to too much data transfer. Will optimize this once all portions of the compute are running on the GPU and producing the correct results.
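The data-transfer overhead mentioned in this commit is commonly reduced by keeping arrays resident on the device across several kernel launches. The following is only a minimal sketch of that general OpenACC pattern, not code from this pull request: `repeated_update`, the array `v`, and the update formula are hypothetical names and values chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical example: apply the same update to an array several times.
 * The structured data region copies v to the device once before the first
 * pass and back to the host once after the last pass, instead of once per
 * kernel launch. */
void repeated_update(double *v, size_t n, int passes)
{
    #pragma acc data copy(v[0:n])
    {
        for (int p = 0; p < passes; ++p) {
            /* v is already on the device, so no transfer happens here. */
            #pragma acc parallel loop present(v[0:n])
            for (size_t i = 0; i < n; ++i) {
                v[i] = 0.5 * v[i] + 1.0;
            }
        }
    }
}
```

When built without OpenACC support the pragmas are ignored and the function runs serially on the host with the same result.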
…alculated. Next step is to figure out how to parallelize over energy groups.
… we only launch one kernel per octant iteration instead of 3
… all 5 loops. Also collapsed some of the inner computational loops. Still need to resolve the issue of the spatial loops. Can't collapse when using not-equals...
* Could not parallelize the spatial dimensions due to the unpredictable direction of the sweep.
This change addresses the need by:
* Rewriting the sweep to sweep in only one direction. This allowed me to parallelize all 3 loops.
Related/future task(s):
* Potentially tweaking which loops are collapsed in the gang layer and which ones are collapsed in the vector layer. We are at the tuning/optimization stage now.
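One way to get loops that always run in a single direction is to iterate over a logical index with a fixed `<` bound and map it to a physical index inside the loop body. The sketch below illustrates that idea under stated assumptions; it is not the code in this pull request. `physical_index` and `gather_in_sweep_order` are hypothetical names, and the body only reorders an array, so it has none of the upwind dependencies a real sweep has.

```c
#include <stddef.h>

/* Map a logical sweep index i in [0, n) to a physical grid index.
 * dir > 0 walks low-to-high, otherwise high-to-low. */
#pragma acc routine seq
static int physical_index(int i, int n, int dir)
{
    return dir > 0 ? i : n - 1 - i;
}

/* Hypothetical example: copy `in` into `out` in sweep order.
 * All three loops count upward with a `<` bound regardless of direction,
 * which is the loop shape that collapse(3) expects. */
void gather_in_sweep_order(const double *in, double *out,
                           int nx, int ny, int nz,
                           int dir_x, int dir_y, int dir_z)
{
    const size_t n = (size_t)nx * ny * nz;

    #pragma acc parallel loop collapse(3) copyin(in[0:n]) copyout(out[0:n])
    for (int k = 0; k < nz; ++k) {
        for (int j = 0; j < ny; ++j) {
            for (int i = 0; i < nx; ++i) {
                const int ix = physical_index(i, nx, dir_x);
                const int iy = physical_index(j, ny, dir_y);
                const int iz = physical_index(k, nz, dir_z);
                out[((size_t)k * ny + j) * nx + i] =
                    in[((size_t)iz * ny + iy) * nx + ix];
            }
        }
    }
}
```

The direction now lives in the index mapping rather than in the loop bounds and step, so the same loop nest serves every sweep direction.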
…uding the spatial parallelization
…massive data overhead because the local array must increase in size dramatically
…t array access. This should give us better memory coalescing.
…KBA threading pattern
…ll 8 directions asynchronously. Each octant runs a gang-parallel KBA wavefront iteration with vector-parallel in-gridcell computations
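The pattern this commit describes can be sketched roughly as follows. This is an illustration under assumptions, not the Minisweep implementation: `sweep_all_octants`, `cell_index`, the per-cell unknown count `nu`, and the toy recurrence (each cell adds its source to its three upwind neighbours) are all hypothetical. Cells with the same `i + j + k` form one wavefront and do not depend on each other, so each wavefront is one gang-parallel kernel, and each octant gets its own async queue.

```c
#include <stddef.h>

/* Linear index of unknown m in cell (ix, iy, iz); m varies fastest. */
#pragma acc routine seq
static size_t cell_index(int ix, int iy, int iz, int m,
                         int nx, int ny, int nu)
{
    return (((size_t)iz * ny + iy) * nx + ix) * nu + m;
}

/* Hypothetical example: u holds 8 copies of the grid, one per octant. */
void sweep_all_octants(const double *src, double *u,
                       int nx, int ny, int nz, int nu)
{
    const size_t n = (size_t)nx * ny * nz * nu;

    #pragma acc data copyin(src[0:n]) copyout(u[0:8*n])
    {
        for (int oct = 0; oct < 8; ++oct) {
            /* Bits of oct select the sweep direction along x, y, z. */
            const int sx = (oct & 1) ? -1 : 1;
            const int sy = (oct & 2) ? -1 : 1;
            const int sz = (oct & 4) ? -1 : 1;
            const size_t off = (size_t)oct * n;

            /* Wavefronts run in order within one queue; queues overlap. */
            for (int w = 0; w <= nx + ny + nz - 3; ++w) {
                #pragma acc parallel loop gang collapse(2) async(oct) \
                    present(src[0:n], u[0:8*n])
                for (int k = 0; k < nz; ++k) {
                    for (int j = 0; j < ny; ++j) {
                        const int i = w - j - k;
                        if (i >= 0 && i < nx) {
                            const int ix = sx > 0 ? i : nx - 1 - i;
                            const int iy = sy > 0 ? j : ny - 1 - j;
                            const int iz = sz > 0 ? k : nz - 1 - k;
                            #pragma acc loop vector
                            for (int m = 0; m < nu; ++m) {
                                double val = src[cell_index(ix, iy, iz, m, nx, ny, nu)];
                                if (i > 0) val += u[off + cell_index(ix - sx, iy, iz, m, nx, ny, nu)];
                                if (j > 0) val += u[off + cell_index(ix, iy - sy, iz, m, nx, ny, nu)];
                                if (k > 0) val += u[off + cell_index(ix, iy, iz - sz, m, nx, ny, nu)];
                                u[off + cell_index(ix, iy, iz, m, nx, ny, nu)] = val;
                            }
                        }
                    }
                }
            }
        }
        /* Wait for all 8 queues before the data region copies u back. */
        #pragma acc wait
    }
}
```

Launching one kernel per wavefront keeps the sketch simple. Whether that or a single kernel per octant performs better would have to be measured.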
… in your cmake file will enable the OpenACC version of the code
…as well as devices within nodes if enough ranks are used.
…g an issue when building for multicore CPU
…ollide. This is not good for performance
Merging the OpenACC version of Minisweep.