Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heap space #779

Merged
merged 3 commits into from
Dec 4, 2021
Merged

Heap space #779

merged 3 commits into from
Dec 4, 2021

Conversation

petervdonovan
Copy link
Collaborator

The changes in this PR do not noticeably improve the behavior of Epoch, LFC, or anything else, from the perspective of the user. This is because:

  • The C generator's memory usage is dominated by the quantity of generated code held in StringBuilders, and these changes don't change the quantity of generated code. They will make it possible to write to the file using less space, but this does not matter because after the first file write, the generated code gets duplicated in the process of removing line directives. Then, the second file write is never reached because GCC never finishes compiling.
  • The C++ generator does not use the utility function that is changed in this PR, and although it is affected by the reference held in the validator, in practice this makes no difference because the C++ generator does not seem to use much memory in comparison to the validator. By the time doGenerate is called, it does not matter how much memory can be freed because if an out-of-memory error was ever going to occur, it probably would have happened already in the validator.

Nonetheless, perhaps we should merge this branch anyway. I think the changes are consistent with best practices, and they give us two fewer things to worry about if we have more heap space problems in the future. Also, examination in a profiler seems to indicate that they have the desired effect on memory usage.

Other changes could be made, but I think we might defer them. These include:

  • Writing directly to a file instead of to a StringBuilder. This would allow us to generate arbitrarily large quantities of generated code without running out of memory. I am not sure this is appropriate because:
    • It might constrain the types of string manipulation that we are able to do. For example, it could be hard to copy the style of the C++ generator, where segments of a file get plugged into a template string.
    • We might not even want to be able to produce huge quantities of generated code. In order to cause heap space issues, the size of a generated file would probably have to be a few hundred MB. At that point, the code could be rather difficult to compile.
    • Breaking up generated code into multiple files is an alternative approach that we might want to use in the future anyway, for other reasons.
  • Not holding a copy of the generated code with line directives removed. It is possible that soon, we will not use line directives at all, so this may become a moot point.
  • Improving the validator (maybe the graph algorithms?) to use less memory. Currently, this might help the C++ generator but not the C generator. This seems like a reasonable choice, but from what I have heard we are not very concerned about out-of-memory errors in the C++ generator currently (since Edward mostly solved the problem originally posed in Epoch and lfc become unusable for programs with large (>500) banks and multiports #433). If it turns out that we actually do care about that issue because we want to compile C++ target programs with hundreds of thousands of reactors, then I am happy to work on this.

Copy link
Member

@lhstrh lhstrh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should merge this. Looks good!

@petervdonovan petervdonovan merged commit b8533d5 into master Dec 4, 2021
@petervdonovan petervdonovan deleted the heap-space branch December 4, 2021 00:09
@cmnrd
Copy link
Collaborator

cmnrd commented Dec 6, 2021

Thanks @petervdonovan! This also looks like a great summary of the problem. I would like to comment on a few points that you make.

Writing directly to a file instead of to a StringBuilder. This would allow us to generate arbitrarily large quantities of generated code without running out of memory

The preferred way of doing this, would be using a template engine. We could have more or less complete code templates, and let the template engine handle the actual file writing. We would only need to provide the correct snippets to fill in the gaps. We decided against using a template engine, though, in favor of the built-in templates used in Kotlin.

As you point out, I think this is a non-issue, because at the point where the generated code becomes too large, we have many other problems. In the C++ generator, we also only ever have a single file in memory. So splitting up the generated code into multiple files also reduces the memory footprint.

Improving the validator (maybe the graph algorithms?) to use less memory. Currently, this might help the C++ generator but not the C generator. This seems like a reasonable choice, but from what I have heard we are not very concerned about out-of-memory errors in the C++ generator currently (since Edward mostly solved the problem originally posed in Epoch and lfc become unusable for programs with large (>500) banks and multiports #433). If it turns out that we actually do care about that issue because we want to compile C++ target programs with hundreds of thousands of reactors, then I am happy to work on this.

Indeed, #433 could be considered as fixed for C++. Although the runtime is still relatively high. On my laptop, lfc needs 22 seconds to generate the code for the C++ bank transaction benchmark with 1000 accounts. I think improving the algorithms and data structures would be beneficial anyway, but it is probably not most critical at this point. If we compile the same program with 10000 accounts, there is still an out of memory exception. As far as I can see, there is no fundamental reason for which compiling the 10000 accounts version requires more memory than the 10 account version. Perhaps the whole unrolling part in our analysis is the root of all evil and we should come up with a better representation. But I think this is what Edward is still working on.

Some validation steps are actually of limited usefulness for C++ programs. For instance, since main reactor parameters can be overwritten at runtime time, the static detection of cycles could be misleading. Since the C++ runtime also checks for cycles, we might want to consider disabling this check at compile time for large C++ programs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants