Proposal for changes to source-build to support parallelism #3608

Closed
mmitche opened this issue Aug 30, 2023 · 9 comments · Fixed by dotnet/installer#18824
Labels
area-build Improvements in source-build's own build process Epic Groups multiple user stories. Can be grouped under a theme.

Comments

@mmitche
Member

mmitche commented Aug 30, 2023

Describe the Problem

Parallelism should provide serious gains for source-build's build. A typical end-to-end build takes around 50 minutes. With parallelism enabled, I got it under 30.

Describe the Solution

There are a few problems with enabling parallelism:

  • The repo dependency graph as represented in the repo projects is incorrect and incomplete
  • The way that input versions for a component are determined would potentially yield non-deterministic results in a parallel environment.

How to define the dependency graph correctly

This one is trickier than expected. What we should strive for is that each component project identifies only those dependencies that it directly depends on, and does so completely. Ideally, this information would be generated programmatically off of what the repo depends on in its Version.Details.xml file, as this is the source of information we use to determine what versions should be overridden.

Theoretically, for every component, we could identify the set of SourceBuild dependencies in the Version.Details.xml, and use those as the RepositoryReference items. However, this graph would be both incomplete in some areas, and have too many edges in others. The main problem with this approach is that the SourceBuild elements are often centered around usage in repo-level source build, which is slightly different.

  • Some repositories (e.g. NuGet.Client) have no SourceBuild elements.
  • Some components will depend on repos that have no source build intermediate (e.g. NuGet.Client)
  • Some components will overspecify and cause cycles in the graph. For example, arcade depends on sdk, which depends on arcade. These dependencies should come from previously source-built, not the live build. We may be able to tweak cases like this, and it's probably trivially easy to fix.
  • Some components have no sources at all. dotnet.proj and source-build-packages do not correspond to sources in the repo, but do fit into the dependency graph.

All that said, the information from traversing the graph is very close to the desired build order. We just need to add a few new edges and remove some existing ones. I propose the following approach:

Approach

  • If a component has sources and a Version.Details.xml file, the baseline set of dependencies is generated from SourceBuild marked dependencies.
  • A component project may define two additional input ItemGroups, AdditionalRepositoryReferences and RemoveRepositoryReferences
    • AdditionalRepositoryReferences is a set of additional RepositoryReference that should be added to the existing dependencies. For sdk.proj, this might include nuget.client, adding to the existing set. For dotnet.proj, this would be installer and source-build-packages, adding to an empty set because there are no sources.
    • RemoveRepositoryReferences is a set of RepositoryReference that should be removed from the generated set. This would include dependencies that cause circular references.

The set of repository references can be calculated as:

RepositoryReferences = <Set of eng/Version.Details.xml SourceBuild dependencies, if available> + AdditionalRepositoryReferences - RemoveRepositoryReferences
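As a rough illustration of that formula, here is a minimal Python sketch. It assumes the baseline can be read from the `RepoName` attribute of each `SourceBuild` element; the sample XML, the function names, and the example inputs are hypothetical and are not taken from any real repo or from source-build's actual implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down Version.Details.xml content, for illustration only.
SAMPLE_VERSION_DETAILS = """\
<Dependencies>
  <ProductDependencies>
    <Dependency Name="Microsoft.NETCore.App.Ref" Version="0.0.0">
      <SourceBuild RepoName="runtime" />
    </Dependency>
    <Dependency Name="Some.Other.Package" Version="0.0.0" />
  </ProductDependencies>
  <ToolsetDependencies>
    <Dependency Name="Microsoft.DotNet.Arcade.Sdk" Version="0.0.0">
      <SourceBuild RepoName="arcade" />
    </Dependency>
  </ToolsetDependencies>
</Dependencies>
"""


def source_build_dependencies(version_details_xml):
    """Return the repo names of all dependencies marked with a SourceBuild element."""
    if version_details_xml is None:
        # Component without sources (e.g. dotnet.proj): empty baseline.
        return set()
    root = ET.fromstring(version_details_xml)
    return {
        element.get("RepoName")
        for element in root.iter("SourceBuild")
        if element.get("RepoName")
    }


def repository_references(version_details_xml, additional=(), remove=()):
    """Baseline + AdditionalRepositoryReferences - RemoveRepositoryReferences."""
    baseline = source_build_dependencies(version_details_xml)
    return (baseline | set(additional)) - set(remove)
```

For example, `repository_references(SAMPLE_VERSION_DETAILS, additional=["nuget.client"], remove=["arcade"])` yields the set containing `runtime` and `nuget.client`, while `repository_references(None, additional=["installer", "source-build-packages"])` models a sourceless component such as dotnet.proj.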

The set of final RepositoryReference elements is built before the current project is built.
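To make the ordering constraint concrete, the sketch below groups components into "waves" that could build in parallel once everything they reference has finished. It uses Python's standard `graphlib` module rather than MSBuild, and the example graph is a hypothetical, heavily simplified one, not the real source-build graph.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: component -> set of repository references it must wait for.
EXAMPLE_REFERENCES = {
    "arcade": set(),
    "source-build-packages": set(),
    "runtime": {"arcade"},
    "nuget.client": {"arcade"},
    "sdk": {"runtime", "nuget.client"},
    "installer": {"sdk"},
    "dotnet": {"installer", "source-build-packages"},
}


def build_waves(references):
    """Group components into waves; every member of a wave can build in parallel."""
    sorter = TopologicalSorter(references)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its references already built.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With the example graph this gives five waves: arcade and source-build-packages first, then nuget.client and runtime, then sdk, installer, and finally dotnet. A real scheduler would likely start each component as soon as its own references finish instead of waiting for a whole wave, but the dependency constraint is the same.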

How to ensure correctness

If the final generated graph has cycles, MSBuild will detect these cycles during evaluation and the build will fail. Ensuring that there are enough edges is slightly more difficult.
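The same kind of cycle check can be sketched outside of MSBuild, for instance as a quick validation step over the computed references. This is only an illustration using Python's `graphlib`, not a description of how MSBuild detects cycles; the two small graphs are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(references):
    """Return one cycle as a list of components, or None if the graph is acyclic."""
    try:
        TopologicalSorter(references).prepare()
    except CycleError as error:
        # graphlib reports the cycle as the second element of the exception args;
        # the first and last entries of that list are the same node.
        return list(error.args[1])
    return None


# Hypothetical generated graph where arcade and sdk reference each other.
WITH_CYCLE = {"arcade": {"sdk"}, "sdk": {"arcade"}}

# Same graph after listing sdk in arcade's RemoveRepositoryReferences.
WITHOUT_CYCLE = {"arcade": set(), "sdk": {"arcade"}}
```

`find_cycle(WITH_CYCLE)` returns a list that starts and ends with the same component and mentions both arcade and sdk, while `find_cycle(WITHOUT_CYCLE)` returns `None`.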

The key to ensuring there are enough edges, and that they are in the right places, is to ensure that the outputs of a repository go only to separate locations that are not used for inputs, and that input locations are not shared between repos. A similar approach to what ProdCon v1 did would work here.

  • Components are given a unique output location. No writing to shared locations.
  • When preparing the inputs for a given component build, the outputs of the final RepositoryReferences set for that component are combined together into a unique feed (including for non-NuGet). To save space, symlinks or hardlinks could theoretically be used.
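The two rules above could be sketched as follows. The directory layout (`outputs/<repo>` and `feeds/<component>`), the function names, and the package file names are all assumptions made for this example; symlinks are used to save space, with a plain copy as a fallback where symlinks are not permitted.

```python
import os
import shutil
import tempfile
from pathlib import Path


def prepare_input_feed(component, references, outputs_root, feeds_root):
    """Create a feed for one component holding only its declared references' outputs."""
    feed = Path(feeds_root) / component
    feed.mkdir(parents=True, exist_ok=True)
    for repo in sorted(references):
        repo_output = Path(outputs_root) / repo
        if not repo_output.is_dir():
            raise FileNotFoundError(f"declared reference '{repo}' has no outputs yet")
        for artifact in sorted(repo_output.iterdir()):
            if not artifact.is_file():
                continue
            link = feed / artifact.name
            if link.exists() or link.is_symlink():
                link.unlink()
            try:
                os.symlink(artifact.resolve(), link)
            except OSError:
                # Symlinks unavailable (e.g. restricted environments): copy instead.
                shutil.copy2(artifact, link)
    return feed


def demo():
    """Build a feed for a hypothetical 'sdk' component and list what lands in it."""
    with tempfile.TemporaryDirectory() as tmp:
        outputs, feeds = Path(tmp) / "outputs", Path(tmp) / "feeds"
        for repo in ("arcade", "runtime", "roslyn"):
            (outputs / repo).mkdir(parents=True)
            (outputs / repo / f"{repo}.1.0.0.nupkg").write_text(repo)
        feed = prepare_input_feed("sdk", {"arcade", "runtime"}, outputs, feeds)
        return sorted(path.name for path in feed.iterdir())
```

In `demo()`, the roslyn output exists on disk but never reaches the sdk feed, because roslyn is not among the declared references in this made-up setup. That isolation is what makes a missing edge visible.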

Because the input version of a package is determined from the previously source-built packages + the input package feed, which would now be unique per component, if an edge is missing, the incorrect input version would be used. This would then usually result in a poison failure (taken from previously source-built), or a prebuilt (no edge at all).
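A small sketch of that reasoning, under the assumption that the per-component feed takes precedence over previously source-built packages and that anything found in neither place counts as a prebuilt. The enum, the function, and the package sets below are hypothetical.

```python
from enum import Enum


class InputSource(Enum):
    LIVE = "output of a declared repository reference"
    PREVIOUSLY_SOURCE_BUILT = "previously source-built (likely poison failure)"
    PREBUILT = "prebuilt (no edge at all)"


def resolve_input(package, component_feed, previously_source_built):
    """Classify where a component's input package would come from."""
    if package in component_feed:
        return InputSource.LIVE
    if package in previously_source_built:
        return InputSource.PREVIOUSLY_SOURCE_BUILT
    return InputSource.PREBUILT


# Hypothetical scenario: the component declares arcade but is missing its runtime edge.
COMPONENT_FEED = {"Microsoft.DotNet.Arcade.Sdk"}
PREVIOUSLY_SOURCE_BUILT = {"Microsoft.DotNet.Arcade.Sdk", "Microsoft.NETCore.App.Ref"}
```

Here `resolve_input("Microsoft.NETCore.App.Ref", COMPONENT_FEED, PREVIOUSLY_SOURCE_BUILT)` returns `InputSource.PREVIOUSLY_SOURCE_BUILT`, which is the situation that would surface as a poison failure, and a package present in neither set resolves to `InputSource.PREBUILT`.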

Caveat

There is one case that this approach would not catch: a missing edge (e.g. to arcade) that would not cause a poison failure but could cause unwanted build behavior. In that case, the previously source-built arcade would be used instead of the live arcade.

To fix this, we could go one step further and separate the previously source-built artifacts by component, and then apply the same methodology of restricting the inputs to a given component build only to declared dependencies. In this case, the declared repo references would need to avoid trimming away cycles in the graph (e.g. arcade would need to depend on arcade, sdk and runtime).

I don't think this step is necessary, unless it becomes clear that the number of edges that must be added or removed from the graph to ensure correctness is large (and thus easy to get wrong). My initial investigations suggest it is not, and only a few components need to alter their deps.

T-Shirt Size: Medium

@dotnet-issue-labeler dotnet-issue-labeler bot added area-build Improvements in source-build's own build process untriaged labels Aug 30, 2023
@omajid
Member

omajid commented Aug 30, 2023

Do we also need to ensure that any compiler servers that are started by the build of one repo are isolated from other repo-builds and don't accidentally share their state?

@mmitche
Member Author

mmitche commented Aug 31, 2023

Do we also need to ensure that any compiler servers that are started by the build of one repo are isolated from other repo-builds and don't accidentally share their state?

Possibly. That needs investigation.

@mthalman
Member

Do we also need to ensure that any compiler servers that are started by the build of one repo are isolated from other repo-builds and don't accidentally share their state?

That's a good point. We've run into issues in the past with that: #3233

@mmitche
Member Author

mmitche commented Aug 31, 2023

I think we need to follow up on the expected behavior of the compiler servers with the roslyn/msbuild team. Since the repo invocations are separate processes altogether (we invoke a new build.sh), I would not expect them to accidentally share state. If they did, that would be unwanted behavior in many cases (e.g. you run a build with two different .NET SDKs in serial).

@mmitche
Member Author

mmitche commented Aug 31, 2023

Note that we do have the option of turning off the compiler server if we want.

@jaredpar
Member

If there are issues running the compiler server in parallel builds, please file the bug in dotnet/roslyn, not here. The compiler server fully supports running in parallel, and any case where it does not function that way is something I would really like to look into.

@mmitche
Member Author

mmitche commented Aug 31, 2023

Alright, we have the Jared stamp. Good to know that these issues have been resolved.

@jaredpar
Member

I looked at the original issue where you ran into parallelism problems. The symptoms on the PR are virtually identical to what we see in dotnet/runtime#85082. That is one of those "the error makes no sense" type of issues. The repro is very sporadic, and hence we have never been able to catch a machine doing it so we could debug. If you have a reliable repro, it would be nice to have.

Note: it's likely with newer compilers (RC1 and later) this issue wouldn't repro. Even the "this can't happen but what if it did" scenarios that produce the behavior were eliminated with a recent compiler change. If it still repros then I'd be even more interested because there is a misconception somewhere that needs to be rectified.

@mthalman
Member

I can set up a private branch with our workaround removed to see if it still repros.

@MichaelSimons MichaelSimons moved this from Backlog to Post 8.0 / Pre 9.0 in .NET Source Build Sep 7, 2023
@MichaelSimons MichaelSimons moved this from Post 8.0 / Pre 9.0 to 9.0 in .NET Source Build Nov 14, 2023
@MichaelSimons MichaelSimons moved this from Backlog to 9.0 in .NET Source Build Jan 3, 2024
@mthalman mthalman self-assigned this Jan 5, 2024
@mthalman mthalman moved this from 9.0 to In Progress in .NET Source Build Jan 5, 2024
@tkapin tkapin added the Epic Groups multiple user stories. Can be grouped under a theme. label Jan 15, 2024
Projects
Archived in project
Status: Done