Decide on linker to use in runtime builds #84794

sbomer · 2023-04-13T17:22:57Z

#84148 changed the linker that we use in runtime builds (including when running NativeAOT on our binaries like ILCompiler and crossgen2) to lld. This was motivated by a desire to move to cross-builds (mariner host OS, targeting old linux rootfs). Using lld simplifies the cross-compilation scenarios because it is a cross-linker, so doesn't require a separate bfd linker with appropriate target prefix. We have also been wanting to move away from binutils as a dependency. Note that it doesn't change the default linker for customer NativeAOT scenarios, which still use bfd.

There are a few concerns with this approach - see @am11's comments in #84493 (comment), #84493 (comment), #84148 (comment) for full context. In summary:

NativeAOT support for lld was only added recently and has not been well tested.
NativeAOT uses bfd by default in user scenarios, so we should be testing this in ci.
lld introduces size regression compared to bfd. Numbers from Write linker script for lld to enable gc-sections #84493 (comment):

# dotnet new console (~2.2 K diff)
- 1496072
+ 1498192

# dotnet new api -aot (~41 K diff)
- 12743672
+ 12785208

@am11 I hope that's a fair summary - please add to this if I missed anything.

So I'm opening this to discuss the choice of linker when building our product. Options are:

Use lld as we do with Build on CBL-mariner host with rootfs #84148.
Add binutils linker (need separate cross-linker for each supported architecture) to the mariner build images. We will need to pick the right prefixed linker to use based on the target platform. ~~This is not currently supported in NativeAot.~~ (edit: NativeAot will detect and used the right prefixed linker if available).

My preference is to continue using lld, but add separate test legs that validate the customer scenarios using bfd. One challenge is that our NativeAOT tests usually run NativeAOT on the build machine, not on helix - so they are testing a cross-compilation scenario, not the customer scenario. This is simpler than getting all NativeAOT prerequisites onto the target machines to test native compilation. I think it would be worth adding separate tests for NativeAOT compilation that run in an environment that represents customer scenarios (for example, ubuntu with bfd).

I'm opening this issue to continue the discussion and hopefully reach agreement on the approach.

@am11 @janvorli @agocke @MichalStrehovsky @jkotas

The text was updated successfully, but these errors were encountered:

ghost · 2023-04-13T17:23:08Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

#84148 changed the linker that we use in runtime builds (including when running NativeAOT on our binaries like ILCompiler and crossgen2) to lld. This was motivated by a desire to move to cross-builds (mariner host OS, targeting old linux rootfs). Using lld simplifies the cross-compilation scenarios because it is a cross-linker, so doesn't require a separate bfd linker with appropriate target prefix. We have also been wanting to move away from binutils as a dependency. Note that it doesn't change the default linker for customer NativeAOT scenarios, which still use bfd.

There are a few concerns with this approach - see @am11's comments in #84493 (comment), #84493 (comment), #84148 (comment) for full context. In summary:

NativeAOT support for lld was only added recently and has not been well tested.
NativeAOT uses bfd by default in user scenarios, so we should be testing this in ci.
lld introduces size regression compared to bfd. Numbers from Write linker script for lld to enable gc-sections #84493 (comment):

# dotnet new console (~2.2 K diff)
- 1496072
+ 1498192

# dotnet new api -aot (~41 K diff)
- 12743672
+ 12785208

@am11 I hope that's a fair summary - please add to this if I missed anything.

So I'm opening this to discuss the choice of linker when building our product. Options are:

Use lld as we do with Build on CBL-mariner host with rootfs #84148.
Add binutils linker (need separate cross-linker for each supported architecture) to the mariner build images. We will need to pick the right prefixed linker to use based on the target platform. This is not currently supported in NativeAot.

My preference is to continue using lld, but add separate test legs that validate the customer scenarios using bfd. One challenge is that our NativeAOT tests usually run NativeAOT on the build machine, not on helix - so they are testing a cross-compilation scenario, not the customer scenario. This is simpler than getting all NativeAOT prerequisites onto the target machines to test native compilation. I think it would be worth adding separate tests for NativeAOT compilation that run in an environment that represents customer scenarios (for example, ubuntu with bfd).

I'm opening this issue to continue the discussion and hopefully reach agreement on the approach.

@am11 @janvorli @agocke @MichalStrehovsky @jkotas

Author:	sbomer
Assignees:	-
Labels:	`untriaged`, `area-NativeAOT-coreclr`
Milestone:	-

agocke · 2023-04-13T17:27:41Z

I thought there was another consideration: we need a relatively recent version of lld to produce good results. How many distros ship with a recent-enough lld for us to use it as the default?

sbomer · 2023-04-13T17:42:09Z

We need at least lld 13 to use @am11's fix that improves output size: #84493. But I am not suggesting to change the default for customers - that would stay bfd either way. We will be able to use @am11's fix in runtime builds as soon as we update to llvm 16 (which I'm working on).

ghost · 2023-04-13T19:17:11Z

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

#84148 changed the linker that we use in runtime builds (including when running NativeAOT on our binaries like ILCompiler and crossgen2) to lld. This was motivated by a desire to move to cross-builds (mariner host OS, targeting old linux rootfs). Using lld simplifies the cross-compilation scenarios because it is a cross-linker, so doesn't require a separate bfd linker with appropriate target prefix. We have also been wanting to move away from binutils as a dependency. Note that it doesn't change the default linker for customer NativeAOT scenarios, which still use bfd.

There are a few concerns with this approach - see @am11's comments in #84493 (comment), #84493 (comment), #84148 (comment) for full context. In summary:

NativeAOT support for lld was only added recently and has not been well tested.
NativeAOT uses bfd by default in user scenarios, so we should be testing this in ci.
lld introduces size regression compared to bfd. Numbers from Write linker script for lld to enable gc-sections #84493 (comment):

# dotnet new console (~2.2 K diff)
- 1496072
+ 1498192

# dotnet new api -aot (~41 K diff)
- 12743672
+ 12785208

@am11 I hope that's a fair summary - please add to this if I missed anything.

So I'm opening this to discuss the choice of linker when building our product. Options are:

Use lld as we do with Build on CBL-mariner host with rootfs #84148.
Add binutils linker (need separate cross-linker for each supported architecture) to the mariner build images. We will need to pick the right prefixed linker to use based on the target platform. This is not currently supported in NativeAot.

My preference is to continue using lld, but add separate test legs that validate the customer scenarios using bfd. One challenge is that our NativeAOT tests usually run NativeAOT on the build machine, not on helix - so they are testing a cross-compilation scenario, not the customer scenario. This is simpler than getting all NativeAOT prerequisites onto the target machines to test native compilation. I think it would be worth adding separate tests for NativeAOT compilation that run in an environment that represents customer scenarios (for example, ubuntu with bfd).

I'm opening this issue to continue the discussion and hopefully reach agreement on the approach.

@am11 @janvorli @agocke @MichalStrehovsky @jkotas

Author:	sbomer
Assignees:	-
Labels:	`area-Infrastructure`, `untriaged`, `area-NativeAOT-coreclr`
Milestone:	-

am11 · 2023-04-14T07:50:38Z

Note that it doesn't change the default linker for customer NativeAOT scenarios, which still use bfd.

We recommend clang compiler in docs, we build NativeAOT runtime with llvm-libunwind, we use llvm-based objwriter to create the linkable objects, so lets make lld as a default and document it as a recommended linker for PublishAot for linux. This will be aligned with other aspects and understandable. The upside is that we will be testing with the same component in CI which we recommend for users in docs.

So lets apply the patch, turn default to lld and cleanup LinkerFlavor=lld sprinkled in various places. That will hopefully resolve this matter satisfying everyone.

Alternatives are making less sense to me:

We have also been wanting to move away from binutils as a dependency.

Why is this? binutils is used since .NET Core 1.0, there is no issue with licensing in using binutils to build any software; proprietary or otherwise.

We will need to pick the right prefixed linker to use based on the target platform. This is not currently supported in NativeAot.

This is not correct. The "toolchain" does this job. If configured properly, we don't need to manually specify prefixes for cross-compilation.

How many distros ship with a recent-enough lld for us to use it as the default?

You will find it pretty hard to discover a distro which is not using bfd as their default linker, let alone the versioning concerns. bfd, aka GNU linker, has decades of more experience than other linkers, still actively maintained, choice of top distros and beats other linkers in binary-size benchmarks.

@am11 I hope that's a fair summary - please add to this if I missed anything.

we can no longer build runtime repo with binutils alone.

i.e. LinkerFlavor=lld if target==linux constructs added are missing and Compiler!=gcc condition; we set Compiler at the start of build:

runtime/eng/build.sh

Line 432 in ed0f1c5

arguments="$arguments /p:Compiler=$compiler /p:CppCompilerAndLinker=$compiler"

(maps to ./build.sh -gcc)

MichalStrehovsky · 2023-04-14T09:01:17Z

I don't have a strong opinion on this. We're already in a situation where we can't test E2E the thing that will be produced by the toolchain because the version of ld is out of our control. The solution to that could be to just ship lld in the ILCompiler package. But that's a can of worms since constructing the command line to invoke the linker is per-distro bespoke magic and it's also the reason why we use clang as the "linker" instead of invoking ld directly.

Ubuntu 18 comes with lld that is ancient. Ubuntu 22 has a good one. I don't have a good sense of the landscape. If we tell people to use lld but the version we need is hard to obtain, that would be a problem.

I also don't like testing with a completely different linker codebase (bfd vs lld) than what we use on customer machines. We did run into linker issues in the past. At least using the same codebase (even though a different version) would keep things more sane.

kg · 2023-04-15T04:30:42Z

This has broken builds on Debian for me.

ILCompiler -> /home/kate/Projects/dotnet-runtime-wasm/artifacts/bin/coreclr/linux.x64.Release/ilc/ilc.dll
  Generating native code
clang : error : invalid linker name in argument '-fuse-ld=lld' [/home/kate/Projects/dotnet-runtime-wasm/src/coreclr/tools/aot/ILCompiler/ILCompiler.csproj]
/home/kate/.nuget/packages/microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/build/Microsoft.NETCore.Native.targets(335,5): error MSB3073: The command ""/usr/bin/clang-11" "/home/kate/Projects/dotnet-runtime-wasm/artifacts/obj/coreclr/ILCompiler/x64/Release/native/ilc.o" -o "/home/kate/Projects/dotnet-runtime-wasm/artifacts/bin/coreclr/linux.x64.Release/ilc/native/ilc" -fuse-ld=lld /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libbootstrapper.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libRuntime.ServerGC.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libeventpipe-disabled.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libstdc++compat.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/sdk/libnumasupportdynamic.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.Native.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.Globalization.Native.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.IO.Compression.Native.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.Net.Security.Native.a /home/kate/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.3.23174.8/framework/libSystem.Security.Cryptography.Native.OpenSsl.a -g -Wl,-rpath,'$ORIGIN' -Wl,--build-id=sha1 -Wl,--as-needed -pthread -ldl -lz -lrt -lm -pie -Wl,-pie -Wl,-z,relro -Wl,-z,now -Wl,--discard-all" exited with code 1. [/home/kate/Projects/dotnet-runtime-wasm/src/coreclr/tools/aot/ILCompiler/ILCompiler.csproj]

I do have llvm, clang, and lld in my /usr/bin, but they have a version number on the end. I tried helping the build process out by using ln -s to put a 'lld' in my PATH pointing at the right one, but that didn't fix it. What do I do?

am11 · 2023-04-15T07:06:22Z

@kg, this is the situation when we have different versions of clang and lld installed. e.g. if I have clang-16, clang-14 and lld-14, installed, then the following things happen:

./build.sh via init-compiler will select clang-16 (because it is the latest version)
init-compiler will introspect if fuse-ld=lld and based on the result, will select the linker
then changes in Build on CBL-mariner host with rootfs #84148 will "force" fuse-ld=lld unconditionally, disregarding #2's introspection based linker selection, and fail the build because we didn't install lld-16

This is exactly how

we can no longer build runtime repo with binutils alone.

community-driven scenarios are also affected..

Workaround is to apt install lld-<version of clang which build is picking up>

kg · 2023-04-15T08:03:55Z

Workaround is to apt install lld-<version of clang which build is picking up>

Thanks, that indeed worked. I did
/usr/bin/clang -v
which output Debian clang version 11.0.1-2, so sudo apt-get install lld-11 caused the build to succeed.

tmds · 2023-04-17T08:30:45Z

cc @omajid

sbomer · 2023-04-17T17:31:40Z

We have also been wanting to move away from binutils as a dependency.

Why is this?

I think mainly for the same reasoning as with lld lld - to simplify our official (cross) builds and avoid having to detect or install prefixed versions of objcopy and other tools. But I agree with your feedback that we should continue supporting binutils-based builds, at least for native builds.

We will need to pick the right prefixed linker to use based on the target platform. This is not currently supported in NativeAot.

This is not correct. The "toolchain" does this job. If configured properly, we don't need to manually specify prefixes for cross-compilation.

Thanks for the correction - fixed above. I got my wires crossed when I was thinking about the gcc support mentioned in #78559.

we can no longer build runtime repo with binutils alone.

Filed #84934 to track this.

Regarding runtime linker defaults: it sounds like we are all on board with using lld as the default for runtime builds, as long as it's optional.

Regarding customer NativeAot linker defaults: I don't have a strong opinion on what the default should be, but if we consider both of them to be supported, we should have testing in place for both, at least to cover basic scenarios, even if most of our testing uses the default.

agocke · 2023-04-20T20:49:35Z

Regarding runtime linker defaults: it sounds like we are all on board with using lld as the default for runtime builds, as long as it's optional.

I'm not quite sure what this means. I think lld should be the default for CI specifically, but I think ld should be the default in dev builds, same as the Native AOT default.

sbomer · 2023-04-20T21:09:22Z

I was thinking that the defaults used in ci should match those for the local dev scenario, but if there's a desire for the local dev scenario to use ld, that's certainly an option. My read of the discussion here was that we were actually leaning towards changing the NativeAOT defaults to use lld in the customer scenario too.

am11 · 2023-04-20T21:30:18Z

My read of the discussion here was that we were actually leaning towards changing the NativeAOT defaults to use lld in the customer scenario too.

Yup, or more generally, the linker the CI legs are frequently using to test all NativeAOT tests is the linker we should recommend in the docs. In case it wasn't obvious, LinkerFlavor supports all four linkers which clang's/gcc' fuse-ld option recognize: bfd, gold, lld and mold -- and (empirically speaking) they are normally favored in that order by linux devs; gold > lld. Also, fuse-ld=lld is supported by clang7+ and gcc9+, which is old enough to be a viable candidate for default.

agocke · 2023-04-20T21:57:29Z

I think we could credibly test both linkers. It may not be in PR, but it probably is not a big deal to spin up an ld-specific run of the test suite with Ubuntu-latest-LTS.

It seems like there's two separate considerations -- for our official build we want to keep the dependencies as tight as possible, and using LLVM for everything makes sense.

But unlike with coreclr, the vast majority of linking is going to happen on user machines, so it's not particularly relevant what linker the official build machines use.

In other words, if we think that bfd is a better experience for customers, I don't think it's bad to pick that and use it in most of our testing.

am11 · 2023-04-20T22:56:06Z

But unlike with coreclr, the vast majority of linking is going to happen on user machines, so it's not particularly relevant what linker the official build machines use.

Yes, no confusion here that this issue is about which linker is used to exercise the tests in CI; it is not about which linker is used to build official coreclr release.

The linker we use in CI is the linker we should recommend for user. It is a "singular" linker, because we normally do not force non-default NativeAOT options in tests. There are number of options which we don't test and they are left as experimental / community-supported; e.g. static lib/exe, no-pie, static-pie etc. The red flag is our usage of LinkerFlavor outside src/nativeaot/BuildIntegration directory introduced in #84148 which is a clear deviation from what we set as default.

In other words, if we think that bfd is a better experience for customers, I don't think it's bad to pick that and use it in most of our testing.

Makes sense. Alternatively, we can set lld as default in BuildIntegration for linux, update Microsoft docs to include lld in apt-get install clang ... line, cleanup <LinkerFlavor>lld</LinkerFlavor> instances from rest of the repo and continue testing with lld. Personally, alternative is making more sense to me because:

we are using other llvm components in NativeAOT: objwriter at build-time , llvm-objcopy (preferred with clang) at publish-time, llvm-libunwind at run-time and clang is used to compile NativeAOT Runtime as well as recommended linker-driver in docs.
cross-PublishAot is currently only supported with clang.
since we compile NativeAOT Runtime with clang, using bfd has potential to run into this issue on some platforms: Fix redhat arm64 #52244 (the sole reason we switched coreclr to build with lld which may or may not be applicable to NativeAOT.. there is no easy way to tell with 100% certainty)

agocke · 2023-04-21T18:34:56Z

@jkotas @MichalStrehovsky @sbomer How do we feel about changing the default linker to lld?

Presumably we would still use bfd if lld is not installed, but this would become the default.

am11 · 2023-04-21T19:02:17Z

Presumably we would still use ld

nit: its name is bfd because ld can be anything; ld.lld gold.ld etc.

jkotas · 2023-04-21T19:26:35Z

How do we feel about changing the default linker to lld?

Fine with me.

MichalStrehovsky · 2023-04-24T02:44:33Z

How do we feel about changing the default linker to lld?

Sounds good. Only one question:

Presumably we would still use bfd if lld is not installed, but this would become the default.

Is the proposal that we would also check if lld is recent enough and consider it missing if it e.g. doesn't support the command line arguments we want to specify? I want to avoid situations when someone is e.g. on Ubuntu 18, installs lld, and gets worse off than if we used bfd (I might be wrong about this one - I don't know how old distros do we have to care about but do know that lld only became "good" "recently").

am11 · 2023-04-24T07:45:48Z

Is the proposal that we would also check if lld is recent enough and consider it missing if it e.g. doesn't support the command line arguments we want to specify?

If lld is available, we support it. We already have a version check whether or not use a linker script needed by lld-13+. The rest of the arguments are supported by both old and new versions. We just need a condition (like we have for llvm-objcopy) i.e.; if lld is available on linux, use it otherwise fallback to bfd.

MichalStrehovsky · 2023-04-24T09:12:12Z

What I meant is that lld is doing worse than bfd size-wise in general, but if we can't do linker script, it's significantly worse and users would be better off using bfd. I'm okay with the regression with recent lld's, but it's more than 5% for older llds and that makes me raise an eyebrow (correct me if I'm wrong).

am11 · 2023-04-24T11:27:52Z

The size difference with rid: linux-arm64, template: api, configuration: release, stripped: true, executable: true, static-executable: false are mixed (v10 is better than v11):

clang/lld version	published binary size	comments
7	8.8M (9183120)	first version to support `fuse-ld=lld`
8	8.8M (9183120)
9	8.8M (9190264)
10	8.7M (`9058800`)
11	8.7M (9060144)
12	8.7M (9060152)
13	8.7M (9060152)
14	8.7M (9058552)
15	8.7M (9058552)

I think we can keep it simple. The standing logic is that the newer toolchain brings about better optimizers and analyzers, and older libc is best to increase portability. This combination renders best results on any build machine. (for static executables, old libc is not necessary)

sbomer · 2023-05-16T22:00:15Z

Closing this since we decided that:

lld is a good runtime default, but shouldn't be required (fixed in Make dependency on lld optional #85667)
We don't want to change customer NativeAot defaults because of more complex prerequisites and detection required (full context in Prefer lld in NativeAot when using clang #85478).

ghost added the untriaged New issue has not been triaged by the area owner label Apr 13, 2023

dotnet-issue-labeler bot added the area-NativeAOT-coreclr label Apr 13, 2023

sbomer added the area-Infrastructure label Apr 13, 2023

sbomer mentioned this issue Apr 17, 2023

Make dependency on lld optional during runtime build #84934

Closed

agocke added this to AppModel Apr 20, 2023

antonfirsov mentioned this issue Apr 25, 2023

[HttpStress] [SslStress] Linux stress test builds are broken #85338

Closed

sbomer mentioned this issue Apr 27, 2023

Prefer lld in NativeAot when using clang #85478

Closed

sbomer closed this as completed May 16, 2023

ghost removed the untriaged New issue has not been triaged by the area owner label May 16, 2023

ghost locked as resolved and limited conversation to collaborators Jun 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decide on linker to use in runtime builds #84794

Decide on linker to use in runtime builds #84794

sbomer commented Apr 13, 2023 •

edited

Loading

ghost commented Apr 13, 2023

agocke commented Apr 13, 2023

sbomer commented Apr 13, 2023

ghost commented Apr 13, 2023

am11 commented Apr 14, 2023 •

edited

Loading

MichalStrehovsky commented Apr 14, 2023

kg commented Apr 15, 2023 •

edited

Loading

am11 commented Apr 15, 2023 •

edited

Loading

kg commented Apr 15, 2023

tmds commented Apr 17, 2023

sbomer commented Apr 17, 2023

agocke commented Apr 20, 2023

sbomer commented Apr 20, 2023

am11 commented Apr 20, 2023

agocke commented Apr 20, 2023

am11 commented Apr 20, 2023 •

edited

Loading

agocke commented Apr 21, 2023 •

edited

Loading

am11 commented Apr 21, 2023

jkotas commented Apr 21, 2023

MichalStrehovsky commented Apr 24, 2023

am11 commented Apr 24, 2023

MichalStrehovsky commented Apr 24, 2023

am11 commented Apr 24, 2023

sbomer commented May 16, 2023

Decide on linker to use in runtime builds #84794

Decide on linker to use in runtime builds #84794

Comments

sbomer commented Apr 13, 2023 • edited Loading

ghost commented Apr 13, 2023

agocke commented Apr 13, 2023

sbomer commented Apr 13, 2023

ghost commented Apr 13, 2023

am11 commented Apr 14, 2023 • edited Loading

MichalStrehovsky commented Apr 14, 2023

kg commented Apr 15, 2023 • edited Loading

am11 commented Apr 15, 2023 • edited Loading

kg commented Apr 15, 2023

tmds commented Apr 17, 2023

sbomer commented Apr 17, 2023

agocke commented Apr 20, 2023

sbomer commented Apr 20, 2023

am11 commented Apr 20, 2023

agocke commented Apr 20, 2023

am11 commented Apr 20, 2023 • edited Loading

agocke commented Apr 21, 2023 • edited Loading

am11 commented Apr 21, 2023

jkotas commented Apr 21, 2023

MichalStrehovsky commented Apr 24, 2023

am11 commented Apr 24, 2023

MichalStrehovsky commented Apr 24, 2023

am11 commented Apr 24, 2023

sbomer commented May 16, 2023

sbomer commented Apr 13, 2023 •

edited

Loading

am11 commented Apr 14, 2023 •

edited

Loading

kg commented Apr 15, 2023 •

edited

Loading

am11 commented Apr 15, 2023 •

edited

Loading

am11 commented Apr 20, 2023 •

edited

Loading

agocke commented Apr 21, 2023 •

edited

Loading