Use binary cache at `store_uri` for certain operations #1360

Ma27 · 2024-02-12T16:36:08Z

Based on #1359. Trivial now that we have Perl bindings that accept a store URI.

This changes the following operations to not use the local store, but the binary cache store instead:

Check if build products were GCed
Check for valid paths when generating a nix channel
Generating the runtime dependencies
Hydra's builtin binary cache

Also switched from a global variable to subs to keep the include test working that needs to read config (for the store_uri) before the files to be included are created.

Closes #1352 as it solves the same issue.

cc @Ericson2314

Replaces / Closes NixOS#1352 Consider the following setup: * `store_uri` points to a "special" store, e.g. `s3://` or `file://`. * Hydra runs inside e.g. an nspawn container and exclusively builds on remote systems. Then, build products are never in the nspawn's `/nix/store`. This in turn means that `$MACHINE_LOCAL_STORE->isValidPath` is always `false` (or `0` 🤷) for output paths and thus Hydra wrongly claims that the output got garbage collected. Instead, use the BINARY_CACHE_STORE to look for the availability in the correct store. Everything touching the `drv` rather than the output paths still uses MACHINE_LOCAL_STORE: this is because `hydra-eval-jobs` does `openStore()` on `auto`, so the derivations end up there rather than on the BINARY_CACHE_STORE.

Now that we don't assume the local store anymore, but use the BINARY_CACHE_STORE which is effectively `openStore(store_uri)`, it doesn't really matter where the store is. Also, this doesn't seem too expensive for at most 5 build outputs.

Otherwise the `include`-test would fail: this test defines a `hydra.conf` with an `include` statement. However, the file to be included is created after the test base is set up. This means that - among others - the `Hydra::Helper::Nix` module is loaded before the file to include exists. Since the global variables are directly initialized and try to read from the config, the test fails, because loading the config at that time doesn't work. Delaying this into a subrouting solves the issue. I decided against messing around with the config mechanism here since I consider this behavior to be correct. The local store gets (i.e. `Hydra::Store->new()` w/o args) cached in the Perl bindings already[1], however this isn't the case for the binary cache store. Because of that, a `state` variable is used to cache the binary cache store. At runtime, the location of that store doesn't change anyways. [1] https://github.com/NixOS/nix/blob/e3ccb7e42a55529d8d50544e7e0cebb5cbc606cf/perl/lib/Nix/Store.xs#L88-L96

Ericson2314 · 2024-02-12T18:07:04Z

I'm still sorta confused how it ever worked before / are the performance ramifications of looking up in the binary cache when we previously looked up in the local store (presumably sometimes successfully?) OK.

Ma27 · 2024-02-12T20:08:28Z

I'm still sorta confused how it ever worked before

Given how the Perl bindings used to look like, my best guess is that /nix/store was always used (IIRC there was no real way around that in pre Nix 2.0 unless you change some configure flags) and store_uri was added later on, but a lot of the Perl code was never updated accordingly.

Perhaps @edolstra knows a little more?

are the performance ramifications of looking up in the binary cache when we previously looked up in the local store (presumably sometimes successfully

It should be OK for file://, but I'm absolutely not sure about e.g. s3:// (on hydra.nixos.org), that's part of the reason why I put that change into its own commit (also, I don't have an environment to reasonably test this).

Perhaps we want to leave that part out for now or split it up once more even?

src/lib/Hydra/Controller/Build.pm

Ericson2314 · 2024-02-12T20:20:05Z

@Ma27 Right. Yeah the fact that Hydra presumably supported binary caches long before Nix had the store abstraction that included binary caches, is what makes things confusing --- why does it seems like the old way of working with the binary cache is wrong even though we've had it for quite a while?

This method is applicable on all kinds of stores, so let's give it a store rather than letting it decide which store to use.

Ma27 · 2024-02-13T17:54:49Z

why does it seems like the old way of working with the binary cache is wrong even though we've had it for quite a while?

Yeah, I tried to use that when I first set up a Hydra and I think I never managed to set up binary caching back then.

Considering that binary_cache_secret_key_file raises a deprecation anyways, I'm wondering if we should rip that out instead and suggest to serve a flat-file binary cache via nginx. See also #1331.

I wouldn't expect local stores to be much of an impact, but you know the implementations better than me. Also, the most used thing is probably the build overview which is basically a few isValidPath() calls and I can't remember derivations with >10 outputs.

OTOH I'm not sure doing that for remote stores is sensible because of potential latency and because S3 is pay by requester now: NixOS/infra#299
cc @NixOS/infra for opinions as well.

So perhaps we want to exclude 44c98e1 from this PR @Ericson2314 ?

delroth · 2024-02-14T00:05:26Z

OTOH I'm not sure doing that for remote stores is sensible because of potential latency and because S3 is pay by requester now: NixOS/infra#299

Pay by requester shouldn't be relevant here - I doubt anyone other than us is directly using s3://nix-cache for their Hydra instance, and when the requester is us (NixOS Hydra) we'd be paying either way even if it wasn't pay by requester. Plus, the operations here are all super cheap, it's bandwidth that's the expensive part.

Latency could indeed be a problem: this is switching from an operation that completes in milliseconds (checking the existence of a local path) to an operation that completes in hundreds of milliseconds (cross-atlantic roundtrip), and there's no parallelism or batching anywhere to try and alleviate this. Of course, I'm not saying that it is a problem - I don't know nearly enough about the Hydra codebase to even understand what's being changed here. But if we were before checking 100+ paths in the local store, and we switch to checking 100+ paths in a remote store, we're going from something near-instant (sub-second) to something that needs roughly a minute to run.

Ma27 · 2024-03-04T10:43:42Z

Yep, agreed about the latency part.

the queryPathFromHashPart in Controller/Root.pm is only for Hydra's builtin binary cache, so irrelevant for hno. In fact, if infra-only Hydra becomes a thing, that's something I'd suggest to rip out.
I think the same applies to the NixChannel change, but somebody should probably double-check that.
The change in build_GET in Controller/Build.pm is used to check if a product was GCed. This only affects the outputs of a single derivation, i.e. O(n) with n being number of outputs. Can't recall an instance where this was >10. This will also be helpful for hno I think because then you'll see on Hydra's build overview whether a certain build was part of the upcoming GC.
The output sub in Coontroller/Build.pm also works on a single derivation only.

These are the changes that I'd consider good to go.

While revisiting build_deps & runtime_deps in Controller/Build.pm, I'm not sure anymore if that's something we want, especially on hno which is already rather loaded AFAIK. Perhaps we want to split this off for now @delroth @Ericson2314 ?

lheckemann · 2024-03-07T13:22:51Z

Regarding the latency, I think this should be fine because a lot of the queries are going to be repeats and the narinfo cache prevents repeated checks? Not 100% sure but I'm in favour of merging it without the split and seeing how it impacts performance.

Ma27 · 2024-03-07T17:43:18Z

Fine by me, but that call should be made by the operators of hno

Ma27 added 3 commits February 12, 2024 18:52

Always check for build availability

44c98e1

Now that we don't assume the local store anymore, but use the BINARY_CACHE_STORE which is effectively `openStore(store_uri)`, it doesn't really matter where the store is. Also, this doesn't seem too expensive for at most 5 build outputs.

Ma27 force-pushed the nix-perl-bindings-with-bugfix branch from 1d5b3d5 to 67de7a3 Compare February 12, 2024 17:53

Ericson2314 changed the base branch from nix-next to master February 12, 2024 17:56

Ericson2314 changed the base branch from master to nix-next February 12, 2024 17:57

Ericson2314 reviewed Feb 12, 2024

View reviewed changes

src/lib/Hydra/Controller/Build.pm Outdated Show resolved Hide resolved

Provide store for getDependencyGraph

e69d857

This method is applicable on all kinds of stores, so let's give it a store rather than letting it decide which store to use.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use binary cache at `store_uri` for certain operations #1360

Use binary cache at `store_uri` for certain operations #1360

Ma27 commented Feb 12, 2024

Ericson2314 commented Feb 12, 2024

Ma27 commented Feb 12, 2024

Ericson2314 commented Feb 12, 2024

Ma27 commented Feb 13, 2024

delroth commented Feb 14, 2024

Ma27 commented Mar 4, 2024

lheckemann commented Mar 7, 2024

Ma27 commented Mar 7, 2024 •

edited

Loading

Use binary cache at store_uri for certain operations #1360

Are you sure you want to change the base?

Use binary cache at store_uri for certain operations #1360

Conversation

Ma27 commented Feb 12, 2024

Ericson2314 commented Feb 12, 2024

Ma27 commented Feb 12, 2024

Ericson2314 commented Feb 12, 2024

Ma27 commented Feb 13, 2024

delroth commented Feb 14, 2024

Ma27 commented Mar 4, 2024

lheckemann commented Mar 7, 2024

Ma27 commented Mar 7, 2024 • edited Loading

Use binary cache at `store_uri` for certain operations #1360

Use binary cache at `store_uri` for certain operations #1360

Ma27 commented Mar 7, 2024 •

edited

Loading