dl.google.com is currently a single point of failure #3136

Closed
fishy opened this issue May 3, 2022 · 9 comments · Fixed by #3139

Comments

@fishy
Contributor

fishy commented May 3, 2022

What version of rules_go are you using?

v0.31.0

What version of gazelle are you using?

v0.24.0

What version of Bazel are you using?

5.1.1

Does this issue reproduce with the latest releases of all the above?

Yes

What operating system and processor architecture are you using?

linux/amd64

Any other potentially useful information about your toolchain?

What did you do?

There was (I assume) a bad deploy that broke dl.google.com yesterday afternoon around 3:45pm (Pacific time), which was fixed ~15 minutes later. During those ~15 minutes, all builds on our CI/CD system failed with:

WARNING: Download from https://dl.google.com/go/go1.18.1.linux-amd64.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 502 Bad Gateway
ERROR: An error occurred during the fetch of repository 'go_sdk':
Traceback (most recent call last):
    File "/root/.cache/bazel/_bazel_root/fc07cdbdb3ccc5391e01bb1a31f63d3c/external/io_bazel_rules_go/go/private/sdk.bzl", line 100, column 16, in _go_download_sdk_impl
        _remote_sdk(ctx, [url.format(filename) for url in ctx.attr.urls], ctx.attr.strip_prefix, sha256)
    File "/root/.cache/bazel/_bazel_root/fc07cdbdb3ccc5391e01bb1a31f63d3c/external/io_bazel_rules_go/go/private/sdk.bzl", line 205, column 21, in _remote_sdk
        ctx.download(
Error in download: java.io.IOException: Error downloading [https://dl.google.com/go/go1.18.1.linux-amd64.tar.gz] to /root/.cache/bazel/_bazel_root/fc07cdbdb3ccc5391e01bb1a31f63d3c/external/go_sdk/go_sdk.tar.gz: GET returned 502 Bad Gateway

We happened to be doing a deploy around that time and had to keep retrying it until dl.google.com was fixed before we could finally continue the deploy.

For context, this is the go_sdk-related part of our WORKSPACE file:

GO_VERSION = "1.18.1"

...

go_register_toolchains(version = GO_VERSION)

Here are some ideas/suggestions/feature requests to make dl.google.com no longer a single point of failure:

  1. Make the go_sdk cache-able by remote cache

We do have a Bazel remote cache set up (backed by an S3 bucket) for our CI/CD system. Since we pin the Go version, the download URL for go_sdk is fixed, so if rules_go could fetch it from the remote cache instead of the original source, that would fix most of the problem.

I assume rules_go might still need the index file containing the checksums, which is dynamic by nature and not cacheable, so that might make this less feasible in practice?

  2. Add urls arg to go_register_toolchains

During the dl.google.com outage I tried to see whether I could override the download URL in rules_go, and found that there is a urls arg for go_download_sdk, but not for go_register_toolchains. If we added a urls arg to go_register_toolchains so that I could set it to both https://dl.google.com/go/{} and https://go.dev/dl/{}, it might help during similar outages (I'm not 100% sure whether go.dev was affected by the same outage: go.dev worked when I tried it, but dl.google.com also recovered shortly after). Alternatively, we could run an internal mirror and point it there.

  3. Better documentation to sdks arg of go_download_sdk

As an alternative to 2, the only way I can find to avoid using the index file for the checksum is the sdks arg of go_download_sdk, but its documentation just says "see description", and I don't see any example in the description showing what this string_list_dict is supposed to look like. If an example were added to the documentation, I could switch to go_download_sdk in order to pin the mirrors and checksums.

So our WORKSPACE would probably look like this:

GO_VERSION = "1.18.1"
GO_LINUX_AMD64_SHA256 = "..."
GO_DARWIN_AMD64_SHA256 = "..."
GO_DARWIN_ARM64_SHA256 = "..."

...

go_download_sdk(
    name = "go_sdk",
    version = GO_VERSION,
    urls = [
        "https://dl.google.com/go/{}",
        "https://go.dev/dl/{}",
        # internal mirror here
    ],
    sdks = ...,
)

go_register_toolchains()
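For reference, the traceback above shows rules_go expanding each urls template with url.format(filename) and passing the resulting list to ctx.download, which treats the entries as mirrors of the same file. The Python sketch below models that fallback under an outage like this one. candidate_urls, download_with_fallback, outage_fetch, and the internal mirror host are hypothetical illustrations, not rules_go or Bazel APIs, and Bazel's real downloader also handles retries, checksums, and caching that are left out here.

```python
from typing import Callable, List, Optional

# Illustrative mirror list matching the `urls` templates in the WORKSPACE
# sketch above; the third entry is a hypothetical placeholder host.
URL_TEMPLATES = [
    "https://dl.google.com/go/{}",
    "https://go.dev/dl/{}",
    "https://go-mirror.example.internal/{}",  # hypothetical internal mirror
]


def candidate_urls(templates: List[str], filename: str) -> List[str]:
    """Expand every template by substituting the archive filename for "{}"."""
    return [template.format(filename) for template in templates]


def download_with_fallback(
    urls: List[str], fetch: Callable[[str], Optional[bytes]]
) -> bytes:
    """Return the body from the first URL that succeeds, trying them in order.

    `fetch` stands in for a real HTTP client: it returns the response body,
    or None when the request fails (e.g. with 502 Bad Gateway).
    """
    failed = []
    for url in urls:
        body = fetch(url)
        if body is not None:
            return body
        failed.append(url)
    raise IOError("every mirror failed: " + ", ".join(failed))


def outage_fetch(url: str) -> Optional[bytes]:
    """Fake fetcher simulating the incident: dl.google.com is down."""
    if url.startswith("https://dl.google.com/"):
        return None
    return b"archive from " + url.encode()


if __name__ == "__main__":
    urls = candidate_urls(URL_TEMPLATES, "go1.18.1.linux-amd64.tar.gz")
    print(download_with_fallback(urls, outage_fetch))
```

In this simulation the first template fails and the call falls through to the second one; whether a given real mirror would have been up during the outage is a separate question.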

What did you expect to see?

What did you see instead?

@uhthomas

uhthomas commented May 3, 2022

> 1. Make the go_sdk cache-able by remote cache

Repository rules are not cacheable; unfortunately, this is just how Bazel works (bazelbuild/bazel#6359).

There is, however, the remote downloader, which may be useful. It won't fully cache repository rules, but it should cache the archive.

> 2. Add urls arg to go_register_toolchains

Why not just use go_download_sdk? It doesn't need any additional attributes.

> 3. Better documentation to sdks arg of go_download_sdk

I don't believe this is necessary given that sdks is not a required attribute.

@fishy
Contributor Author

fishy commented May 3, 2022

> I don't believe this is necessary given that sdks is not a required attribute.

I think the sdks attribute could help us avoid needing the index file to verify the checksum, which would make an internal mirror easier (we would only need to mirror the actual SDK download files).
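The traceback in the original report shows rules_go passing a sha256 into _remote_sdk, so the download is already checked against a digest; the idea discussed here is about where that digest comes from. If it is pinned in sdks, a mirror only has to serve the archive bytes. A minimal Python sketch of such a check, assuming the (filename, sha256) shape of sdks used later in this thread; verify_archive is a hypothetical helper, not rules_go's implementation.

```python
import hashlib
from typing import Dict, Tuple

# Assumed shape of the `sdks` attribute, as used in this thread:
# "<os>_<arch>" -> (archive filename, expected sha256 hex digest).
Sdks = Dict[str, Tuple[str, str]]


def verify_archive(sdks: Sdks, platform: str, data: bytes) -> str:
    """Check downloaded bytes against the pinned digest; return the filename.

    The expected digest comes straight from `sdks`, so no index file is
    consulted. Raises KeyError for an unknown platform and ValueError when
    the bytes do not match (e.g. a corrupted or tampered mirror copy).
    """
    filename, expected = sdks[platform]
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected.lower():
        raise ValueError(
            f"sha256 mismatch for {filename}: got {actual}, want {expected}"
        )
    return filename


if __name__ == "__main__":
    # Stand-in bytes; a real check would read the downloaded .tar.gz.
    payload = b"pretend this is go1.18.1.linux-amd64.tar.gz"
    pinned: Sdks = {
        "linux_amd64": (
            "go1.18.1.linux-amd64.tar.gz",
            hashlib.sha256(payload).hexdigest(),
        ),
    }
    print(verify_archive(pinned, "linux_amd64", payload))
```

Because the digest travels with the configuration rather than with the download source, any mirror that serves the same bytes passes the check, and a mirror serving different bytes fails it.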

@uhthomas

uhthomas commented May 4, 2022

Hmm, interesting, yeah, given that the fallback is https://dl.google.com per https://github.com/bazelbuild/rules_go/blob/ba4668684b2eb7dbe7f197dc9435a780fecbfbf8/go/private/sdk.bzl#L70-L76.

Is there something in particular you feel the documentation is missing?

FWIW, we generate our sdks with a one-line shell script (because we use a custom version for Linux x86_64):

#!/bin/sh

curl -Lfs https://golang.org/dl/\?mode\=json | jq -r '.[0].files | .[] | select(.kind == "archive") | "\"\(.os)_\(.arch)\": (\"\(.filename)\", \"\(.sha256)\"),"'
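For anyone without jq, the one-liner can be ported to Python. This sketch mirrors the jq filter (first release in the list, kind == "archive", key "<os>_<arch>") and adds optional version and platforms filters. sdks_from_index, format_sdks, and SAMPLE_INDEX are hypothetical names; the sample is a hand-trimmed stand-in for the real index whose sha256 values are the ones posted in this thread, with placeholders elsewhere. I believe the plain ?mode=json listing only covers current releases, so pinning an older version may need the fuller include=all listing; treat that as an assumption.

```python
import json
import urllib.request
from typing import Dict, Iterable, Optional, Tuple

# Same endpoint as the shell one-liner above.
INDEX_URL = "https://golang.org/dl/?mode=json"


def fetch_index(url: str = INDEX_URL) -> str:
    """Download the release index as text (not called by the demo below)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def sdks_from_index(
    index_json: str,
    version: Optional[str] = None,
    platforms: Optional[Iterable[str]] = None,
) -> Dict[str, Tuple[str, str]]:
    """Build an sdks-style dict from the index, like the jq filter does.

    version=None picks the first release listed, matching jq's `.[0]`.
    platforms, if given, keeps only those "<os>_<arch>" keys.
    """
    releases = json.loads(index_json)
    if version is None:
        release = releases[0]
    else:
        release = next((r for r in releases if r["version"] == version), None)
        if release is None:
            raise ValueError(f"{version} not found in index")
    wanted = set(platforms) if platforms is not None else None
    sdks: Dict[str, Tuple[str, str]] = {}
    for entry in release["files"]:
        if entry["kind"] != "archive":  # skip source tarballs and installers
            continue
        key = f"{entry['os']}_{entry['arch']}"
        if wanted is None or key in wanted:
            sdks[key] = (entry["filename"], entry["sha256"])
    return sdks


def format_sdks(sdks: Dict[str, Tuple[str, str]]) -> str:
    """Render the dict in the WORKSPACE syntax used in this thread."""
    lines = ["sdks = {"]
    for key in sorted(sdks):
        filename, sha256 = sdks[key]
        lines.append(f'    "{key}": ("{filename}", "{sha256}"),')
    lines.append("}")
    return "\n".join(lines)


# Hand-trimmed sample in the index's shape (not real index output).
SAMPLE_INDEX = json.dumps([{
    "version": "go1.18.1",
    "stable": True,
    "files": [
        {"filename": "go1.18.1.src.tar.gz", "os": "", "arch": "",
         "kind": "source", "sha256": "placeholder"},
        {"filename": "go1.18.1.darwin-amd64.tar.gz", "os": "darwin", "arch": "amd64",
         "kind": "archive",
         "sha256": "3703e9a0db1000f18c0c7b524f3d378aac71219b4715a6a4c5683eb639f41a4d"},
        {"filename": "go1.18.1.darwin-arm64.tar.gz", "os": "darwin", "arch": "arm64",
         "kind": "archive",
         "sha256": "6d5641a06edba8cd6d425fb0adad06bad80e2afe0fa91b4aa0e5aed1bc78f58e"},
        {"filename": "go1.18.1.linux-amd64.tar.gz", "os": "linux", "arch": "amd64",
         "kind": "archive",
         "sha256": "b3b815f47ababac13810fc6021eb73d65478e0b2db4b09d348eefad9581a2334"},
        {"filename": "go1.18.1.windows-amd64.zip", "os": "windows", "arch": "amd64",
         "kind": "archive", "sha256": "placeholder"},
        {"filename": "go1.18.1.windows-amd64.msi", "os": "windows", "arch": "amd64",
         "kind": "installer", "sha256": "placeholder"},
    ],
}])

if __name__ == "__main__":
    wanted = ["darwin_amd64", "darwin_arm64", "linux_amd64"]
    print(format_sdks(sdks_from_index(SAMPLE_INDEX, platforms=wanted)))
```

To run it against live data, pass fetch_index() instead of SAMPLE_INDEX; keeping the fetch separate makes the filtering logic easy to check offline.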

@fishy
Contributor Author

fishy commented May 4, 2022

> Is there something in particular you feel the documentation is missing?

Basically, an example of a working sdks attribute would be helpful.

Currently the documentation regarding that attribute just says:

  • Type: string_list_dict
  • Default value: see description

which is not very helpful for people trying to figure out what to put there (e.g. what exactly are the expected keys in the string_list_dict?)

Based on the one-line script you gave, it looks like this works (I only kept the os_arch combinations we care about):

sdks = {
    "darwin_amd64": ("go1.18.1.darwin-amd64.tar.gz", "3703e9a0db1000f18c0c7b524f3d378aac71219b4715a6a4c5683eb639f41a4d"),
    "darwin_arm64": ("go1.18.1.darwin-arm64.tar.gz", "6d5641a06edba8cd6d425fb0adad06bad80e2afe0fa91b4aa0e5aed1bc78f58e"),
    "linux_amd64": ("go1.18.1.linux-amd64.tar.gz", "b3b815f47ababac13810fc6021eb73d65478e0b2db4b09d348eefad9581a2334"),
}

So an example like that in the docs would be helpful :)
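The key convention the docs leave implicit, as it appears in the working example above: each key is "<os>_<arch>" with an underscore, the archive filename spells the same pair with a dash (linux_amd64 vs go1.18.1.linux-amd64.tar.gz), and each value is a (filename, sha256) pair. A small, hypothetical Python checker for those conventions could catch copy-paste mistakes before they reach WORKSPACE; the rules it checks are inferred from the examples in this thread, not taken from rules_go's own validation.

```python
import re
from typing import Dict, List, Sequence

# 64 lowercase hex characters, as in the digests posted above.
_SHA256_RE = re.compile(r"[0-9a-f]{64}")


def check_sdks(sdks: Dict[str, Sequence[str]]) -> List[str]:
    """Return human-readable problems; an empty list means it looks sane.

    Conventions checked (inferred from this thread's examples):
      * key is "<os>_<arch>"
      * value is (filename, sha256)
      * filename contains ".<os>-<arch>."
      * sha256 is 64 lowercase hex characters
    """
    problems = []
    for key, value in sdks.items():
        if len(value) != 2:
            problems.append(f"{key}: expected (filename, sha256), got {value!r}")
            continue
        filename, sha256 = value
        goos, sep, goarch = key.partition("_")
        if not (sep and goos and goarch):
            problems.append(f"{key}: key should look like <os>_<arch>")
        elif f".{goos}-{goarch}." not in filename:
            problems.append(f"{key}: {filename!r} is not a {goos}-{goarch} archive")
        if not _SHA256_RE.fullmatch(sha256):
            problems.append(f"{key}: sha256 should be 64 lowercase hex characters")
    return problems


if __name__ == "__main__":
    sdks = {
        "darwin_amd64": ("go1.18.1.darwin-amd64.tar.gz", "3703e9a0db1000f18c0c7b524f3d378aac71219b4715a6a4c5683eb639f41a4d"),
        "darwin_arm64": ("go1.18.1.darwin-arm64.tar.gz", "6d5641a06edba8cd6d425fb0adad06bad80e2afe0fa91b4aa0e5aed1bc78f58e"),
        "linux_amd64": ("go1.18.1.linux-amd64.tar.gz", "b3b815f47ababac13810fc6021eb73d65478e0b2db4b09d348eefad9581a2334"),
    }
    print(check_sdks(sdks) or "sdks look OK")
```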

@fishy
Contributor Author

fishy commented May 4, 2022

Also, can you say more about the custom version for Linux x86_64? I'm wondering if I should do that too.

@sluongng
Contributor

sluongng commented May 4, 2022

Alright, it seems like https://github.com/bazelbuild/rules_go/blob/master/go/toolchains.rst#go-download-sdk could use an update with an example that includes usage of the urls attribute.

@fishy, it seems like you are almost there. Do you want to make this doc contribution?

fishy added a commit to fishy/rules_go that referenced this issue May 4, 2022
@fishy
Contributor Author

fishy commented May 4, 2022

> Alright it seems like https://github.com/bazelbuild/rules_go/blob/master/go/toolchains.rst#go-download-sdk could use an update with an example including usage of urls attribute.
>
> @fishy seems like you are almost there, do you want to make this doc contribution?

Sure! See #3139

@uhthomas

uhthomas commented May 4, 2022

> also can you say more about the custom version of linux x86_64? I'm wondering if I should also do that.

We're using Go+BoringCrypto for compliance reasons. I wouldn't recommend it unless you're looking for FIPS compliance.

@fishy
Contributor Author

fishy commented May 4, 2022

@uhthomas oh thanks. I thought by "we" you meant "rules_go default" :)

fishy added a commit to fishy/rules_go that referenced this issue May 4, 2022
linzhp pushed a commit that referenced this issue Jun 23, 2022

3 participants