Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Action outputs with fixed timestamps cause problems #14723

Closed
miscott2 opened this issue Feb 4, 2022 · 11 comments
Closed

Action outputs with fixed timestamps cause problems #14723

miscott2 opened this issue Feb 4, 2022 · 11 comments
Assignees
Labels
more data needed P1 I'll work on this now. (Assignee required) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: bug

Comments

@miscott2
Copy link

miscott2 commented Feb 4, 2022

Description of the problem / feature request:

If an output file from a Bazel action has a fixed timestamp then if the content of the file change but the size of the file doesn't it won't trigger the re-running of any actions that depends on this file.

Feature requests: what underlying problem are you trying to solve with this feature?

Some of the inputs to our build are archives containing template files that are expanded during the build. The archives are produced by another build system and to ensure that build system is reproducible the timestamps of the template files are set to 1970 in the archive. During the build we extract the templates files from the archive (retaining their timestamp as this is the default behavior of tar), expand the template and then use the expanded template in a further action. When we checkout a different refpoint of our code sometimes the archive has changed but the templates are still the same size and this results in actions not being re-run when they should be.

I understand that Bazel can't rehash all input files but I was a bit surprised that when an action is run that all output files are not rehashed. Perhaps this is too much of an edge case to worry about but I thought it was worth raising.

We've worked around the issue by not retaining timestamps when we extract files from the archives but wanted to check if Bazel should handle this.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Create a workspace with the following BUILD and .bzl files

$cat BUILD
load("rules.bzl", "my_expand")

genrule(
    name = "my_templates",
    srcs = ["template_archive.tar"],
    outs = ["template1"],
    cmd = "tar -C $(@D) -xf template_archive.tar",
)

my_expand(
    name = "expand1",
    input = "template1",
    output = "expanded1",
    to_sub = {"test":"foo"}
)
$cat rules.bzl
def _my_expand_impl(ctx):
    ctx.actions.expand_template(
        template = ctx.file.input,
        output = ctx.outputs.output,
        substitutions = ctx.attr.to_sub
    )

my_expand = rule(
    implementation = _my_expand_impl,
    attrs = {
        "input": attr.label(allow_single_file=True),
        "output": attr.output(),
        "to_sub" : attr.string_dict(),
    }
)

Now create a set of archives with different template files

echo "test : alpha" > template1
tar -cf template_archive_alpha.tar --mtime='1970-01-01' template1
echo "test : delta" > template1
tar -cf template_archive_delta.tar --mtime='1970-01-01' template1
echo "test : beta" > template1
tar -cf template_archive_beta.tar --mtime='1970-01-01' template1

Put the alpha archive in place as the input, build and verify that the expanded template is as expected.

$cp template_archive_alpha.tar template_archive.tar 
$bazel ...
INFO: Analyzed 2 targets (0 packages loaded, 0 targets configured).
INFO: Found 2 targets...
INFO: Elapsed time: 0.106s, Critical Path: 0.03s
INFO: 2 processes: 1 internal, 1 linux-sandbox.
INFO: Build completed successfully, 2 total actions
$cat bazel-bin/expanded1 
foo : alpha

Switch to the delta archive as the input, build and verify that the expanded template has not been updated.

$cp template_archive_delta.tar template_archive.tar 
$bazel ...
INFO: Analyzed 2 targets (0 packages loaded, 0 targets configured).
INFO: Found 2 targets...
INFO: Elapsed time: 0.095s, Critical Path: 0.03s
INFO: 2 processes: 1 internal, 1 linux-sandbox.
INFO: Build completed successfully, 2 total actions
$cat bazel-bin/expanded1 
foo : alpha

Even though the extracted template has been updated

$cat bazel-bin/template1 
test : delta

Verify that performing a Bazel clean and building results in the correct output.

$bazel clean
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
$bazel build ...
INFO: Analyzed 2 targets (5 packages loaded, 10 targets configured).
INFO: Found 2 targets...
INFO: Elapsed time: 0.186s, Critical Path: 0.03s
INFO: 3 processes: 2 internal, 1 linux-sandbox.
INFO: Build completed successfully, 3 total actions
$cat bazel-bin/expanded1 
foo : delta

Switching to/from the alpha/delta archives and the beta archive works fine presumably because the size of the template files change (beta being 4 characters and alpha/delta being 5 characters).

What operating system are you running Bazel on?

Linux

What's the output of bazel info release?

release 4.1.0- (@non-git)

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

I don't have full details to hand about how we built Bazel but I'm confident that any changes have not impacted the behavior described above.

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

N/A

Have you found anything relevant by searching the web?

No. I'm not even confident if Bazel should handle this scenario but I think it's worth asking.

Any other information, logs, or outputs that you want to share?

No

@gregestren gregestren added team-Core Skyframe, bazel query, BEP, options parsing, bazelrc untriaged type: bug labels Mar 7, 2022
@gregestren
Copy link
Contributor

I just skimmed this, but at first glance that surprises me. Bazel should, generally, be correct for any changes. team-Core should be able to give you better insight.

@janakdr
Copy link
Contributor

janakdr commented Mar 8, 2022

Are you invoking Bazel with any special flags? --cache_computed_file_digests could cause this problem, although I would expect the node id to still catch that the file has changed. I used your test case on both Mac and Bazel, but couldn't reproduce:

$ bazel build //:expand1
Starting local Bazel server and connecting to it...
INFO: Analyzed target //:expand1 (5 packages loaded, 10 targets configured).
INFO: Found 1 target...
Target //:expand1 up-to-date:
  bazel-bin/expanded1
INFO: Elapsed time: 2.257s, Critical Path: 0.03s
INFO: 3 processes: 2 internal, 1 linux-sandbox.
INFO: Build completed successfully, 3 total actions
$ cat bazel-bin/expanded1
foo : alpha
$ cp template_archive_delta.tar template_archive.tar
$ bazel build //:expand1
INFO: Analyzed target //:expand1 (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:expand1 up-to-date:
  bazel-bin/expanded1
INFO: Elapsed time: 0.124s, Critical Path: 0.01s
INFO: 3 processes: 2 internal, 1 linux-sandbox.
INFO: Build completed successfully, 3 total actions
$ cat bazel-bin/expanded1
foo : delta
$ ls -l bazel-bin/
total 8
-r-xr-xr-x 1 janakr primarygroup 12 Mar  8 12:25 expanded1
-r-xr-xr-x 1 janakr primarygroup 13 Jan  1  1970 template1

@janakdr janakdr added more data needed P1 I'll work on this now. (Assignee required) and removed untriaged labels Mar 8, 2022
@janakdr janakdr self-assigned this Mar 8, 2022
@fmeum
Copy link
Collaborator

fmeum commented Mar 8, 2022

@janakdr On Linux, I was able to reproduce the issue with Bazel 4.1.0 and 4.2.0, but not 5.0.0.

@janakdr
Copy link
Contributor

janakdr commented Mar 8, 2022

Thanks @fmeum . Your original comment seems to be deleted, but you also correctly pointed out there that --cache_computed_file_digests has a default non-zero value in Bazel, which would mostly explain the issue (I just verified that the inodes are the same). @miscott2 is upgrading to 5.0 a reasonable solution for you? Separately we should understand what the difference between 4.2 and 5.0 is, to make sure that we're doing the right thing now.

I see a potential area for improvement regardless: in most of Bazel, we use the ctime as part of our file data but for the digest cache we only use mtime. Adding ctime should make this cache more robust.

@fmeum
Copy link
Collaborator

fmeum commented Mar 8, 2022

@janakdr I deleted my original comment because I noticed that setting --cache_computed_file_digests=0 did not seem to reliably fix the issue for me - I didn't want to drop a red herring. This makes me all the more interested in finding out which change between 4.2 and 5.0 ended up fixing this. Do you plan to run a bisect?

@miscott2
Copy link
Author

miscott2 commented Mar 9, 2022

Thanks everyone for your comments/efforts. Yes upgrading to Bazel 5.0 is absolutely fine as a solution for us as we were planning to do so imminently anyway.

@janakdr
Copy link
Contributor

janakdr commented Mar 29, 2022

Since I won't be able to work on this, I'm going to close, since a workaround exists. It would be nice to understand what went wrong in case it's still lurking, but I can't say the team should prioritize it.

cc @alexjski for interest

@janakdr janakdr closed this as completed Mar 29, 2022
@jeremiahar
Copy link

I ran into a correctness issue under similar circumstances as the original issue with bazel 7.0.0-pre.20230316.2 + Linux 6.2.8 + sandbox_base pointing to tmpfs + ext4 . I built a tarball in a rule with a fixed mtime and then extracted it in another rule which lead to this issue when changing constants in a cpp file. Sometimes it would take 2 or 3 changes of the constant before the issue occurred. In my case, touching the extracted output files was enough to fix the correctness issue. I tested with --cache_computed_file_digests=0 and this also fixed the issue. Bazel relying on the mtime in this way is extremely surprising behaviour and should really be better documented (#14401) and honestly probably disabled by default.
cache_computed_file_digests

@fmeum
Copy link
Collaborator

fmeum commented Apr 6, 2023

I looked into this again and was able to reproduce the issue with Bazel 6.1.1 and last_green, but not with the original reproducer: In order to reproduce, I need to mv instead of cp the tar files between builds.

It looks like this is fixed by adding ctime to the cache key as suggested by @janakdr in #14723 (comment). I submitted #18003, which includes an integration test, but will have to investigate other test failures this causes.

@meteorcloudy As this is an incremental correctness issue, a fix should probably go into the planned 5.4.1 and 6.1.2 releases.

@fmeum
Copy link
Collaborator

fmeum commented Apr 6, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 6, 2023
@ShreeM01
Copy link
Contributor

ShreeM01 commented Apr 6, 2023

@bazel-io fork 6.2.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 6, 2023
keertk added a commit that referenced this issue Apr 14, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes #14723

Closes #18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85

Co-authored-by: Fabian Meumertzheim <[email protected]>
keertk added a commit that referenced this issue Apr 14, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes #14723

Closes #18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85

Co-authored-by: Fabian Meumertzheim <[email protected]>
fmeum added a commit to fmeum/bazel that referenced this issue Apr 17, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes bazelbuild#14723

Closes bazelbuild#18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85
keertk pushed a commit that referenced this issue Apr 17, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes #14723

Closes #18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85
fweikert pushed a commit to fweikert/bazel that referenced this issue May 25, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes bazelbuild#14723

Closes bazelbuild#18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
more data needed P1 I'll work on this now. (Assignee required) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants