Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random checksum mismatch error with data.artifactory_file #858

Closed
4 of 5 tasks
AbirHamzi opened this issue Dec 13, 2023 · 21 comments
Closed
4 of 5 tasks

Random checksum mismatch error with data.artifactory_file #858

AbirHamzi opened this issue Dec 13, 2023 · 21 comments
Assignees
Labels
bug Something isn't working question Further information is requested

Comments

@AbirHamzi
Copy link

AbirHamzi commented Dec 13, 2023

Describe the bug
Randomly we get a checksum mismatch error.

Requirements for and issue

  • A description of the bug
  • A fully functioning terraform snippet that can be copy&pasted (no outside files or ENV vars unless that's part of the issue). If this is not supplied, this issue will likely be closed without any effort expended.
  • Your version of artifactory (you can curl it at $host/artifactory/api/system/version
  • Your version of terraform
  • Your version of terraform provider

Expected behavior
data.artifactory_file successfully downloads a file.

Additional context
Terraform: v1.4.2
Artifactory: v7.55.10
Artifactory provider: latest version
Steps to reproduce the issue:

  1. I forked the Artifactory provider here to gather more details about the issue.
  2. I got the following:
    Screenshot from 2023-12-10 21-14-55
    It looks like this function here is not stable.
@AbirHamzi AbirHamzi added the bug Something isn't working label Dec 13, 2023
@alexhung
Copy link
Member

@AbirHamzi Thanks for the report. I'll investigate. My initial guess is that something changed between different version of Go, as this data source checksum verification was working before.

@alexhung
Copy link
Member

@AbirHamzi Can you share how your file is stored in Artifactory and an example of your HCL?

We have acceptance test for verifying checksum (https://github.com/jfrog/terraform-provider-artifactory/blob/master/pkg/artifactory/datasource/artifact/datasource_artifactory_file_test.go#L88) and this has not failed.

@alexhung alexhung added the question Further information is requested label Dec 14, 2023
@AbirHamzi
Copy link
Author

AbirHamzi commented Dec 15, 2023

@AbirHamzi Can you share how your file is stored in Artifactory and an example of your HCL?

We have acceptance test for verifying checksum (https://github.com/jfrog/terraform-provider-artifactory/blob/master/pkg/artifactory/datasource/artifact/datasource_artifactory_file_test.go#L88) and this has not failed.

@alexhung as I mentioned, this is a random error that occurs when we run a pipeline. The first time the pipeline runs, it fails, and when we re-run it, the problem is resolved.

An example:

data "artifactory_file" "lambda_source_code" {
  repository      = "repo"
  path            = "/path/build.zip"
  output_path     = "./../../../../payloads/build.zip"
  force_overwrite = true
}
output "checksum_sha256" {
  value = data.artifactory_file.lambda_source_code.sha256
}
output "checksum_sha1" {
  value = data.artifactory_file.lambda_source_code.sha1
}
output "checksum_md5" {
  value = data.artifactory_file.lambda_source_code.md5
}

@alexhung
Copy link
Member

@AbirHamzi I see that you are using the latest version of the provider. Is it 9.9.1 or later? If so, I wonder if this is related to this Resty PR. Since 2.10.0 has the CVE issue, I'm planning to downgrade Resty from 2.10.0 to 2.9.1.

Can you downgrade your provider to 9.9.0 and see if your issue persists?

alexhung added a commit that referenced this issue Dec 15, 2023
@AbirHamzi
Copy link
Author

AbirHamzi commented Dec 16, 2023

@AbirHamzi I see that you are using the latest version of the provider. Is it 9.9.1 or later? If so, I wonder if this is related to this Resty PR. Since 2.10.0 has the CVE issue, I'm planning to downgrade Resty from 2.10.0 to 2.9.1.

Can you downgrade your provider to 9.9.0 and see if your issue persists?

@alexhung I tried provider v9.9.0 and v10.0.2, but we are still getting the same checksum error.

@alexhung
Copy link
Member

@AbirHamzi Strange. Unfortunately I have not been able to reproduce it yet.

@alexhung
Copy link
Member

@AbirHamzi The sha256.New() ( https://cs.opensource.google/go/go/+/refs/tags/go1.21.5:src/crypto/sha256/sha256.go;l=150) or sha256.Sum() (https://cs.opensource.google/go/go/+/refs/tags/go1.21.5:src/crypto/sha256/sha256.go;l=203) are not new or changed recently so I doubt they are the cause of this issue.

The VerifySha256Checksum code is pretty much a copy from the Golang sha256 example: https://pkg.go.dev/crypto/sha256#example-New-File so I doubt VerifySha256Checksum is the cause either.

Couple these with my inability to reproduce this locally, and this has not been reported until now, suggests to me this may be specific to your environment/setup?

@AbirHamzi
Copy link
Author

@AbirHamzi The sha256.New() ( https://cs.opensource.google/go/go/+/refs/tags/go1.21.5:src/crypto/sha256/sha256.go;l=150) or sha256.Sum() (https://cs.opensource.google/go/go/+/refs/tags/go1.21.5:src/crypto/sha256/sha256.go;l=203) are not new or changed recently so I doubt they are the cause of this issue.

The VerifySha256Checksum code is pretty much a copy from the Golang sha256 example: https://pkg.go.dev/crypto/sha256#example-New-File so I doubt VerifySha256Checksum is the cause either.

Couple these with my inability to reproduce this locally, and this has not been reported until now, suggests to me this may be specific to your environment/setup?

@alexhung Is it possible that the checksum is being calculated before the file is completely downloaded?

@alexhung
Copy link
Member

@AbirHamzi Very unlikely as VerifySha256Checksum is called after the file downloading is completed at: https://github.com/jfrog/terraform-provider-artifactory/blob/master/pkg/artifactory/datasource/artifact/datasource_artifactory_file.go#L246

If you set env var TF_LOG=DEBUG then you will see the debug log messages.

@AbirHamzi AbirHamzi closed this as not planned Won't fix, can't repro, duplicate, stale Dec 27, 2023
@AbirHamzi
Copy link
Author

@alexhung Thank you for your help.I am closing this issue because it is not a bug. I found another Terraform code that downloads the same file to the same path simultaneously, causing the problem ..

@bd-rkillkelley
Copy link

@alexhung we are also experiencing a similar problem where we have intermittent checksum failures

│ Error: Checksums for file tmp/artifact.jar and https://artifactory.com/artifactory/gradle/apache/spark/spark-protobuf_2.12/3.5.0/spark-protobuf_2.12-3.5.0.jar do not match, expected c3bd20799973ffb61a1e1dd0c8f40e7d7e9e0f68f38ec39851f6d23aaf6f342a
│
│   with module.prod.data.artifactory_file.protobuf_jar,
│   on ../modules/gcp_dataproc_cluster/main.tf line 90, in data "artifactory_file" "protobuf_jar":
│   90: data "artifactory_file" "protobuf_jar" {

where line 90 of our module is the artifactory_file definition

data "artifactory_file" "protobuf_jar" {
  repository      = "gradle"
  path            = "apache/spark/spark-protobuf_2.12/3.5.0/spark-protobuf_2.12-3.5.0.jar"
  output_path     = "tmp/artifact.jar"
  force_overwrite = true
}

I even calculated the checksum of the local file after the failure and got a matching checksum

c3bd20799973ffb61a1e1dd0c8f40e7d7e9e0f68f38ec39851f6d23aaf6f342a  tmp/artifact.jar

@alexhung
Copy link
Member

alexhung commented Dec 3, 2024

@bd-rkillkelley Have you tried using a different output_path value to ensure there's no collision with another file? Perhaps event suffix it with a random number?

@bd-nweinstein
Copy link

bd-nweinstein commented Dec 3, 2024

We hypothesize that this is an issue with disk flush. Here, in resty, there is no outfile.Sync() https://github.com/go-resty/resty/blob/c34e4605f4d5a775371be242ba0debb2705eb6d2/middleware.go#L585

Similarly, here:

_, err = m.(util.ProviderMetadata).Client.R().SetOutput(outputPath).Get(fileInfo.DownloadUri)

If we add a call to sync after that line, then the problem goes away:

	cmd := exec.Command("sync")

	// Run the command
	err := cmd.Run()
	if err != nil {
		fmt.Println("Error executing sync command:", err)
		return
	}

Probably for most users this doesn't arise, but since we are running in a kubernetes environment with a network mounted file system, flushing to disk is not guaranteed to happen right away and we trigger this race condition when checking the checksum.

@alexhung
Copy link
Member

alexhung commented Dec 3, 2024

@bd-rkillkelley Ah, this makes sense.

@alexhung
Copy link
Member

alexhung commented Dec 3, 2024

@bd-rkillkelley You should create a feature request for Resty to add the Sync() call to the middleware code.

@bd-nweinstein
Copy link

@bd-rkillkelley You should create a feature request for Resty to add the Sync() call to the middleware code.

Filed go-resty/resty#927

@bd-nweinstein
Copy link

The problem occurred even after calling sync on the file system, so something else is afoot.

@alexhung
Copy link
Member

alexhung commented Dec 5, 2024

@bd-nweinstein I think you're on the right path w.r.t. file handle not flushed. Can you change the output_path to somewhere in local volume (if possible) to isolate this issue to network volume?

@bd-nweinstein
Copy link

bd-nweinstein commented Dec 5, 2024

I think you're on the right path w.r.t. file handle not flushed. Can you change the output_path to somewhere in local volume (if possible) to isolate this issue to network volume?

I've now convinced myself it's probably not file flushing. Instead, it's a race condition between two modules that both define the same artifactory file.

module "foo_staging" {
  source                     = "../modules/my_module"
  environment_name           = "staging"
}

module "foo_production" {
  source                     = "../modules/my_module"
  environment_name           = "production"
// modules/my_module.tf

data "artifactory_file" "my_jar" {
  repository      = "my-repo"
  path            = "path/to/my/artifact.jar"
  output_path     = "tmp/artifact.jar" --> THIS IS THE PROBLEM
  force_overwrite = true
}

Both staging and prod are racing to download and check the checksum for /tmp/artifact.jar. One might be checking while the other is overwriting / downloading. Therefore, one module might be in the middle of downloading while the other is checking the checksum.

One fix would be to name the output path something environment specific, e.g.

data "artifactory_file" "my_jar" {
  repository      = "my-repo"
  path            = "path/to/my/artifact.jar"
  output_path     = "tmp/${var.environment_name}-artifact.jar" --> ELIMINATE RACE CONDITION
  force_overwrite = true
}

@bd-nweinstein
Copy link

@alexhung Another thought - one could probably prevent this issue, even if the tmp file has the same path in both modules, by locking before downloading.

Lock
Download file
Check file's checksum
Release lock

This way, multiple threads and projects could still save the file without stomping on each other.

@alexhung
Copy link
Member

alexhung commented Dec 6, 2024

@bd-nweinstein Thanks for the suggestion. Use of mutex may prevent this concurrency issue. I can look into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants