Nobeid/docker healthcheck #227
Conversation
Hi Naji,
I noticed that tests were not included in the pull request. What are your thoughts on adding some tests to make sure the health check works as expected? I think it would be good to see some cases where it fails and succeeds. I just want to ensure that we are maintaining a high standard of quality for our codebase.
I also wonder if we should cover the case where a database is removed from the config. I think we should cover that as well while we are at it; as I understand it, the check will just keep failing in that situation.
Thanks. 👍
docker/healthcheck.sh (outdated)
# Without the LockFile in the database directory, this check is not going to be working
# since database files are not going to be modified when there are no updates.
cutoff_date=$(($(date +%s) - $GEOIPUPDATE_FREQUENCY * 60 * 61 ))
modified_at=$(find $datbase_dir -type f -exec stat -c "%Y" {} + | sort -nr | head -n 1)
Should $datbase_dir be $database_dir?
I think we could edit the find to only look for the database files with a naming convention like -name '*.mmdb'. This way it will be a lot more reliable.
If you only look for the database files, you aren't considering the lock file.
Yes. I don't think the purpose of the health check is to check the lock file, as it can be set to a location outside the database directory as well. I could be missing cases where not checking the modification date of the lock file would cause a problem. Do you have anything in mind? Thanks
The purpose of the health check is to make sure the service is functioning correctly. This includes ensuring that it is checking for updated databases on the defined schedule. Since the databases aren't necessarily updated every time the service checks, the lock file was (or it appears that it was intended to be) acting as a proxy for indicating that the service executed on the schedule as expected. With this in mind, I'd suggest that rather than looking for the most recently updated file in some directory, the health check script should be updated to check the lock file explicitly, regardless of its location, so that HEALTHY == "I checked for updates".
Sorry, purpose was a badly chosen word at my end. I agree that it is important to ensure that the service is functioning correctly, and checking for updated databases on the defined schedule is a critical part of that. 👍
Thank you for the case you provided; I agree with you that the lock file should be checked explicitly. I was thinking of a case where the lock file is present but the update process fails, which could result in false positives. But with the case you provided, where there are no updates at all, checking the lock file does become necessary. 👍 Thank you again for your input.
Sorry, purpose was a badly chosen word at my end.
Fair enough. I just didn't want to assume anything, so I thought a comprehensive response would remove some uncertainty and ambiguity.
I was thinking of a case where the lock file is present but the update process fails, which could result in false positives.
This is something I considered as well. It will just come down to the definition of HEALTHY, and that's certainly up for debate. Maybe it should mean more than just "I checked for updates", but it certainly can't only mean "I received updated databases". However, it is important to remember that this health check is used by Docker to report on the status of the container, which narrowly scopes what HEALTHY should or should not mean.
Since there are external reasons why the databases may not be updated, using them to determine the health of the container gets a little more tricky. If that's the route someone would like to take, perhaps the mod time of the lock file should only be updated if no errors occur during the process of updating the databases. This could then account for getting HTTP errors (specifically 4xx), I/O errors, etc. However, this does make some assumptions about the purpose of the lock file, losing some generality, which may cause problems elsewhere if the lock file is used for any other purpose. Maybe, then, that necessitates some other approach to determining health.
Though, as someone who uses this app, I would appreciate an initial approach of using the lock file as it appears to have been originally intended in order to resolve the outstanding issue, and then whoever wants to make it more involved and more robust can bring forward other proposals for determining health.
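A minimal Go sketch of the idea floated here, touching the lock file's modification time only when the update completes without errors; the file path and helper names are hypothetical, not the actual geoipupdate code:

```go
package main

import (
    "log"
    "os"
    "time"
)

// runUpdate stands in for the real database update; it returns an error on
// any HTTP or I/O failure.
func runUpdate() error { return nil }

func main() {
    lockFile := "/usr/share/GeoIP/.geoipupdate.lock" // hypothetical location

    if err := runUpdate(); err != nil {
        // Leave the lock file's mtime untouched so the health check fails.
        log.Fatalf("updating databases: %v", err)
    }

    // Only after a clean run, bump the lock file's access/modification times
    // so its mtime acts as a "last successful check" marker.
    now := time.Now()
    if err := os.Chtimes(lockFile, now, now); err != nil {
        log.Printf("touching lock file: %v", err)
    }
}
```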
Fair enough. I just didn't want to assume anything, so I thought a comprehensive response would remove some uncertainty and ambiguity.
Thank you! I appreciate your input.
I wonder if we could go with another approach and store the individual times at which each database in the config was updated/checked in a file, and make the health check use that file.
Or, add a daemon mode to geoipupdate and have it expose an internal health check endpoint.
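A rough sketch of what the daemon-mode idea might look like, assuming a periodic update loop and a local HTTP endpoint; none of these names, ports, or thresholds exist in geoipupdate today:

```go
package main

import (
    "fmt"
    "net/http"
    "sync/atomic"
    "time"
)

// lastSuccess holds the unix time of the last successful update check.
var lastSuccess atomic.Int64

func updateLoop(every time.Duration) {
    for {
        // ... run the database update here ...
        lastSuccess.Store(time.Now().Unix())
        time.Sleep(every)
    }
}

func healthHandler(w http.ResponseWriter, _ *http.Request) {
    last := time.Unix(lastSuccess.Load(), 0)
    // Allow some slack over the configured frequency before reporting unhealthy.
    if time.Since(last) > 25*time.Hour {
        http.Error(w, "no successful update check recently", http.StatusServiceUnavailable)
        return
    }
    fmt.Fprintln(w, "ok")
}

func main() {
    go updateLoop(24 * time.Hour)
    http.HandleFunc("/healthz", healthHandler)
    // The Docker HEALTHCHECK could then simply query this endpoint.
    _ = http.ListenAndServe("127.0.0.1:9090", nil)
}
```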
I'm sure those suggestions could be fleshed out to provide a much more robust health check, and I'm happy to see improvements to the product. However, if I could be a little self-serving for a moment 😅, I would prefer to see this change just use the lock file as originally intended so that my open issue can be resolved. From a more objective perspective, though, I would still make the same suggestion to my own team if this was one of our services.
Thank you both for providing excellent feedback!
After some internal discussions, we've decided that there are lots of rough edges in relying on the lock file, and the solution didn't feel very robust.
I agree that having a long-lived process with a healthcheck endpoint is a great solution, but unfortunately it's a much bigger change. So, as a compromise, we've opted to have geoipupdate log its download/update results in a JSON format that can be parsed and analyzed by the healthcheck script.
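For illustration, the kind of per-edition JSON record the healthcheck script could parse from stdout might look like the following; the field names here are assumptions, not the final format:

```go
package main

import (
    "encoding/json"
    "os"
    "time"
)

// updateResult describes the outcome of one edition's download attempt.
type updateResult struct {
    EditionID string    `json:"edition_id"`
    OldHash   string    `json:"old_hash"`
    NewHash   string    `json:"new_hash"`
    CheckedAt time.Time `json:"checked_at"`
}

func main() {
    res := updateResult{
        EditionID: "GeoLite2-City",
        OldHash:   "d41d8cd9",
        NewHash:   "d41d8cd9", // unchanged: the edition was checked but not updated
        CheckedAt: time.Now(),
    }
    // One JSON object per line keeps the output easy to parse from a script.
    _ = json.NewEncoder(os.Stdout).Encode(res)
}
```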
* Output a json log to stdout describing the result of the download operation if the verbose flag is set to true.
I think we should add a new flag for this rather than adding it to the existing verbose flag so that we don't significantly change the output for existing users that are running it with the verbose flag.
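As a sketch of this suggestion, using the standard library flag package purely for illustration (the project may use a different flag library, and the flag name is hypothetical):

```go
package main

import (
    "flag"
    "fmt"
)

var (
    verbose    = flag.Bool("verbose", false, "use verbose output")
    jsonOutput = flag.Bool("output", false, "print a machine-readable JSON summary to stdout")
)

func main() {
    flag.Parse()
    // The human-readable verbose output and the JSON summary stay independent.
    fmt.Println("verbose:", *verbose, "json output:", *jsonOutput)
}
```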
Yes, I can do that! But honestly, anyone relying on the verbose flag to have a specific or consistent output should reevaluate, or probably redirect stdout to /dev/null to get rid of the new output.
I think a minor change to the output would be fine, but suddenly introducing a bunch of JSON that is sent to stdout is a pretty big change. The existing output is intended to be human-readable. This tool is used by tens of thousands of customers, many of whom have been using it for more than a decade. Introducing such a big change would likely cause breakage and support requests, especially outside of a major version release.
@@ -1,5 +1,12 @@
# CHANGELOG

## 5.0.5
At this point, this is more than just a simple bugfix. We should bump this to 5.1.0.
if config.Verbose {
    output.SetOutput(os.Stdout)
    log.SetOutput(os.Stderr)
}
These sorts of mutations of the package loggers when calling these functions seem unexpected. I think ideally the loggers wouldn't be package variables. For NewClient in particular, maybe we could allow passing the logger in as part of *Config so that users of the library could use whatever logger is appropriate for their use case.
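One possible shape for this, sketched with illustrative names rather than the current geoipupdate API:

```go
package geoipupdate

import (
    "io"
    "log"
)

// Config holds client options; Logger is supplied by the library user.
type Config struct {
    // ... existing fields ...

    // Logger receives verbose output. If nil, verbose output is discarded.
    Logger *log.Logger
}

// Client uses whatever logger the caller provided instead of a package-level one.
type Client struct {
    logger *log.Logger
}

func NewClient(config *Config) *Client {
    logger := config.Logger
    if logger == nil {
        logger = log.New(io.Discard, "", 0)
    }
    return &Client{logger: logger}
}
```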
Hi Naji, would it be possible to split the logging changes into a separate PR?
log.Printf("Usage: %s <arguments>\n", os.Args[0]) | ||
flag.PrintDefaults() | ||
//nolint: revive // deep exit from main package | ||
os.Exit(1) |
Maybe not in this story, but I think we should simplify the log calls in main in another story. What do you think?
log.Print(msg) statements followed by os.Exit(exitCode) should be replaced with log.Fatal() or log.Fatalf(), which I think should be refactored to be called in main. This way we don't have to use nolints or concatenate the errors for fatalLogger.
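A minimal sketch of that refactor; getArgs stands in for the existing argument parsing, and run is a hypothetical helper for the rest of the work:

```go
package main

import (
    "errors"
    "log"
)

// getArgs and run stand in for the existing argument parsing and update logic.
func getArgs() (string, error)    { return "", errors.New("example error") }
func run(configFile string) error { return nil }

func main() {
    configFile, err := getArgs()
    if err != nil {
        // log.Fatalf prints the message and calls os.Exit(1) itself,
        // so no separate os.Exit call or nolint directive is needed.
        log.Fatalf("parsing arguments: %v", err)
    }
    if err := run(configFile); err != nil {
        log.Fatalf("running geoipupdate: %v", err)
    }
}
```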
    log.Print(err.Error())
}
log.Printf("Usage: %s <arguments>\n", os.Args[0])
flag.PrintDefaults()
The flag package here is different than the one used in args.go. This isn't going to print anything. I think we want to use the same package in both places.
It seems the flag.Parse() call also prints out the defaults, by the way. I think it would be enough to just wrap the two errors coming from getArgs() with log.Fatalf in main. With something like:
log.Fatalf("parsing arguments: %v", err)
What do you think?
@@ -17,11 +17,10 @@ var (
    version = "unknown"
I think the comment above might need a change, since it now only applies to the 3 variables above.
@@ -17,11 +17,10 @@ var (
    version = "unknown"
    defaultConfigFile string
    defaultDatabaseDirectory string
    log = vars.NewDiscardLogger("main")
I don't think we need to create a new Logger instance. We could set the existing one using SetOutput and SetPrefix in main.
So, the reason why we added this was to check that we have a sanity test for what we are logging, right? Have you confirmed that all the outputs are the same? What would your thoughts be on having a LogDebug() function that is behind the verbose flag?
I think it is hard to tell whether these, for example, will be logged only at verbose. What are your thoughts @oschwald?
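A small sketch of both ideas, configuring the existing package-level logger from main and gating debug output behind the verbose flag; the helper name logDebug and the prefix are hypothetical:

```go
package main

import (
    "log"
    "os"
)

var verbose bool

// configureLogging reuses the package-level logger instead of creating a new one.
func configureLogging() {
    log.SetOutput(os.Stderr)
    log.SetPrefix("geoipupdate: ")
    log.SetFlags(0)
}

// logDebug only emits output when the verbose flag is set.
func logDebug(format string, args ...any) {
    if verbose {
        log.Printf(format, args...)
    }
}

func main() {
    verbose = true
    configureLogging()
    logDebug("checking %d editions for updates", 3)
}
```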
// It takes a string reference as an argument to be used as the prefix for
// all log entries.
func NewDiscardLogger(s string) *log.Logger {
    return log.New(ioutil.Discard, prefix(s), log.LstdFlags)
I think we could modify the existing logger instead of creating a new one.
)

func main() {
    log.SetFlags(0)
I think the default output format will change with the flags you have in NewDiscardLogger. I don't think we want that.
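A tiny illustration of the concern, using only the standard library: log.LstdFlags adds a date/time prefix that the existing log.SetFlags(0) output does not have.

```go
package main

import (
    "log"
    "os"
)

func main() {
    plain := log.New(os.Stderr, "", 0)               // prints: download complete
    stamped := log.New(os.Stderr, "", log.LstdFlags) // prints: 2009/11/10 23:00:00 download complete
    plain.Print("download complete")
    stamped.Print("download complete")
}
```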
@@ -47,6 +48,11 @@ func NewHTTPReader(
    retryFor time.Duration,
    verbose bool,
) Reader {
    logger := vars.NewDiscardLogger("reader")
I'm not sure if we want to have prefixes. @oschwald What are your thoughts?
I think ideally the current verbose output would be mostly the same.
fixes #225
Update the docker healthcheck script to pick up the correct time the last update was attempted.
Made sure the lock file access and modification times are updated when an update is attempted.