Postpone GetLite2 downloading task #1528

Closed
lookis opened this issue Aug 30, 2022 · 2 comments · Fixed by #1548

Comments

@lookis

lookis commented Aug 30, 2022

Summary

I am deploying Shlink with Knative, which scales down to zero pods when possible.

As a result, scaling back up is very slow, because Knative creates a new pod and then has to wait for the GeoLite2 download to finish.

I suggest making the download job asynchronous and geolocating the earlier visits once it has finished.

@lookis lookis added the feature label Aug 30, 2022
@acelaya
Member

acelaya commented Aug 30, 2022

I have thought about this many times. I will see how hard it would be to do, and get it done if possible.

@acelaya acelaya added this to the 3.3.0 milestone Aug 30, 2022
@acelaya acelaya added this to Shlink Sep 1, 2022
@acelaya acelaya moved this to Todo 🗒️ in Shlink Sep 1, 2022
@acelaya acelaya moved this from Todo 🗒️ to In Progress 📝 in Shlink Sep 3, 2022
@acelaya acelaya moved this from In Progress 📝 to Todo 🗒️ in Shlink Sep 11, 2022
@acelaya acelaya moved this from Todo 🗒️ to In Progress 📝 in Shlink Sep 17, 2022
@acelaya
Member

acelaya commented Sep 18, 2022

So, after considering a couple of options, this is how I have decided to approach it.

It would already be possible to disable the initial GeoLite db download without a big impact on Shlink's behavior (this was implemented in case the download fails, to prevent the container start-up from failing). By doing so, Shlink would skip locating the first visit and trigger the download in the background, so that subsequent visits can be located.

What I'm going to do is the following:

  • Create a new env var which allows the initial download to be skipped: SKIP_INITIAL_GEOLITE_DOWNLOAD=true.
  • Add new logic that makes Shlink try to locate all un-located visits after the first GeoLite db has been downloaded. That way you don't miss the location of the visits that a specific container received while the GeoLite db file was not yet available.
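
To make the two points above concrete, here is a rough sketch of the flow in Python. It is only an illustration, not Shlink's actual code: `VisitTracker`, `Visit`, `track` and `on_database_ready` are hypothetical names, and the GeoLite db is stood in for by a plain lookup function. The only name taken from this issue is the SKIP_INITIAL_GEOLITE_DOWNLOAD env var.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional

# Stand-in for a GeoLite db: maps an IP address to a location (or None).
Lookup = Callable[[str], Optional[str]]


@dataclass
class Visit:
    ip: str
    location: Optional[str] = None


class VisitTracker:
    """Hypothetical model of the proposed behavior, not Shlink's real classes."""

    def __init__(self, env: Mapping[str, str], download: Callable[[], Lookup]):
        skip = env.get("SKIP_INITIAL_GEOLITE_DOWNLOAD", "false").lower() == "true"
        # Without the flag, start-up blocks until the db is available.
        # With the flag, start-up returns immediately and the db arrives later.
        self.lookup: Optional[Lookup] = None if skip else download()
        self.visits: List[Visit] = []

    def track(self, ip: str) -> Visit:
        visit = Visit(ip)
        if self.lookup is not None:
            visit.location = self.lookup(ip)
        self.visits.append(visit)
        return visit

    def on_database_ready(self, lookup: Lookup) -> int:
        """Called once the background download ends; locates pending visits."""
        self.lookup = lookup
        located = 0
        for visit in self.visits:
            if visit.location is None:
                visit.location = lookup(visit.ip)
                located += 1 if visit.location is not None else 0
        return located
```

In this sketch, a visit tracked before the db is ready keeps `location` set to `None` until `on_database_ready` runs, and any visit tracked afterwards is located right away.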

This has one side effect though. When locating visits right away, Shlink knows the full non-anonymized IP address, which makes the geolocation slightly more precise.

With this approach, all visits that happen before the initial GeoLite download will be located using their anonymized IP address.

It shouldn't have a big impact, as the download is usually reasonably fast, and the deviation between the regular IP and the anonymized one is not always that big.

It might have a relatively bigger impact if you have a huge amount of traffic, but this is the best approach I can come up with.
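
As an illustration of why the anonymized address is less precise, the sketch below assumes anonymization zeroes the last octet of an IPv4 address. That scheme is an assumption made for the example (this issue does not describe how Shlink anonymizes addresses), and `anonymize_ipv4` is a hypothetical helper.

```python
import ipaddress


def anonymize_ipv4(ip: str) -> str:
    """Zero the last octet of an IPv4 address (assumed anonymization scheme)."""
    # strict=False lets us pass a host address and get its enclosing /24 network.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


# Two different hosts in the same /24 collapse to the same anonymized address,
# so a lookup done later can only be as precise as the /24 block.
first = anonymize_ipv4("203.0.113.77")
second = anonymize_ipv4("203.0.113.201")
```

Under that assumption, a lookup on the anonymized value resolves to wherever the db places the whole /24 block, which often matches the original address's location but not always.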

@acelaya acelaya moved this from In Progress 📝 to In review 👀 in Shlink Sep 18, 2022
Repository owner moved this from In review 👀 to Done ✅ in Shlink Sep 18, 2022