Skip to content
This repository has been archived by the owner on May 11, 2021. It is now read-only.

Local cache is never reloaded if remote cache is cleared, expires, or crashes #176

Open
nickwilliams-eventbrite opened this issue Mar 26, 2019 · 2 comments

Comments

@nickwilliams-eventbrite

If the local ModelDict cache is populated (either from the database [Django] or from the remote cache; the source does not affect this bug) and then, after this, the remote cache is cleared, or expires, or somehow crashes, such that the values at self.remote_cache_last_updated_key and self.remote_cache_key do not exist, the local cache will never get reloaded unless somebody changes some value that causes the remote cache to get repopulated.

Compare this original code from right after forking from Disqus:

elif self.local_cache_has_expired():
local_cache_is_invalid = self.local_cache_is_invalid()
# If local_cache_is_invalid is None, that means that there was no
# data present, so we assume we need to add the key to cache.
if local_cache_is_invalid is None:
self.remote_cache.add(self.remote_cache_last_updated_key, now)
# Now, if the remote has changed OR it was None in the first place,
# pull in the values from the remote cache and set it to the
# local_cache
if local_cache_is_invalid or local_cache_is_invalid is None:
self._local_cache = self.remote_cache.get(self.remote_cache_key)
# No matter what, we've updated from remote, so mark ourselves as
# such so that we won't expire until the next timeout
self._local_last_updated = now
# Update from cache if local_cache is still empty
if self._local_cache is None:
self._update_cache_data()

To the current code from this project:

elif self.local_cache_has_expired():
local_cache_is_invalid = self.local_cache_is_invalid()
# If local_cache_is_invalid is None, that means that there was no
# data present, so we assume we need to add the key to cache.
if local_cache_is_invalid is None:
self.remote_cache.add(self.remote_cache_last_updated_key, now)
# Now, if the remote has changed OR it was None in the first place,
# pull in the values from the remote cache and set it to the
# local_cache
if local_cache_is_invalid or local_cache_is_invalid is None:
remote_value = self.remote_cache.get(self.remote_cache_key)
if remote_value is not None:
self._local_cache = remote_value
# We've updated from remote, so mark ourselves as
# such so that we won't expire until the next timeout
self._local_last_updated = now
# We last checked for remote changes just now
self._last_checked_for_remote_changes = now
# Update from cache if local_cache is still empty
if self._local_last_updated is None:
self._update_cache_data()

Now follow along with what happens when the remote cache has been cleared after the local cache has been populated. First, local_cache_is_invalid() returns None. This causes self.remote_cache_last_updated_key to be set in the cache. Then we obtain remote_value from the remote cache using self.remote_cache_key. But this will still be None because the remote cache has been cleared. As such, the if remote_value is not None block is not executed. self._local_cache and self._local_last_updated remain unchanged, and since the local cache had already been populated (self._local_last_updated is no longer None), self._update_cache_data() also does not get called. The local cache is never updated.

In the original code, self._local_cache would have been set to None, which would have caused self._update_cache_data() to be called.

At first glance, it seems like the easiest fix is simply to revert the changes to this method, but I'm sure there were good reasons for this change. So, instead, I propose an alternate fix:

                if remote_value is not None:
                    self._local_cache = remote_value
                    # We've updated from remote, so mark ourselves as
                    # such so that we won't expire until the next timeout
                    self._local_last_updated = now
                else:
                    self._local_cache = None
                    self._local_last_updated = None

This just adds three new lines to clear self._local_cache and self._local_last_updated, which will cause self._update_cache_data() to be called.

To understand the source/impact of this bug, consider our specific environment. We have hundreds of severs spread across multiple AWS availability zones. Each server hosts dozens of Django workers, all using Gargoyle feature flags. We don't use a single, central Memcached server to back Gargoyle feature flags, because that would be a SPOF. Instead, each server has its own, local Memcached that serves as the Gargoyle cache for all of the Django workers on that server. As such, when we update a feature flag, it does not update the Memcached that serves all workers with the updated Gargoyle ModelDict, just the one that servers the workers on the same machine that accepted the request to update the feature flag. To get around this, we have the Gargoyle cache expire every five minutes. This worked well when we were using Disqus, but when we switched to YPlan in order to upgrade to Python 3, it broke due to this bug. The workers only get an updated Gargoyle cache if we restart them.

I'll post a pull request after testing it thoroughly.

nickwilliams-eventbrite added a commit to nickwilliams-eventbrite/django-modeldict that referenced this issue Mar 26, 2019
…e cache is cleared

Per adamchainz#176, we need to make sure that the local cache gets reset if the remote cache is cleared. This commit accomplishes that.
@nickwilliams-eventbrite
Copy link
Author

Pull request #177 submitted.

@nickwilliams-eventbrite
Copy link
Author

[bump] Could I get some 👀 on the above pull request, please? Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant