fix: add datasource.changed_on to cache_key #8901
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8901 +/- ##
=======================================
Coverage 58.97% 58.97%
=======================================
Files 359 359
Lines 11333 11333
Branches 2787 2787
=======================================
Hits 6684 6684
Misses 4471 4471
Partials 178 178
=======================================

Continue to review full report at Codecov.
Thanks for this @villebro! My recommendation would be to go with the changed_on timestamp.
Thanks for chiming in @willbarrett; personally I also started leaning towards using changed_on.
As there don't seem to be any objections to adding changed_on to the cache key, I will proceed with this approach.
FYI, working on this fix + unit tests surfaced some other issues I will be addressing with this PR, i.e. it will take slightly longer to complete than anticipated.
def cache_key(self, query_obj: QueryObject, **kwargs) -> Optional[str]:
    extra_cache_keys = self.datasource.get_extra_cache_keys(query_obj.to_dict())
    cache_key = (
        query_obj.cache_key(
            datasource=self.datasource.uid,
            extra_cache_keys=extra_cache_keys,
            changed_on=self.datasource.changed_on,
            **kwargs
        )
        if query_obj
        else None
    )
    return cache_key
The cache key calculation logic was broken out into a method of its own to enable easier unit testing.
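To illustrate the kind of test this refactor enables, here is a minimal sketch. The import paths, the test base class, the reuse of the _get_query_context_dict helper from the test diff below, and the way QueryContext is constructed are all assumptions, not the PR's actual test code.

# A minimal sketch, not the PR's actual test: imports, base class, and the
# QueryContext construction from the fixture dict are assumed.
from superset import db
from superset.common.query_context import QueryContext
from tests.base_tests import SupersetTestCase


class CacheKeySketchTests(SupersetTestCase):
    def test_cache_key_changes_when_datasource_changes(self):
        query_context = QueryContext(**self._get_query_context_dict())
        query_object = query_context.queries[0]
        original_key = query_context.cache_key(query_object)

        # Any edit bumps datasource.changed_on; a description change suffices.
        query_context.datasource.description = "something else"
        db.session.commit()

        new_key = query_context.cache_key(query_object)
        self.assertNotEqual(original_key, new_key)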
metric if "expressionType" in metric else metric["label"] # type: ignore | ||
for metric in metrics | ||
] | ||
self.metrics = [utils.get_metric_name(metric) for metric in metrics] |
During testing I noticed that the existing logic was incomplete; utils.get_metric_name, on the other hand, is used elsewhere and handles all metric types correctly (legacy and ad-hoc).
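For context, a simplified stand-in showing the two metric shapes the helper has to handle; this is an illustration, not the actual implementation of utils.get_metric_name.

# Simplified stand-in for utils.get_metric_name, for illustration only.
def get_metric_name(metric):
    # Ad-hoc metrics are dicts built in the explore view; their label is the
    # canonical name. Legacy metrics are referenced by a plain string.
    if isinstance(metric, dict):
        return metric["label"]
    return metric


legacy_metric = "count"
adhoc_metric = {
    "expressionType": "SIMPLE",
    "column": {"column_name": "num"},
    "aggregate": "SUM",
    "label": "SUM(num)",
}

assert get_metric_name(legacy_metric) == "count"
assert get_metric_name(adhoc_metric) == "SUM(num)"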
# TODO: update once get_data is implemented for QueryObject
with self.assertRaises(Exception):
    self.get_resp("/api/v1/query/", {"query_context": data})
qc_dict = self._get_query_context_dict()
data = json.dumps(qc_dict)
resp = json.loads(self.get_resp("/api/v1/query/", {"query_context": data}))
self.assertEqual(resp[0]["rowcount"], 100)
The old unit test seemed to be incomplete, so I fixed a few bugs in the body (limit -> row_limit, and removed time_range) to make it work properly.
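For reference, a sketch of roughly what the corrected query body looks like after those fixes; the field names and values other than row_limit are placeholders, not the actual test fixture.

# Illustrative payload only; values are placeholders, not the test fixture.
query_context_dict = {
    "datasource": {"id": 1, "type": "table"},
    "queries": [
        {
            "granularity": "ds",
            "groupby": ["name"],
            "metrics": ["sum__num"],
            "filters": [],
            # "limit" was renamed to "row_limit", and "time_range" was
            # dropped from the query body.
            "row_limit": 100,
        }
    ],
}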
This is ready for review.
@villebro do you think there is merit in adding a line item to UPDATING.md to mention that this PR will invalidate the entire cache, given that the keys used for hashing have changed?
@john-bodley oh absolutely, good idea. Will add a note to UPDATING.md.
Lookin' good!
(cherry picked from commit dc60db2)
extra_cache_keys = self.datasource.get_extra_cache_keys(query_obj.to_dict())
cache_key = (
    query_obj.cache_key(
        datasource=self.datasource.uid,
        extra_cache_keys=extra_cache_keys,
        changed_on=self.datasource.changed_on,
@villebro apologies for the late comment. In your PR description you mention that the cache key should be a function of whether the column or metric definitions associated with a datasource have changed. On line #166 you merely use the datasource changed_on, and thus I was wondering whether changes to the columns and/or metrics cascade, i.e., trigger an update to the datasource changed_on?
It is my understanding that changed_on is updated every time any change is applied, i.e. it does not check whether only relevant metrics/expressions have changed. While this can cause unnecessary cache misses, I felt the added complexity of checking only for relevant changes was not warranted unless the simpler solution proposed here turned out to be too coarse (I tried to convey this in the unit test, which only changes the description). If this does cause unacceptable amounts of cache misses I think we need to revisit this logic; until then I personally think this is a good compromise. However, I'm happy to open up the discussion again if there are opinions to the contrary.
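For comparison, a hedged sketch of what the finer-grained alternative could look like, fingerprinting only the attributes that affect query results. The attribute names (sql, metrics, columns, expression) are modeled on Superset's datasource models, but this code is purely illustrative and not part of the PR.

import hashlib
import json


def result_relevant_fingerprint(datasource) -> str:
    """Hypothetical alternative to relying on datasource.changed_on: hash only
    the fields that can alter query results, so that e.g. a description edit
    would not invalidate the cache."""
    relevant = {
        "sql": datasource.sql,
        "metrics": sorted(m.expression or m.metric_name for m in datasource.metrics),
        "columns": sorted(c.expression or c.column_name for c in datasource.columns),
    }
    return hashlib.md5(json.dumps(relevant).encode("utf-8")).hexdigest()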
CATEGORY
SUMMARY
When changing the SQL query in a datasource, the cache key doesn't reflect this change, as only the datasource id is added to the cache key. The same problem also applies to metrics and expressions. This PR adds the changed_on property of the datasource to the dict on which cache_key is based, ensuring that any change to the datasource will invalidate cached results. While this might cause unnecessary cache misses if datasources are refreshed automatically on a periodic basis, it at least ensures that no updates to datasources (SQL query, metrics, expressions) go unnoticed.
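To make the mechanism concrete, here is a simplified model of how the extra key ends up in the hashed payload; it mirrors the idea, not the exact code of QueryObject.cache_key.

import hashlib
import json


def compute_cache_key(query_dict: dict, **extra) -> str:
    """Simplified model: any extra keyword (datasource uid, changed_on, ...)
    is folded into the dict that gets hashed into the cache key."""
    cache_dict = dict(query_dict)
    cache_dict.update(extra)
    json_data = json.dumps(cache_dict, default=str, sort_keys=True)
    return hashlib.md5(json_data.encode("utf-8")).hexdigest()


# Because changed_on is part of the hashed payload, any datasource edit
# (which bumps the timestamp) yields a new key and thus a cache miss.
key_before = compute_cache_key(
    {"metrics": ["count"]}, datasource="1__table", changed_on="2020-01-01T00:00:00"
)
key_after = compute_cache_key(
    {"metrics": ["count"]}, datasource="1__table", changed_on="2020-01-02T12:34:56"
)
assert key_before != key_after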
TEST PLAN
Tested locally by replicating the problem in the issue, and ensuring that the bug goes away with this diff. New unit tests added to ensure that the new QueryContext/QueryObject classes work as intended. A small bug was also fixed in QueryContext.

ADDITIONAL INFORMATION
REVIEWERS
@durchgedreht