This repository has been archived by the owner on May 11, 2021. It is now read-only.

Race condition in __getitem__ #40

Closed
wants to merge 2 commits

Conversation

@Jaidan commented Oct 13, 2016

This should fix #18. Unfortunately I wasn't able to find a reasonable test to reproduce the behavior, but this change causes __getitem__ to recurse if a cross-thread reset is detected.

@adamchainz self-assigned this Oct 24, 2016
@adamchainz (Owner)

I'm not entirely sure about this change; it moves a lot of the internals around. Also, a recursive call to __getitem__ naturally scares me: what if it infinitely recurses? 😫

Can you explain the problem in a bit more depth? It should go in the HISTORY file at least 😸

Whilst looking at this I accidentally found your gargoyle fork at https://github.com/roverdotcom/gargoyle . If you're using that with django-modeldict-yplan I can't make any guarantees about compatibility.

@Jaidan (Author) commented Oct 26, 2016

Happy to give you a rundown. We're talking about a threading issue in Python, so sorry if this is a bit complex and long-winded =)

We ran into this issue because our production environment is a multithreaded, multiprocess uWSGI configuration: multiprocess to do most of the heavy concurrency lifting, but also multithreaded to limit worker exhaustion when blocking on IO (primarily database calls). When modeldict is used in a multithreaded environment, retrieval is subject to a race condition: the _local_cache can be cleared by one thread while another thread is checking a value.

Previously, setting _local_cache to None was used to "clear" the cache. This was a problem because __getitem__ assumed it could never be None after _populate was called, yet it was possible for the cache to be cleared by another thread during a GIL context switch that occurred after _populate returned. For example:

In our environment we make heavy use of gargoyle, which means the majority of requests check many flags and as a result frequently access the modeldict _local_cache. Every time we changed a flag's setting, the _post_save handler would be called by one thread (which calls _populate(reset=True)), and frequently this occurred while another thread was blocked after its own _populate call. The result was that changing a flag would frequently cause an AttributeError due to an attempt to look up a key on a NoneType object.
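
Schematically, the old pattern looked something like this. This is a simplified sketch of the behavior described above, not the library's actual code; the class name, the inline dict, and my_flag are all stand-ins:

```python
class OldCachedDict:
    """Simplified sketch of the old pattern; not django-modeldict's real code."""

    def __init__(self):
        self._local_cache = None  # None doubles as "cleared"

    def _populate(self, reset=False):
        if reset:
            # _post_save path: drop the cache so the next read refetches.
            self._local_cache = None
        if self._local_cache is None:
            self._local_cache = {'my_flag': True}  # stand-in for the DB fetch
        return self._local_cache

    def __getitem__(self, key):
        self._populate()
        # A GIL context switch here lets another thread run
        # _populate(reset=True); if we resume while _local_cache is still
        # None, the lookup below fails on a NoneType object.
        return self._local_cache[key]
```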

This PR addresses the issue by ensuring that _local_cache is always a dictionary object, with a separate flag (_local_last_updated) tracking the initialization state of _local_cache. Threading issues are avoided by always changing the flag and _local_cache in a safe order, and by always clearing via the clear_cache method.

Infinite recursion is theoretically possible with this implementation, though the conditions required to cause it would be pretty far-fetched for this use case. We only recurse if we get a KeyError while accessing _local_cache and _local_last_updated is None, a state that only occurs if one thread clears the cache while another thread is checking a value. If one were programmatically triggering updates over and over, one could cause another thread to recurse repeatedly. In reality, this will never recurse more than once (at least in the Gargoyle use case).
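
Putting the pieces together, the fixed pattern looks roughly like this (again a sketch under the same stand-in assumptions, not the PR's exact code):

```python
import time

class FixedCachedDict:
    """Sketch of the fixed pattern described above; illustrative only."""

    def __init__(self):
        self._local_cache = {}           # always a dict, never None
        self._local_last_updated = None  # None means "not populated yet"

    def clear_cache(self):
        # Safe ordering: mark the cache stale first, then empty it, so a
        # reader that hits a missing key also observes the stale flag.
        self._local_last_updated = None
        self._local_cache = {}

    def _populate(self, reset=False):
        if reset:
            self.clear_cache()
        if self._local_last_updated is None:
            self._local_cache = {'my_flag': True}  # stand-in for the real fetch
            self._local_last_updated = time.time()
        return self._local_cache

    def __getitem__(self, key):
        self._populate()
        try:
            return self._local_cache[key]
        except KeyError:
            if self._local_last_updated is None:
                # Another thread cleared the cache mid-read: retry once the
                # recursive call has repopulated it.
                return self[key]
            raise  # genuinely missing key
```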

We could alternatively have fixed this by doing a deepcopy on _local_cache and returning that from _populate. That would be a lot simpler and require far fewer internals to be adjusted, but it has the downside of requiring a deepcopy on every _populate call, which could be expensive in a project designed entirely to improve performance, so we elected to go with the more complicated solution.
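
For comparison, that rejected alternative would have looked roughly like this, replacing _populate in the first sketch above (hypothetical, since we didn't take this route):

```python
import copy

def _populate(self, reset=False):
    if reset or self._local_cache is None:
        self._local_cache = {'my_flag': True}  # stand-in fetch
    # Hand each caller its own snapshot so no reader can observe another
    # thread clearing the shared dict; __getitem__ would index this return
    # value instead of self._local_cache. Costs a deepcopy per call.
    return copy.deepcopy(self._local_cache)
```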

That fork was previously used as a quick-fix workaround for an earlier (much larger and worse) threading issue where the cache was not being properly invalidated before updates. We are no longer using it, since that issue was already fixed in #25 (I think that was the one).

@adamchainz (Owner)

Thanks for the long explanation! Since the test changes are pretty minor, I'm also confident this should work.

Merged in 7491f4a.

A note: at YPlan we had a similar setup to what you describe, but when we dropped the multiple threads and moved to 2 processes per core, it actually improved our performance. Worth a try! Threads are complicated.

@adamchainz closed this Oct 28, 2016
@adamchainz (Owner)

Deployed on PyPI at 1.5.4: https://pypi.python.org/pypi/django-modeldict-yplan
