Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Table.get by doc_id to have O(1) behavior #460

Merged
merged 1 commit into from
Feb 15, 2022

Conversation

richardsheridan
Copy link
Contributor

@richardsheridan richardsheridan commented Jan 30, 2022

Table.get is accidentally O(len(Table)) because of the conversion of doc_id in Table._read_table. However, in the few places _read_table is used, there are opportunities to convert locally. Indeed, I discovered that Table._get_next_id was redundantly converting them already. Converting the user-input doc_id to string just before doing the dict.get call makes the overall lookup O(1). Also this change permits a bit of code sharing with Table._update_table. (nope)

Of course, reading from the underlying storage is more like O(nTables*avgLenTables) in the standard case, so any speedup would only be realized when using CachingMiddleware or the like.

Looking forward to seeing if this passes the test suite! 😅

@richardsheridan
Copy link
Contributor Author

Important note: I finally had a chance to test this locally in application and _update_table doesn't work... in fact it can't work because I never assign the name tables, oops. It's trivial to revert the changes to that method, but if it is the case I broke the method, it's somewhat disturbing that the test suite passed!

@richardsheridan
Copy link
Contributor Author

OK the problem seems to be inside actions/checkout. It is putting the head of master into the runner for testing instead of this PR's code, see here. And the resulting git log checkout is 967dbf9 which is an existing commit (currently master:HEAD).

This is super weird. The checkout command should be

/usr/bin/git checkout --progress --force refs/remotes/pull/{PRnum}/merge

A little more research shows that this is an effect of triggering the CI runs from pull_request_target, which always runs against the PR merge base since it has elevated permissions to change the repo. Unfortunately I think it cannot be fixed until master reflects the change in trigger to pull_request.

I've put this mess into another PR #462 and will reset this one to a simpler commit reflecting the original purpose, but hopefully not broken.

@msiemens
Copy link
Owner

Thanks @richardsheridan, I expect to have a look at this soon 🙂

@msiemens
Copy link
Owner

Great work @richardsheridan, I really appreciate it! Reading through your PR and then through TinyDB's code I even found that we can skip converting most document IDs for search operations and also for get operations that use a query by only applying the document ID conversion to the filter/search result. But I can add that in a further commit 🙂

A futher note: There was a question in #459 (since moved to #466) that notes that one could skip the opposite document ID conversion, namely converting document IDs to string during write operations. However, as there are some storages that silently convert dict keys to strings (most prominently JSON), TinyDB needs some place to perform some sanitization of document IDs to deal with this issue. And I think, doing this at during the write operation in order to be able to skip the conversion for most documents during the read operation is the bigger performance benefit that the opposite. All this is to say that implementing this change will make it impossible to skip the document ID to string conversion at write-time but I think this is the right call 🙂

@msiemens msiemens merged commit 7dcb755 into msiemens:master Feb 15, 2022
@msiemens
Copy link
Owner

By the way, I'll try to get a release of TinyDB that contains this change published the next weekend or next week.

@richardsheridan
Copy link
Contributor Author

Awesome! I'm so happy to have contributed to the only database I may ever use :)

@richardsheridan richardsheridan deleted the patch-1 branch February 15, 2022 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants