Avoid unnecessary information in query hints to improve query cache hit ratio #8797

Merged
merged 2 commits into from
Feb 15, 2023

mpdude
Contributor

@mpdude mpdude commented Jun 28, 2021

I've noticed that over time my query caches fill up with redundant queries, i.e. different cache entries for the DQL -> SQL translation that are exactly the same. For me, it's an issue because the cache entries fill up precious OPcache memory.

Further investigation revealed that the queries themselves do not differ, but only the query hints – that are part of the computed cache key – do.

In particular, only the value of the WhereInWalker::HINT_PAGINATOR_ID_COUNT query hint differs. Since WhereInWalker only needs to know if there are matching IDs but not how many, we could avoid such cache misses by using just a boolean value as the hint.
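
To illustrate the effect, here is a minimal sketch. It assumes the cache key is simply a hash over the DQL string and the hints; `cacheKey()` is a hypothetical stand-in, and Doctrine's real key computation takes more inputs into account.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a query cache key: a hash over the DQL and all hints.
 * This is an assumption for illustration, not Doctrine's actual implementation.
 *
 * @param array<string, mixed> $hints
 */
function cacheKey(string $dql, array $hints): string
{
    ksort($hints);

    return md5($dql . '|' . var_export($hints, true));
}

$dql = 'SELECT u FROM User u';

// Pages of a paginated result may contain different numbers of IDs.
$idCountsPerPage = [10, 10, 7, 0];

$keysWithCount = [];
$keysWithBool  = [];

foreach ($idCountsPerPage as $count) {
    // Current behaviour: the hint carries the exact number of IDs.
    $keysWithCount[] = cacheKey($dql, ['doctrine.id.count' => $count]);

    // Proposed behaviour: the hint only says whether there are any IDs.
    $keysWithBool[] = cacheKey($dql, ['doctrine.id.count' => $count > 0]);
}

echo count(array_unique($keysWithCount)), " distinct keys with counts\n";   // 3
echo count(array_unique($keysWithBool)), " distinct keys with booleans\n";  // 2
```

With counts, every distinct page size adds another cache entry for the same SQL; with a boolean, there can be at most two.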

@mpdude
Contributor Author

mpdude commented Jun 28, 2021

Technically yes, a possible BC break. I don't know if it is considered relevant, though. (Every change breaks someone's workflow, I know.)

Member

@greg0ire greg0ire left a comment

Please sum up or reuse your PR message body as a commit message body; it's very clear and would be a great addition to the git log.

lib/Doctrine/ORM/Tools/Pagination/Paginator.php (outdated, resolved)
@mpdude mpdude force-pushed the query_count_hint branch from a45fcdf to de4134d Compare June 30, 2021 13:01
@mpdude
Contributor Author

mpdude commented Jun 30, 2021

@greg0ire Requested changes made.

Should we rename the HINT_PAGINATOR_ID_COUNT constant since it is no longer a count?

If this constant is considered an internal implementation detail (and this fix here not a relevant BC break), then renaming it might make the change even more obvious for people relying on the details.

@greg0ire
Member

Renaming should be fine indeed: https://github.com/search?q=HINT_PAGINATOR_ID_COUNT&type=code

@mpdude
Contributor Author

mpdude commented Jul 8, 2021

@greg0ire You mean "fine indeed" like "There's about 21,927 code results, but I've checked them all and it's not an issue"?

@mpdude
Contributor Author

mpdude commented Jul 8, 2021

Strange search result when you exclude files named WhereInWalker[Test].php:

https://github.com/search?p=1&q=HINT_PAGINATOR_ID_COUNT+-filename%3AWhereInWalkerTest.php+-filename%3AWhereInWalker.php&type=Code

@greg0ire
Member

greg0ire commented Jul 8, 2021

> @greg0ire You mean "fine indeed" like "There's about 21,927 code results, but I've checked them all and it's not an issue"?

I mean: the first results that come up are repos that have doctrine/orm versioned in them, so poorly coded repos, unlikely to be starred or forked by anyone, and if there was any useful package with a mention of HINT_PAGINATOR_ID_COUNT, I think GitHub would return that first. Just a hunch though.

> Strange search result when you exclude files named WhereInWalker[Test].php:

Indeed, I think you found a bug in GitHub!

greg0ire
greg0ire previously approved these changes Jul 8, 2021
Member

@greg0ire greg0ire left a comment

Looks fine to me despite the fact that we technically have a BC-break here 👍

@greg0ire
Member

greg0ire commented Jul 8, 2021

My only doubt now is on the target branch. Not sure if this should be considered a bug.

@SenseException
Member

It doesn't look like a bugfix to me. It's more of a feature that replaces an older one for improvement. Maybe 3.0.x would be a more fitting target.

@greg0ire
Member

greg0ire commented Jul 9, 2021

Or 2.10.x? Since the BC-break is small…

@greg0ire
Member

greg0ire commented Jul 9, 2021

Or, let's just avoid the BC-break entirely:

    /**
     * ID Count hint name.
     *
     * @deprecated use self::HINT_PAGINATOR_HAS_IDS instead
     */
    public const HINT_PAGINATOR_ID_COUNT = 'doctrine.id.count';

    /**
     * ID existence hint name.
     */
    public const HINT_PAGINATOR_HAS_IDS = 'doctrine.paginator_has_ids';

(later, check both hints, but only set one ourselves)
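
The "check both hints" part could look roughly like the following sketch. `hasMatchingIds()` is a hypothetical helper, not part of `WhereInWalker`; it assumes the hints arrive as a plain array, with the deprecated hint still carrying a count and the new one a boolean.

```php
<?php

declare(strict_types=1);

const HINT_PAGINATOR_ID_COUNT = 'doctrine.id.count';          // deprecated, holds an int
const HINT_PAGINATOR_HAS_IDS  = 'doctrine.paginator_has_ids'; // new, holds a bool

/**
 * Hypothetical helper: returns true if either hint signals matching IDs.
 *
 * @param array<string, mixed> $hints
 */
function hasMatchingIds(array $hints): bool
{
    // Prefer the new boolean hint when it is present.
    if (array_key_exists(HINT_PAGINATOR_HAS_IDS, $hints)) {
        return (bool) $hints[HINT_PAGINATOR_HAS_IDS];
    }

    // Fall back to the deprecated count hint that older calling code may still set.
    return ((int) ($hints[HINT_PAGINATOR_ID_COUNT] ?? 0)) > 0;
}

var_dump(hasMatchingIds([HINT_PAGINATOR_HAS_IDS => true])); // bool(true)
var_dump(hasMatchingIds([HINT_PAGINATOR_ID_COUNT => 5]));   // bool(true)
var_dump(hasMatchingIds([HINT_PAGINATOR_ID_COUNT => 0]));   // bool(false)
var_dump(hasMatchingIds([]));                               // bool(false)
```

Since only the boolean hint would be set by the library itself, the cache key no longer varies with the number of IDs, while code that still sets the old hint keeps working.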

@mpdude
Contributor Author

mpdude commented Jul 9, 2021

I think the BC break is not so much the renaming, but the fact that the query hint no longer contains a count but a boolean instead.

Renaming the constant only makes this more directly obvious to people relying on the value, since their code should (hopefully) fail.

So, under the assumption that nobody should be using/referring to this hint or constant in the first place, I'd rather suggest a "hard" cutover.

@mpdude
Contributor Author

mpdude commented Jul 9, 2021

If you agree that HINT_PAGINATOR_ID_COUNT was never documented as part of the API or propagated in some kind of documentation (my guess: it wasn't), then it's an internal optimization only and not a BC break of the "public contract" in terms of API changes.

So 2.10?

@greg0ire
Member

greg0ire commented Jul 9, 2021

I agree, let's target 2.10, and revert in 2.10.1 in the unlikely event people complain.

@SenseException
Member

Does it make sense to mark parts of this as @internal?

@mpdude
Contributor Author

mpdude commented Jul 9, 2021

@SenseException Does Doctrine have something like a policy document describing e.g. when to use @internal?

@SenseException
Member

None that I know of, but some BC questions can be handled when something is marked as internal.

@beberlei
Member

It's a bugfix to me, because it hurts system scalability and performance and uses caches incorrectly.

@mpdude mpdude changed the base branch from 2.9.x to 2.11.x February 17, 2022 09:58
@mpdude
Contributor Author

mpdude commented Feb 17, 2022

Rebased onto 2.11.x and kindly asking for a merge ;-)

@derrabus
Member

Please take a look at the PHPCS/Psalm failures. The build has to be green for a merge.

Also, would it be possible to write a test for this issue? I'd like to avoid a regression here.

@derrabus derrabus added Bug and removed Improvement labels Feb 17, 2022
@mpdude
Contributor Author

mpdude commented Feb 17, 2022

@derrabus I am not really sure what a reasonable test might look like...?

@derrabus
Member

Probably some kind of functional test with two queries having caching enabled, where the second query should hit the cache but doesn't without your changes. I mean, you somehow must've tested your change to see if it solves your problem. If you can create an automated test from that, you would help us very much.
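
A much-reduced sketch of the idea, without the ORM: `CountingCache` and `translate()` are hypothetical stand-ins for the query cache and the DQL -> SQL translation, and the cache key is assumed to be a hash of the DQL plus the hints. A real functional test would run the Paginator against a database instead.

```php
<?php

declare(strict_types=1);

/** Hypothetical in-memory cache that counts how often a value had to be computed. */
final class CountingCache
{
    /** @var array<string, string> */
    private array $items = [];

    public int $misses = 0;

    public function get(string $key, callable $compute): string
    {
        if (! array_key_exists($key, $this->items)) {
            $this->misses++;
            $this->items[$key] = $compute();
        }

        return $this->items[$key];
    }
}

/**
 * Stand-in for the DQL -> SQL translation, cached under a key built from DQL and hints.
 *
 * @param array<string, mixed> $hints
 */
function translate(CountingCache $cache, string $dql, array $hints): string
{
    $key = md5($dql . '|' . var_export($hints, true));

    return $cache->get($key, static fn (): string => 'SQL for: ' . $dql);
}

$dql      = 'SELECT u FROM User u';
$hintName = 'doctrine.id.count';

// Two pages with different numbers of matching IDs, passed as an exact count.
$withCount = new CountingCache();
translate($withCount, $dql, [$hintName => 10]);
translate($withCount, $dql, [$hintName => 3]);

// The same two pages, reduced to "are there any IDs at all?".
$withBool = new CountingCache();
translate($withBool, $dql, [$hintName => 10 > 0]);
translate($withBool, $dql, [$hintName => 3 > 0]);

echo "misses with count hint: {$withCount->misses}\n";  // 2
echo "misses with boolean hint: {$withBool->misses}\n"; // 1
```

The assertion a test would make is the last one: the second query is served from the cache only when the hint value does not depend on the number of IDs.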

@derrabus derrabus changed the base branch from 2.11.x to 2.12.x April 24, 2022 19:56
@mpdude mpdude changed the base branch from 2.12.x to 2.14.x January 13, 2023 08:44
@mpdude
Contributor Author

mpdude commented Jan 13, 2023

Rebased onto 2.14.x

@mpdude
Contributor Author

mpdude commented Jan 13, 2023

@derrabus I added the requested functional test, and also verified that the test fails without the changes being suggested here.

@greg0ire greg0ire requested a review from derrabus January 16, 2023 20:39
@mpdude
Contributor Author

mpdude commented Jan 21, 2023

@lcobucci I saw you worked a lot on the Paginator code a while back – maybe you could review this?

@lcobucci
Member

I'm on holidays in Brazil and won't be able to look at it. I'm back in the second week of February and can review.

lcobucci
lcobucci previously approved these changes Feb 13, 2023
Member

@lcobucci lcobucci left a comment

@greg0ire @beberlei @mpdude I believe that converting from int to bool won't have any impact on the lib 👍

We might be affecting users, though, as we don't know how query hints are used in the wild. AbstractQuery#getHint(), AbstractQuery#getHints(), and AbstractQuery#hasHint() are public methods that could be used for code instrumentation, for example.

It feels like we're abusing query hints a bit to communicate state between the Paginator and the WhereInWalker buuuut it's been a feature/bug/trait/peculiarity for a long time (also used in other places) 😄

tests/Doctrine/Tests/ORM/Functional/PaginationTest.php (outdated, resolved)
@lcobucci
Member

@mpdude I see you're improving the caching capabilities on the ORM... it might be good to test the combination of the cacheable queries (L2C) and the Paginator (whenever you have the time).

... so we can be sure that in fact the second result has a different size.

Co-authored-by: Luís Cobucci <[email protected]>
@mpdude
Contributor Author

mpdude commented Feb 14, 2023

@lcobucci Thank you for the suggestion to make the test more specific!

In fact, that change uncovered that both result sets still had the same size, since the IDs were autogenerated and did not start from 1 when multiple tests were executed together. So, I had to tweak it a bit to take the actual IDs into account, but now it works.

Please re-approve if you don't mind.

@greg0ire greg0ire added this to the 2.14.2 milestone Feb 15, 2023
@greg0ire greg0ire merged commit c26c559 into doctrine:2.14.x Feb 15, 2023
@greg0ire
Member

Thanks @mpdude

@mpdude mpdude deleted the query_count_hint branch February 15, 2023 16:18
6 participants