Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize schema managers' ::listTables() methods #5268

Merged
merged 1 commit into from
Feb 15, 2022

Conversation

morozov
Copy link
Member

@morozov morozov commented Feb 12, 2022

Q A
Type improvement

Fixes #2676
Closes #2766
Closes #4882

Change summary

If an entire schema needs to be introspected, instead of listing the tables and then introspecting each table individually, the new implementation fetches all schema objects of each type in one query and then groups them into tables. This reduces the number of queries used for introspection from O(N) (where N is the number of tables in the schema) to O(1).

Design considerations

From the design standpoint, schema introspection deserves a separate API. Similar to the wrapper connection and the platform classes, the schema manager is already a God object.

Introducing a schema introspection API is the next logical step in improving the quality of the codebase but I decided not to do it now:

  1. The performance aspect (which is the problem the API consumers are facing) could be addressed without changing the API.
  2. There is a lot of technical debt that makes the introduction of such an API non-trivial (although there's nothing impossible, here's a PoC).
  3. The time I can spend on this right now is quite limited.

Proposed design

All common logic for introspecting the entire schema in one shot is implemented in the abstract schema manager class. In order to be able to introduce new abstract methods, we introduce a temporary internal AbstractIntrospectingSchemaManager and extend all concrete schema managers from it. This temporary class will be merged into AbstractSchemaManager in the next major release. See #5268 (comment).

Instead of using the now deprecated AbstractPlatform::getList*SQL() methods, the schema managers build the queries themselves. This has the following advantages:

  1. The logic of building queries and fetching their results is now more coherent (located within the schema manager).
  2. This allows for using prepared statements in the schema introspection queries.
  3. In the future, it will allow removing the column aliases like SELECT COLUMN_NAME AS field from the schema introspection queries since each schema manager will process the result of the query that it builds itself for itself.

Performance considerations

From my previous experience with Oracle, the time a query against a schema introspection view takes doesn't depend that much on whether a single table or the entire schema is introspected. Introspecting N tables via an individual query each would take N times longer than introspecting them all at once.

In my development environment, the integration test suite takes ~40% less time to run with this change:

$ git checkout 3.4.x
$ phpunit -c oci8.phpunit.xml tests/Functional
PHPUnit 9.5.11 by Sebastian Bergmann and contributors.

...

Time: 01:50.276, Memory: 34.00 MB

$ git checkout optimize-list-tables
$ phpunit -c oci8.phpunit.xml tests/Functional

PHPUnit 9.5.11 by Sebastian Bergmann and contributors.

...

Time: 01:05.771, Memory: 34.00 MB

Compatibility considerations

Depending on whether the AbstractPlatform::getList*SQL() methods are considered part of the SPI, the proposed changes may imply a BC break. If an API consumer uses a bundled platform and a schema manager and overrides one of these methods, the overridden logic will be no longer taken into account since the schema managers now use their own implementation.

Until the next major release, the implementation of the AbstractPlatform::getList*SQL() methods remains intact, tested but no longer used.

Deprecations

The public Platform methods that generated SQL for introspecting a single table have been deprecated.

Implementation details

Changes in schema introspection logic:

  1. IBM DB2:
    1. Compare COLSEQ when joining referencing and referenced columns of a foreign key constraint. Otherwise, the query would return a product of the two sets and result in columns being returned out of order:
      1) Doctrine\DBAL\Tests\Functional\Schema\Db2SchemaManagerTest::testCreatedCompositeForeignKeyOrderIsCorrectAfterCreation
      Failed asserting that two arrays are identical.
      --- Expected
      +++ Actual
      @@ @@
       Array &0 (
      -    0 => 'col2'
      -    1 => 'col1'
      +    0 => 'col1'
      +    1 => 'col2'
       )
      
      Before:
      JOIN SYSCAT.KEYCOLUSE AS pkcol
      ON fk.REFKEYNAME = pkcol.CONSTNAME
      AND fk.REFTABSCHEMA = pkcol.TABSCHEMA
      AND fk.REFTABNAME = pkcol.TABNAME

      After:
      JOIN SYSCAT.KEYCOLUSE AS PKCOL
      ON PKCOL.CONSTNAME = R.REFKEYNAME
      AND PKCOL.TABSCHEMA = R.REFTABSCHEMA
      AND PKCOL.TABNAME = R.REFTABNAME
      AND PKCOL.COLSEQ = FKCOL.COLSEQ
    2. Added the WHERE TABSCHEMA = <database> clause to the column, index, etc. introspection queries. Otherwise, they might return objects from other schemas.
    3. Removed the "We do the funky subquery […]" logic. It existed to have a DISTINCT clause in the query (which is rarely a good idea) which in turn was used to select the columns that weren't used anywhere.

TODO:

  • Document semantics of AbstractIntrospectingSchemaManager::normalizeIdentifier().
  • Document the new API and the design decisions.

@morozov morozov force-pushed the optimize-list-tables branch 3 times, most recently from 0121628 to 274830a Compare February 12, 2022 18:29
@morozov morozov force-pushed the optimize-list-tables branch 4 times, most recently from 6fb2357 to 699aad8 Compare February 13, 2022 02:29
@morozov morozov marked this pull request as ready for review February 13, 2022 03:14
@morozov morozov requested review from greg0ire and derrabus February 13, 2022 17:52
Copy link
Member

@greg0ire greg0ire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great 👍

Please document deprecations in UPGRADE.md

use function array_shift;

/**
* Base class for schema managers that improves database schema introspection performance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of improvements makes sense in a commit message or PR, but in code, one might wonder what is the slow version of this.

Suggested change
* Base class for schema managers that improves database schema introspection performance.
* Base class for schema managers that provides good database schema introspection performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, just avoid qualitative statements at all

Base class for schema managers that provides database schema introspection.

Copy link
Member Author

@morozov morozov Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the original "improves" is more appropriate than "provides good". But I believe I want to get rid of this class entirely as follows:

  1. Declare the new abstract methods in the existing class as non-abstract with the throw new NotImplemented() body.
  2. Create temporary deprecated methods that leverage the new API (e.g. protected doListTableDetails()).
  3. In those classes that want to opt into the new API (all bundled schema managers), implement the abstract methods (already done) and instead of inheriting the default implementation of listTableDetails() from the base class, call doListTableDetails().

Otherwise, if we want to do more rework like this in the next minor release (e.g. listing sequences, users, etc.), we'll have to introduce another temporary abstract class turning it into a mess.

@morozov morozov force-pushed the optimize-list-tables branch from 699aad8 to 47132e6 Compare February 14, 2022 23:14
@morozov morozov force-pushed the optimize-list-tables branch from 47132e6 to 194c6eb Compare February 14, 2022 23:17
@morozov morozov added this to the 3.4.0 milestone Feb 14, 2022
@morozov morozov requested a review from greg0ire February 14, 2022 23:26
@morozov morozov merged commit 1135a79 into doctrine:3.4.x Feb 15, 2022
@morozov morozov deleted the optimize-list-tables branch February 15, 2022 15:17
@mondrake
Copy link
Contributor

Well, this is an advancement! 🚀🚀🚀

Thank you

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants