Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index should be fulfilled if its columns match the leftmost part of a longer compound index #3145

Closed
fabriciojs opened this issue May 13, 2018 · 3 comments

Comments

@fabriciojs
Copy link

fabriciojs commented May 13, 2018

DBAL does not consider KEY(A, B) to fulfill KEY(A), assuming that it should, since the leftmost part matches the whole later index.

Q A
BC Break no
Version 2.6.x

Context

I am working in a project where I stumbled upon a DBAL (2.6) behavior that I couldn't understand why it was built that way (the situation happens at least since 2.5 where duplicated indexes started being identified up to newer versions and master).

In this case we use Mysql and I was trying to remove redundant indexes (indexes with columns that match the leftmost part of other indexes).

I noticed that we had a lot of implicit indexes automatically added for foreign keys by Doctrine ORM that became redundant by custom indexes that started with the same column as the foreign key field. So I tried removing those indexes, which Mysql accepts (since we have another index that can be used for the foreign key lookups just the same).

After doing so I noticed that doctrine:schema:validate no longer validated my database as in sync with my mapping. Doctrine auto auto-schema mapping is trying to add again the indexes for the foreign keys, even though there were compound indexes that work just the same (all mapped as well).

Since we wanted to help RDMS performance by removing redundant indexes, now the auto schema tools (doctrine:migration:diff, doctrine:schema:update, doctrine:schema:validate) cannot be properly used since they try every time to add dozens of indexes we know we don't want or need.

How to Reproduce

I am gonna provide and link here a PR with a failing test.

PR: #3146

Investigation

So I narrowed the issue to this check (added back in 2010):

https://github.com/doctrine/dbal/blob/master/lib/Doctrine/DBAL/Schema/Index.php#L209-L213

Background

That appears to be there since the beginning of duplicate/overruling indexes: bc9c8c2

Work has been done since then on best identifying duplicated/redundant indexes (custom or implict), but never touched the compound indexes issue I noticed now.

Specifically for FK naming (not our case here, but touches the indexing as well): doctrine/orm#3753

The only report of this same concern I found was the one here: #1195 (comment)

Justification

Longer indexes (KEY(A, B)) make shorter (KEY(A)), leftmost equivalent indexes, redundant. Having many indexes can hurt performance in large tables with a high writing rate, and keeping redundant indexes certainly does not help on that.

The check found in the investigation prevents longer indexes to fulfill others, even though they should. Although the reason stated in the comment 8 years ago seemed like a special case treatment and not a performance-feature blocker.

So redundant indexes should be avoided and that should be true across platforms:

Mysql

https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html

MySQL requires indexes on foreign keys and referenced keys so that foreign key checks can be fast and not require a table scan. In the referencing table, there must be an index where the foreign key columns are listed as the first columns in the same order. Such an index is created on the referencing table automatically if it does not exist. This index might be silently dropped later, if you create another index that can be used to enforce the foreign key constraint. index_name, if given, is used as described previously.

Postgresql

https://www.postgresql.org/docs/current/static/ddl-constraints.html

A foreign key must reference columns that either are a primary key or form a unique constraint. This means that the referenced columns always have an index (the one underlying the primary key or unique constraint); so checks on whether a referencing row has a match will be efficient. Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns too. Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.

MariaDB

https://mariadb.com/kb/en/library/foreign-keys/

The columns in the child table must be an index, or the leftmost part of an index. [...]

Oracle

(sorry, I failed to find it clear in the documentation)
https://richardfoote.wordpress.com/2014/04/02/indexing-foreign-keys-helden/

The answer is basically the same as when using an index to police a Primary Key or Unique Key constraint. An index can be used providing the leading columns match those of the constraint (in any order). The index can indeed potentially have additional columns appended (or overloaded) to it.

Conclusion

So given the situation I have written a couple of tests and changed the Index::isFulfilleBy method to incorporate this behavior, while specially treating the original concern that made this feature not available originally (as per the comment made 8 years ago).

I am providing a PR to better illustrate the issue and a hopefully serve as a seed for discussing the approach towards it (if agreed DBAL should fix this behavior).

PR: #3147

That PR should be a work in progress (I assume more tests and ideas should be discussed first).
I was not able to test in all platforms (possibly will have some broken tests) and also I understand it is possible that different databases can have different requirements for indexes becoming redundant as well as foreign key special considerations.
If that turns out to be really important and we are unable to safely confirm general compatibility, we should then look at having some platform specific behavior, so where we know it is true we make it happen and ensure best performance practice all the way along regarding redundant indexes.

@digilist
Copy link
Contributor

Great work, thank you 👍 would love to see this fixed

@morozov
Copy link
Member

morozov commented Sep 7, 2022

Defining proper indexes looks like a job for the database designer, not for the DBAL. Implementing it properly in a portable way is a non-trivial work that requires a ton of time a dedication. The initial assumption that the longer index fulfills the one that it has as a prefix doesn't work for hash indexes while the DBAL is currently unaware of the underlying data structure of indexes.

@morozov morozov closed this as not planned Won't fix, can't repro, duplicate, stale Sep 7, 2022
@github-actions
Copy link

github-actions bot commented Oct 8, 2022

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants