Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a safeguard against multiple objects competing for the same identity map entry #10785

Merged

Conversation

mpdude
Copy link
Contributor

@mpdude mpdude commented Jun 22, 2023

While trying to understand #3037, I found that it may happen that we have more entries in \Doctrine\ORM\UnitOfWork::$entityIdentifiers than in \Doctrine\ORM\UnitOfWork::$identityMap.

The former is a mapping from spl_object_id values to ID hashes, the latter an array first of entity class names and then from ID hash to entity object instances.

(Basically, "ID hash" is a concatenation of all field values making up the @Id for a given entity.)

This means that at some point, we must have different objects representing the same entity, or at least over time different objects are used for the same entity without the UoW properly updating its ::$entityIdentifiers structure.

I don't think it makes sense to overwrite an entity in the identity map, since that means a currently MANAGED entity is replaced with something else.

If it makes sense at all to replace an entity, that should happen through dedicated management methods to first detach the old entity before persisting, merging or otherwise adding the new one. This way we could make sure the internal structures remain consistent.

@mpdude mpdude changed the base branch from 2.15.x to 2.16.x June 22, 2023 09:48
@mpdude mpdude changed the title guard duplicate identity map entries Add a safeguard against multiple objects competing for the same identity map entry Jun 22, 2023
@mpdude mpdude force-pushed the guard-duplicate-identity-map-entries branch from c4e2e84 to 1221add Compare June 22, 2023 10:00
@@ -56,7 +56,7 @@ public function testIssueProxyClear(): void

$user2 = $this->_em->getReference(DDC1238User::class, $userId);

$user->__load();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beberlei Do you still remember what you intended to do here? Why should it be possible or allowed to load an entity through a reference when the entity manager has been clear()ed in the meantime?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this tests that the proxy generation has a shortcut for getId methods that returns the value passed to getReference(.., $id);

I am not sure this needs the calls to __load anymore, but really don't know anymore.

Copy link
Contributor Author

@mpdude mpdude Jul 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it should be possible to initialize a proxy after the EM it was obtained from has been cleared?

I don’t think so, the EM has forgotten about that proxy instance.

@mpdude mpdude force-pushed the guard-duplicate-identity-map-entries branch 2 times, most recently from 72bba01 to 8cd6a40 Compare June 22, 2023 10:12
@mpdude mpdude marked this pull request as draft June 22, 2023 19:13
@flack
Copy link
Contributor

flack commented Jun 23, 2023

just a quick note: I tried running this PR against some of my code, and I can hit the exception by roughly doing the following:

  1. create/persist entity $a and detach it
  2. create/persist entity $b which has an association to entity $a and detach it
  3. call $em->find() with entity $a's ID
  4. this returns a Proxy, I call __load on the proxy (and then detach it)
  5. I call $em->merge with entity $b ==> Exception is thrown

If I skip step 4 (specifically, the __load call), then everything works.

I know this is not really a great analysis, but maybe it'll give someone an idea what goes wrong here. I find this proxy stuff really hard to debug, but it looks like it's causing this issue

@mpdude
Copy link
Contributor Author

mpdude commented Jun 23, 2023

@flack thank you for the observation!

Regarding 2) – what kind of association does B have to A, and does it cascade-persist A as well?

@flack
Copy link
Contributor

flack commented Jun 23, 2023

@mpdude it looks like this in ClassMetadata's associationMappings for B:

"linkToA" => array:19 [
      "fieldName" => "linkToA"
      "targetEntity" => "a_entity"
      "joinColumns" => array:1 [
        0 => array:2 [
          "name" => "linkToA"
          "referencedColumnName" => "id"
        ]
      ]
      "type" => 2
      "mappedBy" => null
      "inversedBy" => null
      "isOwningSide" => true
      "sourceEntity" => "b_entity"
      "fetch" => 2
      "cascade" => []
      "isCascadeRemove" => false
      "isCascadePersist" => false
      "isCascadeRefresh" => false
      "isCascadeMerge" => false
      "isCascadeDetach" => false
      "sourceToTargetKeyColumns" => array:1 [
        "linkToA" => "id"
      ]
      "joinColumnFieldNames" => array:1 [
        "linkToA" => "linkToA"
      ]
      "targetToSourceKeyColumns" => array:1 [
        "id" => "linkToA"
      ]
      "orphanRemoval" => false
    ]

mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jun 23, 2023
…ing merge()

This PR tries to improve the situation/problem explained in doctrine#3037:

Under certain conditions – there may be multiple and not all are known/well-understood – we may get inconsistencies between the `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` and `\Doctrine\ORM\UnitOfWork::$identityMap` arrays.

Since the `::$identityMap` is a plain array holding object references, objects contained in it cannot be garbage-collected.
`::$entityIdentifiers`, however, is indexed by `spl_object_id` values. When those objects are destructed and/or garbage-collected, the OID may be reused and reassigned to other objects later on.

When the OID re-assignment happens to be for another entity, the UoW may assume incorrect entity states and, for example, miss INSERT or UPDATE operations.

One cause for such inconsistencies is _replacing_ identity map entries with other object instances: This makes it possible that the old object becomes GC'd, while its OID is not cleaned up. Since that is not a use case we need to support (IMHO), doctrine#10785 is about adding a safeguard against it.

In this test shown here, the `merge()` operation is currently too eager in creating a proxy object for another referred-to entity. This proxy represents an entity already present in the identity map at that time, potentially leading to this problem later on.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jun 23, 2023
…ing merge()

This PR tries to improve the situation/problem explained in doctrine#3037:

Under certain conditions – there may be multiple and not all are known/well-understood – we may get inconsistencies between the `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` and `\Doctrine\ORM\UnitOfWork::$identityMap` arrays.

Since the `::$identityMap` is a plain array holding object references, objects contained in it cannot be garbage-collected.
`::$entityIdentifiers`, however, is indexed by `spl_object_id` values. When those objects are destructed and/or garbage-collected, the OID may be reused and reassigned to other objects later on.

When the OID re-assignment happens to be for another entity, the UoW may assume incorrect entity states and, for example, miss INSERT or UPDATE operations.

One cause for such inconsistencies is _replacing_ identity map entries with other object instances: This makes it possible that the old object becomes GC'd, while its OID is not cleaned up. Since that is not a use case we need to support (IMHO), doctrine#10785 is about adding a safeguard against it.

In this test shown here, the `merge()` operation is currently too eager in creating a proxy object for another referred-to entity. This proxy represents an entity already present in the identity map at that time, potentially leading to this problem later on.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jun 23, 2023
…ing merge()

This PR tries to improve the situation/problem explained in doctrine#3037:

Under certain conditions – there may be multiple and not all are known/well-understood – we may get inconsistencies between the `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` and `\Doctrine\ORM\UnitOfWork::$identityMap` arrays.

Since the `::$identityMap` is a plain array holding object references, objects contained in it cannot be garbage-collected.
`::$entityIdentifiers`, however, is indexed by `spl_object_id` values. When those objects are destructed and/or garbage-collected, the OID may be reused and reassigned to other objects later on.

When the OID re-assignment happens to be for another entity, the UoW may assume incorrect entity states and, for example, miss INSERT or UPDATE operations.

One cause for such inconsistencies is _replacing_ identity map entries with other object instances: This makes it possible that the old object becomes GC'd, while its OID is not cleaned up. Since that is not a use case we need to support (IMHO), doctrine#10785 is about adding a safeguard against it.

In this test shown here, the `merge()` operation is currently too eager in creating a proxy object for another referred-to entity. This proxy represents an entity already present in the identity map at that time, potentially leading to this problem later on.
@mpdude mpdude force-pushed the guard-duplicate-identity-map-entries branch 3 times, most recently from c453b74 to 9f40d74 Compare June 23, 2023 22:21
@mpdude mpdude marked this pull request as ready for review June 24, 2023 07:56
@mpdude mpdude marked this pull request as draft June 24, 2023 07:58
@mpdude mpdude force-pushed the guard-duplicate-identity-map-entries branch from 9f40d74 to 36240b6 Compare June 26, 2023 07:10
@mpdude mpdude marked this pull request as ready for review June 26, 2023 07:11
@mpdude mpdude force-pushed the guard-duplicate-identity-map-entries branch 4 times, most recently from 2b4d1a8 to 0d56f09 Compare June 27, 2023 21:11
@mpdude mpdude force-pushed the guard-duplicate-identity-map-entries branch from 0d56f09 to 0c8cb84 Compare June 28, 2023 07:53
@mpdude
Copy link
Contributor Author

mpdude commented Aug 2, 2023

Haven’t run the code, but my guess is:

The EM now notices the ID collision already when you pass the second entity to persist(), since the ID is already in the identity map. Previously, it would have replaced the identity map entry (thereby corrupting other internal data structures) and only figure out during flush() that the ID has already been taken.

@derrabus Maybe it would make sense to use something more specific than RuntimeException for this safeguard. Do you have any thoughts of what the exception hierarchy should look like? I don’t think it makes sense to have a common base class with UniqueConstraintViolationException though..

@derrabus
Copy link
Member

derrabus commented Aug 2, 2023

This is indeed a behavior change that might be unexpected for existing apps that trusted that the database would reject a colliding ID. We might need to add a feature flag for that. 😕

@mpdude
Copy link
Contributor Author

mpdude commented Aug 2, 2023

But that means with the flag disabled, we risk the corruption again, right? Because we accept to have two different object instances representing the same thing.

@flack
Copy link
Contributor

flack commented Aug 2, 2023

The database still will reject a colliding ID, it's just that now in most cases, you don't even get that far, because the ORM layer rejects it first. So now your code would need to catch both exceptions.

Ideally, this duplication of functionality should be avoided, and as the DB backend is kind of the final source of truth the check should ideally be there, but then, the identyMap logic would need to be different to still be able to avoid corruption

gbirke added a commit to wmde/fundraising-payments that referenced this pull request Aug 3, 2023
This is a temporary workaround for the changes made in
Doctrine ORM 2.16, discussed in
doctrine/orm#10785

When that discussion is resolved, we should remove the check for
`RuntimeException`, because it might be thrown by other errors that have
nothing to do with duplicate IDs, which would lead to our
`PaymentOverrideException` hinting at the wrong thing.

Also remove the PHPStan rule that had an exception for this check.
When the Exception comes back, the method should have an annotation that
it's thrown. Otherwise we'll have to put the PHPStan rule back or try
out phpstan-doctrine where this has been solved:
phpstan/phpstan-doctrine#315
@akadlec
Copy link

akadlec commented Aug 3, 2023

That was my point, to change RuntimeException to something more specific to be able to catch it and handle it.

@derrabus
Copy link
Member

derrabus commented Aug 3, 2023

But that means with the flag disabled, we risk the corruption again, right? Because we accept to have two different object instances representing the same thing.

Yes, but apps could live with this kind of corruption previously. If the flush resulted in a UniqueConstraintViolationException, the developer knows that the entity graph needs to be trashed, which is acceptable in a short-lived web process.

I believe, we should do both:

  • feature-flag this detection and
  • introduce a more specific exception.

@mpdude
Copy link
Contributor Author

mpdude commented Aug 3, 2023

Feature flag – that needs to go into a 2.17 release?

@derrabus
Copy link
Member

derrabus commented Aug 3, 2023

No, let's treat this as a bugfix.

@flack
Copy link
Contributor

flack commented Aug 3, 2023

Yes, but apps could live with this kind of corruption previously. If the flush resulted in a UniqueConstraintViolationException, the developer knows that the entity graph needs to be trashed, which is acceptable in a short-lived web process.

Out of curiosity: Is there a design document somewhere that outlines the rationale for the entity graph/identity map? Because if the face of concurrent requests it seems to me that the graph should always be treated as a slightly out-of-date snapshot of the actual DB.

mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 3, 2023
In doctrine#10785, a check was added that prevents entity instances from getting into the identity map when another object for the same ID is already being tracked.

This caused regressions for users that work with application-provided IDs and expect this condition to fail with `UniqueConstraintViolationExceptions` when flushing to the database.

Thus, this PR turns the exception into a deprecation notice. Users can opt-in to the new behavior. In 3.0, the exception will be used.

Implements doctrine#10871.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 4, 2023
In doctrine#10785, a check was added that prevents entity instances from getting into the identity map when another object for the same ID is already being tracked.

This caused regressions for users that work with application-provided IDs and expect this condition to fail with `UniqueConstraintViolationExceptions` when flushing to the database.

Thus, this PR turns the exception into a deprecation notice. Users can opt-in to the new behavior. In 3.0, the exception will be used.

Implements doctrine#10871.
derrabus pushed a commit that referenced this pull request Aug 4, 2023
…0878)

In #10785, a check was added that prevents entity instances from getting into the identity map when another object for the same ID is already being tracked.

This caused regressions for users that work with application-provided IDs and expect this condition to fail with `UniqueConstraintViolationExceptions` when flushing to the database.

Thus, this PR turns the exception into a deprecation notice. Users can opt-in to the new behavior. In 3.0, the exception will be used.

Implements #10871.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 4, 2023
This adds a dedicated exception for the case that objects with colliding identities are to be put into the identity map.

Implements doctrine#10872.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 4, 2023
This adds a dedicated exception for the case that objects with colliding identities are to be put into the identity map.

Implements doctrine#10872.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 6, 2023
…ies loaded through fetch=EAGER + using inheritance

 doctrine#10880 reports a case where the changes from doctrine#10785 cause entity updates to be missed.

Upon closer inspection, this change seems to be causing it:

https://github.com/doctrine/orm/pull/10785/files#diff-55a900494fc8033ab498c53929716caf0aa39d6bdd7058e7d256787a24412ee4L2990-L3003

The code was changed to use `registerManaged()` instead, which basically does the same things, but (since doctrine#10785) also includes an additional check against duplicate entity instances.

But, one detail slipped through tests and reviews: `registerManaged()` also updates `\Doctrine\ORM\UnitOfWork::$originalEntityData`, which is used to compute entity changesets. An empty array `[]` was passed for $data here.

This will make the changeset computation assume that a partial object was loaded and effectively ignore all field updates here:

https://github.com/doctrine/orm/blob/a616914887ea160db4158d2c67752e99624f7c8a/lib/Doctrine/ORM/UnitOfWork.php#L762-L764

I think that, effectively, it is sufficient to call `registerManaged()` only in the two cases where a proxy was created.

Calling `registerManaged()` with `[]` as data for a proxy object is consistent with e. g. `\Doctrine\ORM\EntityManager::getReference()`.

In the case that a full entity has to be loaded, we need not call `registerManaged()` at all, since that will already happen inside `EntityManager::find()` (or, more specifically, `UnitOfWork::createEntity()` called inside it).

Note that the test case has to make some provisions so that we actually reach this case:
* Load an entity that uses `fetch="EAGER"` on a to-one association
* That association being against a class that uses inheritance (why's that?)
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 6, 2023
…ies loaded through fetch=EAGER + using inheritance

 doctrine#10880 reports a case where the changes from doctrine#10785 cause entity updates to be missed.

Upon closer inspection, this change seems to be causing it:

https://github.com/doctrine/orm/pull/10785/files#diff-55a900494fc8033ab498c53929716caf0aa39d6bdd7058e7d256787a24412ee4L2990-L3003

The code was changed to use `registerManaged()` instead, which basically does the same things, but (since doctrine#10785) also includes an additional check against duplicate entity instances.

But, one detail slipped through tests and reviews: `registerManaged()` also updates `\Doctrine\ORM\UnitOfWork::$originalEntityData`, which is used to compute entity changesets. An empty array `[]` was passed for $data here.

This will make the changeset computation assume that a partial object was loaded and effectively ignore all field updates here:

https://github.com/doctrine/orm/blob/a616914887ea160db4158d2c67752e99624f7c8a/lib/Doctrine/ORM/UnitOfWork.php#L762-L764

I think that, effectively, it is sufficient to call `registerManaged()` only in the two cases where a proxy was created.

Calling `registerManaged()` with `[]` as data for a proxy object is consistent with e. g. `\Doctrine\ORM\EntityManager::getReference()`.

In the case that a full entity has to be loaded, we need not call `registerManaged()` at all, since that will already happen inside `EntityManager::find()` (or, more specifically, `UnitOfWork::createEntity()` called inside it).

Note that the test case has to make some provisions so that we actually reach this case:
* Load an entity that uses `fetch="EAGER"` on a to-one association
* That association being against a class that uses inheritance (why's that?)
nicolas-grekas pushed a commit to nicolas-grekas/doctrine-orm that referenced this pull request Aug 7, 2023
…ctrine#10878)

In doctrine#10785, a check was added that prevents entity instances from getting into the identity map when another object for the same ID is already being tracked.

This caused regressions for users that work with application-provided IDs and expect this condition to fail with `UniqueConstraintViolationExceptions` when flushing to the database.

Thus, this PR turns the exception into a deprecation notice. Users can opt-in to the new behavior. In 3.0, the exception will be used.

Implements doctrine#10871.
derrabus pushed a commit to mpdude/doctrine2 that referenced this pull request Aug 9, 2023
…ies loaded through fetch=EAGER + using inheritance

 doctrine#10880 reports a case where the changes from doctrine#10785 cause entity updates to be missed.

Upon closer inspection, this change seems to be causing it:

https://github.com/doctrine/orm/pull/10785/files#diff-55a900494fc8033ab498c53929716caf0aa39d6bdd7058e7d256787a24412ee4L2990-L3003

The code was changed to use `registerManaged()` instead, which basically does the same things, but (since doctrine#10785) also includes an additional check against duplicate entity instances.

But, one detail slipped through tests and reviews: `registerManaged()` also updates `\Doctrine\ORM\UnitOfWork::$originalEntityData`, which is used to compute entity changesets. An empty array `[]` was passed for $data here.

This will make the changeset computation assume that a partial object was loaded and effectively ignore all field updates here:

https://github.com/doctrine/orm/blob/a616914887ea160db4158d2c67752e99624f7c8a/lib/Doctrine/ORM/UnitOfWork.php#L762-L764

I think that, effectively, it is sufficient to call `registerManaged()` only in the two cases where a proxy was created.

Calling `registerManaged()` with `[]` as data for a proxy object is consistent with e. g. `\Doctrine\ORM\EntityManager::getReference()`.

In the case that a full entity has to be loaded, we need not call `registerManaged()` at all, since that will already happen inside `EntityManager::find()` (or, more specifically, `UnitOfWork::createEntity()` called inside it).

Note that the test case has to make some provisions so that we actually reach this case:
* Load an entity that uses `fetch="EAGER"` on a to-one association
* That association being against a class that uses inheritance (why's that?)
derrabus pushed a commit that referenced this pull request Aug 9, 2023
…GER + using inheritance (#10884)

#10880 reports a case where the changes from #10785 cause entity updates to be missed.

Upon closer inspection, this change seems to be causing it:

https://github.com/doctrine/orm/pull/10785/files#diff-55a900494fc8033ab498c53929716caf0aa39d6bdd7058e7d256787a24412ee4L2990-L3003

The code was changed to use `registerManaged()` instead, which basically does the same things, but (since #10785) also includes an additional check against duplicate entity instances.

But, one detail slipped through tests and reviews: `registerManaged()` also updates `\Doctrine\ORM\UnitOfWork::$originalEntityData`, which is used to compute entity changesets. An empty array `[]` was passed for $data here.

This will make the changeset computation assume that a partial object was loaded and effectively ignore all field updates here:

https://github.com/doctrine/orm/blob/a616914887ea160db4158d2c67752e99624f7c8a/lib/Doctrine/ORM/UnitOfWork.php#L762-L764

I think that, effectively, it is sufficient to call `registerManaged()` only in the two cases where a proxy was created.

Calling `registerManaged()` with `[]` as data for a proxy object is consistent with e. g. `\Doctrine\ORM\EntityManager::getReference()`.

In the case that a full entity has to be loaded, we need not call `registerManaged()` at all, since that will already happen inside `EntityManager::find()` (or, more specifically, `UnitOfWork::createEntity()` called inside it).

Note that the test case has to make some provisions so that we actually reach this case:
* Load an entity that uses `fetch="EAGER"` on a to-one association
* That association being against a class that uses inheritance (why's that?)
derrabus pushed a commit that referenced this pull request Aug 9, 2023
This adds a dedicated exception for the case that objects with colliding identities are to be put into the identity map.

Implements #10872.
@ps-mattstuart
Copy link

I don't mean to drag up an old issue but we battled this for a while.

While processing thousands of the same entity types we'd have issues with regenerating the same ID. I'm not remembering perfectly and I'll go back to try and resurrect the exact issue flow that would cause this. What was happening was the UOW was either not finding the entity because it had a persistent record of the OID having been seen and the newly asked for entity generated the same OID. I believe we're releasing the entity objects which based on the PHP definition of the function can lead to a re-use of the ID and we were running into this.

From php.net:

This function returns a unique identifier for the object. The object id is unique for the lifetime of the object. Once the object is destroyed, its id may be reused for other objects. This behavior is similar to spl_object_hash().

Does this address that case? We're currently testing the latest version to see if it resolves all of the bugs that we have internally resolved but looking at the commit for this ticket it doesn't appear to cover the issue we were seeing. Given that the OID could conflict naturally (there's no guarantee of unique once objects are getting destroyed) throwing an exception I'm unsure was the right idea here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants