-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DDC-2332: [UnitOfWork::doPersist()] The spl_object_hash() generate not unique hash! #3037
Comments
Comment created by @Ocramius: Please provide either a code example or a test case. As it stands, this issue is incomplete |
Comment created by @beberlei: Are you calling EntityManager#clear() inbetween? Because PHP reuses the hashes. The ORM accounts for this. |
Comment created by fchris82: Hi! http://diginin.hu/spl*object_hash*bug.zip
Chris |
Comment created by @beberlei: This is not a reproduce case, i don't want to execute your whole project. I want to know, what is the actual bug that you see? Can you just print a list of all the hashes? Because the hashes dont differ at the end, bu tjust somewhere in the middle. |
Comment created by fchris82: I attached a hashlogs.txt file. The last Task class hash is 0000000050ab4aba0000000058e1cb12 ( line 3 129 ) This is not unique, view the line 2 760 . The Task is not being saved and the program don't call the prePersist listener. The "UnitOfWork" believe the entity has been saved because the |
Comment created by fchris82: The |
Comment created by @Ocramius: One possible issue here is that a listener registers an entity as managed while a proxy is being loaded. The given data is still insufficient to actually verify the problem. |
I can confirm having this issue. My case is working on quite big collection of entities, and code (simplified) looks like this:
Class A definition:
Class B definition
While trying to persist B instance
I can also confirm that calling |
As I already stated above, this issue needs a reproducible test case. |
the thing is it looks like |
@Ocramius I believe I have a reproducible test case here ~~> https://github.com/thebuccaneersden/doctrine-bug |
@thebuccaneersden no, that is not a test case, that is an application that I'd have to set up and run, and that's not going to happen, sorry :-\ Please refer to https://github.com/doctrine/doctrine2/tree/1c2b7c968561f2e319117d7860c88e8c0cad3c7a/tests/Doctrine/Tests/ORM/Functional/Ticket for a list of existing functional test cases. |
I just had the same issue: the Calling |
This test triggers the problem consistently for me. namespace Doctrine\Tests\ORM\Functional;
use Doctrine\Tests\Models\CMS\CmsArticle;
use Doctrine\Tests\Models\CMS\CmsUser;
use Doctrine\Tests\OrmFunctionalTestCase;
class HashCollisionTest extends OrmFunctionalTestCase
{
protected function setUp()
{
$this->useModelSet('cms');
parent::setUp();
}
public function testHashCollision() {
$user = new CmsUser();
$user->username = "Test";
$user->name = "Test";
$this->_em->persist($user);
$this->_em->flush();
$articles = [];
for($i=0;$i<100;$i++) {
$article = new CmsArticle();
$article->topic = "Test";
$article->text = "Test";
$article->setAuthor($this->_em->merge($user));
$this->_em->persist($article);
$this->_em->flush();
$this->_em->clear();
$articles [] = $article;
}
$user = $this->_em->merge($user);
foreach($articles as $article) {
$article = $this->_em->merge($article);
$article->setAuthor($user);
}
unset($article);
gc_collect_cycles();
$keep = [];
for($x=0;$x<1000;$x++) {
$keep[] = $article = new CmsArticle();
$article->topic = "Test";
$article->text = "Test";
$article->setAuthor($this->_em->merge($user));
$this->_em->persist($article);
$this->_em->flush();
$this->assertNotNull($article->id, "Article wasn't persisted on iteration $x");
}
}
} The problem is in the loop that merges the articles. Every article->user is its own seperate instance of a proxy object. The spl_object_hash of these objects get added to entityStates/entityIdentifiers in UnitOfWork->registerManaged(). However, no reference to the object is kept here. RegisterManaged calls UnitOfWork->addToIdentityMap(). The identityMap does keep a reference to the object, but only for the first object with the same id. This means that for most of the CmsUser Proxy objects, no reference is kept from within doctrine, so if I break the last reference in my code by calling $article->setAuthor($user), the proxy object may be garbage collected, but UnitOfWork->entityStates and UnitOfWork->entityIdentifiers still contain the spl_object_hash of the now non existent object. If I now try to create a new CmsArticle, php may use the same memory location, generating the same object hash. In this case that causes doctrine to think it's already managing the entity, and not persisting it. |
We're also reproducing the issue. From our point of view, it appears to be because entities are kept into $identityMap using a strategy that is different than others maps ($entityIdentifiers, $originalEntityData, etc. which are oid based). When an entity is created using the same ids as a previous one, the new entity is tracked by maps oid-based, but it's NOT tracked by the $identityMap (because the check is based on keys, not on oid) . If the previous one is garbage collected (this is eventually happening anytime), the $identityMap will be confused when it comes to find the associated entity, resulting into a problematic skip. We're about to provide a fix, if someone can validate the way we're understanding the issue, this could save us time before writing code. We're about to create something like an IdentityMapManager that will track ALL entities: something like that in addToIdentityMap(): $this->identityMap[$className][$idHash][$oid] = $entity; |
@sandvige We worked around it by implementing an EntityMangerDecorator that keeps a reference to every entity that passes through it; only releasing it when clear() is called. I'd guess that ends up being functionally the same as your suggestion. Keeping a reference to all entities in this way is far from ideal but it's the best we could do. An anonimized version of our implementation is: class OurEntityManager extends EntityManagerDecorator {
/** @var array */
private $references = [];
public function __construct(EntityManagerInterface $wrapped) {
parent::__construct($wrapped);
}
public function clear($objectName = null) {
if($objectName === null) {
$this->references = [];
}
parent::clear($objectName);
}
public function detach($object){
unset($this->references[spl_object_hash($object)]);
parent::detach($object);
}
public function remove($object) {
$this->references[spl_object_hash($object)] = $object;
parent::remove($object);
}
public function persist($object) {
$this->references[spl_object_hash($object)] = $object;
parent::persist($object);
}
public function merge($object) {
$newObject = parent::merge($object);
$this->references[spl_object_hash($newObject)] = $newObject;
foreach($this->getClassProperties($newObject) as $property) {
$value = $property->getValue($newObject);
if(is_object($value) && !($value instanceof \DateTime)) {
$this->references[spl_object_hash($value)] = $value;
}
}
return $newObject;
}
private $classPropertiesCache = [];
private function getClassProperties($entity){
$class = get_class($entity);
unset($this->classPropertiesCache[$class]);
if(!isset($this->classPropertiesCache[$class])) {
$arr = [];
$this->__getClassproperties(new \ReflectionClass($entity), $arr);
$this->classPropertiesCache[$class] = $arr;
}
return $this->classPropertiesCache[$class];
}
private function __getClassproperties(\ReflectionClass $reflectionClass, &$results) {
$metaData = null;
try {
$metaData = $this->wrapped->getMetadataFactory()->getMetadataFor($reflectionClass->getName());
}catch(MappingException $mex) {
// Skip when a subclass is not a valid entity
}
if($metaData !== null) {
foreach($reflectionClass->getProperties() as $property) {
$propertyName = $property->getName();
if(isset($metaData->associationMappings[$propertyName])) {
if(isset($metaData->associationMappings[$propertyName])) {
$property->setAccessible(true);
$results[] = $property;
}
}
}
}
if($reflectionClass->getParentClass() !== false) {
$this->__getClassProperties($reflectionClass->getParentClass(), $results);
}
return $results;
}
} |
If I understand the problem correctly, we are getting an entity garbage collected while we still have a reference to its OID: does the problem present itself also on |
@petervf Thanks for the code! Indeed, this is also a valid approach. @Ocramius As long as we're only keeping one entity here: https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/UnitOfWork.php#L1355 By the way, the function returns false when this is happening, but every piece of code using this method is giving approximately 0 sh*ts :D |
@Ocramius I added code using the provided unit tests (this is failing without a fix), and I also added a dumb fix showing unit tests going green. I will work on it in the next day, I just wanted to bump this issue as I'll provide a way to handle the issue by keeping an array that maps entities with oids (the missing part I was explaining a few comments above) |
Yes, it's a fresh object
Nope. I mean I could clear it in
All IDs are auto increments, yes. There is no clearing of the em, the only cleanup between tests is to delete/detach everything that is not needed anymore. |
No, I don't think so. The testsuite DB (SQLite in memory) gets setup once in the suite's bootstrap code, and after, there are no structural changes, all DB operations are done through |
At the time this exeception is raised, two object instances of the same class exist that compete for the same At that time, where does the second object come from? How has it been created, where did it get its ID? And, can you find where the first object was created and where it comes from? Set a breakpoint with a condition at the place where the objects are added to the identity map and watch out for that ID – so you should find the origin of the first object as well. How come both are using the same identifier values? |
It took a while, but I figured it out now: It turns out there was one test that constructs and entity from an XML string. The XML comes from a hardcoded test asset and it contains a value of So basically it is the same situation as in the testcase in #843. You can argue that this is a user error, but (I was meaning to write that in #843 as well), I have to say that |
@mpdude I had originally responded that I did not get an exception, but I was mistaken... (messed something up in my original run...) Your code does indeed throw an error message for me. |
@mpdude diving a bit deeper my problem seems to start with a class I have that extends another class. In this case I have a "Role" class and then an "ApiRole" class that extends the "Role" class. Your error says I am adding an entity of the class "App\Entity\Role" with an ID of 1, but I think I am trying add an ApiRole, not a Role... Now the new exception complains because my ApiRole has an ID of 1, which would collide with my Role that has an ID of 1 (different tables in the database...) However, most interesting to me is, that it was an ApiRole that was colliding with a completely different Object later in the process... My original ApiRole was just this: If I remove the "extends" and copy all the code from the Roles Object the exception and collisions disappear. I was trying to avoid having duplicate code for two objects that essentially are the same but I wanted to store in different places. I'm not very good with how to use extends, so I'll look at that, maybe I was doing something dumb... I think this proves that your addition is working and providing good feedback to the users 😁 |
@Matt-PMCT could you share the code for both classes? I only need to see the “head” where |
ApiRole:
Role:
|
Thanks! Checking with https://www.doctrine-project.org/projects/doctrine-orm/en/2.15/reference/inheritance-mapping.html#entity-inheritance, you’re using entity inheritance without declaring it. That means the ORM will assume instances of both classes share the ID space and no two instances use the same id. But, in fact, both get their ID from different table auto-increments and so collisions may occur. I am afraid the best we could do is to find out why the ORM did not reject your configuration in the first place, and I’ll take a look at that. |
@Matt-PMCT could you please confirm that the exception added in #10463 would have spotted your entity configuration as invalid? It will be an exception in 3.0, and is a deprecation warning starting in 2.15. |
@Matt-PMCT try using a mapped superclass to hold the commonalities of both your |
To summarize, we have
as actionable items to follow up on, and no other reproducible examples, leads or hints for this issue here. |
As one takeaway from doctrine#3037 (comment) and doctrine#843, we should look into better explaining the `EntityManager::getReference()` method, it’s semantics, caveats and potential responsibilities placed on the user. This PR tries to do that, so it fixes doctrine#10797.
…ity map entry While trying to understand doctrine#3037, I found that it may happen that we have more entries in `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` than in `\Doctrine\ORM\UnitOfWork::$identityMap`. The former is a mapping from `spl_object_id` values to ID hashes, the latter an array first of entity class names and then from ID hash to entity object instances. (Basically, "ID hash" is a concatenation of all field values making up the `@Id` for a given entity.) This means that at some point, we must have _different_ objects representing the same entity. I don't think it makes sense to overwrite an entity in the identity map, since that may leave `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` in an inconsistent state. If it makes sense at all to _replace_ an entity, that should happen through dedicated management methods to first detach the old entity before persisting, merging or otherwise adding the new one. This way we could make sure the internal structures remain consistent.
* Explain `EntityManager::getReference()` peculiarities As one takeaway from #3037 (comment) and #843, we should look into better explaining the `EntityManager::getReference()` method, it’s semantics, caveats and potential responsibilities placed on the user. This PR tries to do that, so it fixes #10797. * Update docs/en/reference/advanced-configuration.rst Co-authored-by: Grégoire Paris <[email protected]> --------- Co-authored-by: Grégoire Paris <[email protected]>
…ity map entry While trying to understand doctrine#3037, I found that it may happen that we have more entries in `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` than in `\Doctrine\ORM\UnitOfWork::$identityMap`. The former is a mapping from `spl_object_id` values to ID hashes, the latter an array first of entity class names and then from ID hash to entity object instances. (Basically, "ID hash" is a concatenation of all field values making up the `@Id` for a given entity.) This means that at some point, we must have _different_ objects representing the same entity, or at least over time different objects are used for the same entity without the UoW properly updating its `::$entityIdentifiers` structure. I don't think it makes sense to overwrite an entity in the identity map, since that means a currently `MANAGED` entity is replaced with something else. If it makes sense at all to _replace_ an entity, that should happen through dedicated management methods to first detach the old entity before persisting, merging or otherwise adding the new one. This way we could make sure the internal structures remain consistent.
…ity map entry While trying to understand doctrine#3037, I found that it may happen that we have more entries in `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` than in `\Doctrine\ORM\UnitOfWork::$identityMap`. The former is a mapping from `spl_object_id` values to ID hashes, the latter an array first of entity class names and then from ID hash to entity object instances. (Basically, "ID hash" is a concatenation of all field values making up the `@Id` for a given entity.) This means that at some point, we must have _different_ objects representing the same entity, or at least over time different objects are used for the same entity without the UoW properly updating its `::$entityIdentifiers` structure. I don't think it makes sense to overwrite an entity in the identity map, since that means a currently `MANAGED` entity is replaced with something else. If it makes sense at all to _replace_ an entity, that should happen through dedicated management methods to first detach the old entity before persisting, merging or otherwise adding the new one. This way we could make sure the internal structures remain consistent.
…ity map entry While trying to understand doctrine#3037, I found that it may happen that we have more entries in `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` than in `\Doctrine\ORM\UnitOfWork::$identityMap`. The former is a mapping from `spl_object_id` values to ID hashes, the latter an array first of entity class names and then from ID hash to entity object instances. (Basically, "ID hash" is a concatenation of all field values making up the `@Id` for a given entity.) This means that at some point, we must have _different_ objects representing the same entity, or at least over time different objects are used for the same entity without the UoW properly updating its `::$entityIdentifiers` structure. I don't think it makes sense to overwrite an entity in the identity map, since that means a currently `MANAGED` entity is replaced with something else. If it makes sense at all to _replace_ an entity, that should happen through dedicated management methods to first detach the old entity before persisting, merging or otherwise adding the new one. This way we could make sure the internal structures remain consistent.
…ity map entry While trying to understand doctrine#3037, I found that it may happen that we have more entries in `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` than in `\Doctrine\ORM\UnitOfWork::$identityMap`. The former is a mapping from `spl_object_id` values to ID hashes, the latter an array first of entity class names and then from ID hash to entity object instances. (Basically, "ID hash" is a concatenation of all field values making up the `@Id` for a given entity.) This means that at some point, we must have _different_ objects representing the same entity, or at least over time different objects are used for the same entity without the UoW properly updating its `::$entityIdentifiers` structure. I don't think it makes sense to overwrite an entity in the identity map, since that means a currently `MANAGED` entity is replaced with something else. If it makes sense at all to _replace_ an entity, that should happen through dedicated management methods to first detach the old entity before persisting, merging or otherwise adding the new one. This way we could make sure the internal structures remain consistent.
…ity map entry While trying to understand doctrine#3037, I found that it may happen that we have more entries in `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` than in `\Doctrine\ORM\UnitOfWork::$identityMap`. The former is a mapping from `spl_object_id` values to ID hashes, the latter an array first of entity class names and then from ID hash to entity object instances. (Basically, "ID hash" is a concatenation of all field values making up the `@Id` for a given entity.) This means that at some point, we must have _different_ objects representing the same entity, or at least over time different objects are used for the same entity without the UoW properly updating its `::$entityIdentifiers` structure. I don't think it makes sense to overwrite an entity in the identity map, since that means a currently `MANAGED` entity is replaced with something else. If it makes sense at all to _replace_ an entity, that should happen through dedicated management methods to first detach the old entity before persisting, merging or otherwise adding the new one. This way we could make sure the internal structures remain consistent.
…ity map entry While trying to understand doctrine#3037, I found that it may happen that we have more entries in `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` than in `\Doctrine\ORM\UnitOfWork::$identityMap`. The former is a mapping from `spl_object_id` values to ID hashes, the latter an array first of entity class names and then from ID hash to entity object instances. (Basically, "ID hash" is a concatenation of all field values making up the `@Id` for a given entity.) This means that at some point, we must have _different_ objects representing the same entity, or at least over time different objects are used for the same entity without the UoW properly updating its `::$entityIdentifiers` structure. I don't think it makes sense to overwrite an entity in the identity map, since that means a currently `MANAGED` entity is replaced with something else. If it makes sense at all to _replace_ an entity, that should happen through dedicated management methods to first detach the old entity before persisting, merging or otherwise adding the new one. This way we could make sure the internal structures remain consistent.
…ity map entry While trying to understand doctrine#3037, I found that it may happen that we have more entries in `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` than in `\Doctrine\ORM\UnitOfWork::$identityMap`. The former is a mapping from `spl_object_id` values to ID hashes, the latter an array first of entity class names and then from ID hash to entity object instances. (Basically, "ID hash" is a concatenation of all field values making up the `@Id` for a given entity.) This means that at some point, we must have _different_ objects representing the same entity, or at least over time different objects are used for the same entity without the UoW properly updating its `::$entityIdentifiers` structure. I don't think it makes sense to overwrite an entity in the identity map, since that means a currently `MANAGED` entity is replaced with something else. If it makes sense at all to _replace_ an entity, that should happen through dedicated management methods to first detach the old entity before persisting, merging or otherwise adding the new one. This way we could make sure the internal structures remain consistent.
With
being solved/merged/closed, I don't see what else we could do here to improve the situation. My suggestion is to close this issue. |
Jira issue originally created by user fchris82:
I created fixtures and some data was inserted many times without calling the
Task
entity PrePersist event listener.I printed the used and generated hash and I saw a
Proxies\*_CG_*\Asitly\ProjectManagementBundle\Entity\User
hash equal aTask
entity hash!The text was updated successfully, but these errors were encountered: