-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary entity identifier resource not reset, fails to populate second collection, PDOStatement #5895
Comments
If you include a functional test that reproduces your problem, we shall see what other drivers are affected. Anyway, I'm unsure if this is really a bug in the DBAL or if it's the ORM that uses the DBAL in a wrong way. I don't think those resources produced by the DBAL for the binary field are meant to be read more than once. |
Entity identifiers that are PHP resource streams need to work for all references, not just the first one. PDOStatement->execute is reading the binary resources but not restoring the original offsets. No other code is restoring these streams to their original position so that they can be reused. Examples of failures include empty collections on read (the first lazy load collection on an entity populates correctly, the second is empty) and foreign-key violations while persisting changes (the first entity join produces the correct SQL, the second has no data to read and the FK is violated with the missing binary data). Making this change as close as possible to the external code that moves the stream pointer eliminates the need to do this in calling code. This might also be an issue with other driver's Statement implementations. This request just handles the PDO driver. I have also not attempted to fix the deprecated bindParam code path. I do not believe this is called by the current Doctrine code, and is regardless much harder to patch because of the binding to variables that may or may not be populated when they are bound. doctrine#5895
Hi Alexander, thank you for the quick response. I think it's clear that the ORM is assuming the data passed to the DBAL layer is immutable. Even the parameter names ($value for a resource) indicate the expectation of an immutable value. Apart from initializing a resource (including the rewind when the data is in) the ORM layer doesn't have any handling for individual data types. So, I'm not sure that this is truly a DBAL bug either, but asking for the ORM layer to switch from treating column values as inherently immutable is a huge ask and will produce a major code change. Basically, by propagating binary data as a resource (instead of, for example, a byte string or byte array) you're in a situation where reading the data modifies it. I don't necessarily disagree with this choice (stream data can be large, so you don't want to copy it arbitrarily into a less stateful structure), but it clearly causes problems. Any time I read it the line before return is a call to rewind. For my codebase, the DDL, Doctrine classes, and an I/O layer to read client data (change or query) and translate it to the O/RM requests is all generated. There is no custom interaction with the Doctrine entities and the queries and changes are JSON-driven, so the idea that I can't retrieve the population of two collections with one request is really bad (like target another framework bad). In this case, the generated I/O layer can be re-created (through a settings file) to handle UUID values as native/character/binary. The settings are meant to match the expected back end uuid support. All I'm doing is swapping out the type of the identifier. It shouldn't break the code. I wrote a stripped-down model (pasted below) and generated some basic entity classes for you to work with, but having burned several days on this already I just don't have the bandwidth to spend any more getting the full repository building and testable. Hopefully this helps and will let you generate and run a db to any driver. <?php
namespace App\Test\BinRef;
use Doctrine\Common\Collections\ArrayCollection; {
/**
* @Entity
* @Table(name="FirstType", uniqueConstraints={@UniqueConstraint(name="FirstType_PK", columns={"firstTypeId"})})
*/
class FirstType {
/**
* @Column(name="firstTypeId", type="integer", nullable=false)
* @Id
* @GeneratedValue
*/
protected $firstTypeId;
/**
* @Column(name="data", type="string", length=48, nullable=true)
*/
protected $data;
/**
* @ManyToOne(targetEntity="TargetType", inversedBy="firstTypeCollection")
* @JoinColumn(name="targetTypeUuid", referencedColumnName="targetTypeUuid")
*/
protected $targetType;
public function getFirstTypeId() {
return $this->firstTypeId;
}
public function setFirstTypeId($value) {
$this->firstTypeId = $value;
}
public function getData() {
return $this->data;
}
public function setData($value) {
$this->data = $value;
}
public function getTargetType() {
return $this->targetType;
}
public function setTargetType($value) {
$current = &$this->targetType;
if ($current !== $value) {
if ($current !== null) {
$current->_syncFirstTypeCollection($this, true);
}
if ($value !== null) {
$value->_syncFirstTypeCollection($this, false);
}
}
unset($current);
$this->targetType = $value;
}
public function __toString() {
return spl_object_hash($this);
}
}
/**
* @Entity
* @Table(name="SecondType", uniqueConstraints={@UniqueConstraint(name="SecondType_PK", columns={"secondTypeId"})})
*/
class SecondType {
/**
* @Column(name="secondTypeId", type="integer", nullable=false)
* @Id
* @GeneratedValue
*/
protected $secondTypeId;
/**
* @Column(name="data", type="string", length=48, nullable=true)
*/
protected $data;
/**
* @ManyToOne(targetEntity="TargetType", inversedBy="secondTypeCollection")
* @JoinColumn(name="targetTypeUuid", referencedColumnName="targetTypeUuid")
*/
protected $targetType;
public function getSecondTypeId() {
return $this->secondTypeId;
}
public function setSecondTypeId($value) {
$this->secondTypeId = $value;
}
public function getData() {
return $this->data;
}
public function setData($value) {
$this->data = $value;
}
public function getTargetType() {
return $this->targetType;
}
public function setTargetType($value) {
$current = &$this->targetType;
if ($current !== $value) {
if ($current !== null) {
$current->_syncSecondTypeCollection($this, true);
}
if ($value !== null) {
$value->_syncSecondTypeCollection($this, false);
}
}
unset($current);
$this->targetType = $value;
}
public function __toString() {
return spl_object_hash($this);
}
}
/**
* @Entity
* @Table(name="TargetType", uniqueConstraints={@UniqueConstraint(name="TargetType_PK", columns={"targetTypeUuid"})})
*/
class TargetType {
/**
* @Column(name="targetTypeUuid", type="binary", length=16, nullable=false)
* @Id
*/
protected $targetTypeUuid;
/**
* @Column(name="targetTypeName", type="string", length=48, nullable=true)
*/
protected $name;
/**
* @OneToMany(targetEntity="FirstType", mappedBy="targetType")
*/
protected $firstTypeCollection;
/**
* @OneToMany(targetEntity="SecondType", mappedBy="targetType")
*/
protected $secondTypeCollection;
public function getTargetTypeUuid() {
return $this->targetTypeUuid;
}
public function setTargetTypeUuid($value) {
$this->targetTypeUuid = $value;
}
public function getName() {
return $this->name;
}
public function setName($value) {
$this->name = $value;
}
public function addToFirstTypeCollection($value) {
$value->setTargetType($this);
}
public function removeFromFirstTypeCollection($value) {
$value->setTargetType(null);
}
public function getFirstTypeCollection() {
return $this->ensureFirstTypeCollection()->toArray();
}
public function _syncFirstTypeCollection($value, $remove) {
$this->ensureFirstTypeCollection()->{$remove ? 'removeElement' : 'add'}($value);
}
public function addToSecondTypeCollection($value) {
$value->setTargetType($this);
}
public function removeFromSecondTypeCollection($value) {
$value->setTargetType(null);
}
public function getSecondTypeCollection() {
return $this->ensureSecondTypeCollection()->toArray();
}
public function _syncSecondTypeCollection($value, $remove) {
$this->ensureSecondTypeCollection()->{$remove ? 'removeElement' : 'add'}($value);
}
private function ensureFirstTypeCollection() {
if (null === ($coll = $this->firstTypeCollection)) {
$this->firstTypeCollection = $coll = new ArrayCollection();
}
return $coll;
}
private function ensureSecondTypeCollection() {
if (null === ($coll = $this->secondTypeCollection)) {
$this->secondTypeCollection = $coll = new ArrayCollection();
}
return $coll;
}
public function __toString() {
return spl_object_hash($this);
}
}
}
?> If you don't have a UUID creation function handy this set will do (obviously put in a class, expected string format is like 'af8034e4-c954-40a6-b02f-4b224364c3a2'): private static function createUuid() {
$data = openssl_random_pseudo_bytes(16);
// Set version to 0100
$data[6] = chr(ord($data[6]) & 0x0f | 0x40);
// Set bits 6-7 to 100
$data[8] = chr(ord($data[8]) & 0x3f | 0x80);
return self::resourceFromByteString($data);
}
private static function uuidResourceFromString($uuidString) {
return $uuidString ? self::resourceFromByteString(hex2bin(str_replace('-', '', $uuidString))) : null;
}
private static function resourceFromByteString($byteString) {
$res = fopen('php://temp', 'rb+');
fwrite($res, $byteString);
rewind($res);
return $res;
}
private static function uuidStringFromResource($uuidResource) {
if ($uuidResource) {
$bytes = stream_get_contents($uuidResource);
rewind($uuidResource);
return self::uuidStringFromByteString($bytes);
}
return null;
}
private static function uuidStringFromByteString($bytes) {
if ($bytes && strlen($bytes) === 16) {
$hexBytes = bin2hex($bytes);
return substr($hexBytes, 0, 8) . '-' . substr($hexBytes, 8, 4) . '-' . substr($hexBytes, 12, 4) . '-' . substr($hexBytes, 16, 4) . '-' . substr($hexBytes, 20, 12);
}
return null;
}
You'll also see failures persisting if you create the objects and try to persist them--you'll get an FK violation for the second serialization ref to the same TargetType entity. |
Entity identifiers that are PHP resource streams need to work for all references, not just the first one. PDOStatement->execute is reading the binary resources but not restoring the original offsets. No other code is restoring these streams to their original position so that they can be reused. Examples of failures include empty collections on read (the first lazy load collection on an entity populates correctly, the second is empty) and foreign-key violations while persisting changes (the first entity join produces the correct SQL, the second has no data to read and the FK is violated with the missing binary data). Making this change as close as possible to the external code that moves the stream pointer eliminates the need to do this in calling code. Resource offsets are retrieved immediately before execute in case they change between the bindValue and execute calls. The request was originally for the PDO driver but IBMDB2, Mysql, and PgSQL drivers are also covers. Other drivers will likely also need work. No attempt has been made to fix the deprecated bindParam code path. I do not believe this is called by the current Doctrine code, and is regardless much harder to patch because the reference variables can be replaced during execute, so the original resources may no longer be available to restore after the call. A functional test was added for bindValue and a resource with a seekable position. doctrine#5895
Havent tested yet your commit but I am having a similar issue with the secondary entities. |
Entity identifiers that are PHP resource streams need to work for all references, not just the first one. PDOStatement->execute is reading the binary resources but not restoring the original offsets. No other code is restoring these streams to their original position so that they can be reused. Examples of failures include empty collections on read (the first lazy load collection on an entity populates correctly, the second is empty) and foreign-key violations while persisting changes (the first entity join produces the correct SQL, the second has no data to read and the FK is violated with the missing binary data). Making this change as close as possible to the external code that moves the stream pointer eliminates the need to do this in calling code. Resource offsets are retrieved immediately before execute in case they change between the bindValue and execute calls. The request was originally for the PDO driver but IBMDB2, Mysql, and PgSQL drivers are also covered. Other drivers will likely also need work. No attempt has been made to fix the deprecated bindParam code path. I do not believe this is called by the current Doctrine code, and is regardless much harder to patch because the reference variables can be replaced during execute, so the original resources may no longer be available to restore after the call. A functional test was added for bindValue and a resource with a seekable position. doctrine#5895
Entity identifiers that are PHP resource streams need to work for all references, not just the first one. PDOStatement->execute is reading the binary resources but not restoring the original offsets. No other code is restoring these streams to their original position so that they can be reused. Examples of failures include empty collections on read (the first lazy load collection on an entity populates correctly, the second is empty) and foreign-key violations while persisting changes (the first entity join produces the correct SQL, the second has no data to read and the FK is violated with the missing binary data). Making this change as close as possible to the external code that moves the stream pointer eliminates the need to do this in calling code. Resource offsets are retrieved immediately before execute in case they change between the bindValue and execute calls. The request was originally for the PDO driver but IBMDB2, Mysql, and PgSQL drivers are also covered. Other drivers will likely also need work. No attempt has been made to fix the deprecated bindParam code path. I do not believe this is called by the current Doctrine code, and is regardless much harder to patch because the reference variables can be replaced during execute, so the original resources may no longer be available to restore after the call. A functional test was added for bindValue and a resource with a seekable position. doctrine#5895
Entity identifiers that are PHP resource streams need to work for all references, not just the first one. PDOStatement->execute is reading the binary resources but not restoring the original offsets. No other code is restoring these streams to their original position so that they can be reused. Examples of failures include empty collections on read (the first lazy load collection on an entity populates correctly, the second is empty) and foreign-key violations while persisting changes (the first entity join produces the correct SQL, the second has no data to read and the FK is violated with the missing binary data). Making this change as close as possible to the external code that moves the stream pointer eliminates the need to do this in calling code. Resource offsets are retrieved immediately before execute in case they change between the bindValue and execute calls. The request was originally for the PDO driver but IBMDB2, Mysql, and PgSQL drivers are also covered. Other drivers will likely also need work. No attempt has been made to fix the deprecated bindParam code path. I do not believe this is called by the current Doctrine code, and is regardless much harder to patch because the reference variables can be replaced during execute, so the original resources may no longer be available to restore after the call. A functional test was added for bindValue and a resource with a seekable position. doctrine#5895
Entity identifiers that are PHP resource streams need to work for all references, not just the first one. PDOStatement->execute is reading the binary resources but not restoring the original offsets. No other code is restoring these streams to their original position so that they can be reused. Examples of failures include empty collections on read (the first lazy load collection on an entity populates correctly, the second is empty) and foreign-key violations while persisting changes (the first entity join produces the correct SQL, the second has no data to read and the FK is violated with the missing binary data). Making this change as close as possible to the external code that moves the stream pointer eliminates the need to do this in calling code. Resource offsets are retrieved immediately before execute in case they change between the bindValue and execute calls. The request was originally for the PDO driver but IBMDB2, Mysql, and PgSQL drivers are also covered. Other drivers will likely also need work. No attempt has been made to fix the deprecated bindParam code path. I do not believe this is called by the current Doctrine code, and is regardless much harder to patch because the reference variables can be replaced during execute, so the original resources may no longer be available to restore after the call. A functional test was added for bindValue and a resource with a seekable position. doctrine#5895
Bug Report
If an entity has a binary identifier field then the id value is managed as a PHP resource. Of course, a resource is only meaningful if it has a reasonable current position. Specifically, if the value is read more than once then the stream position needs to be restored in between reads.
In my case (Postgres or Oracle over PDO) PDOStatement->execute moves the resource pointer, but this likely fails for other drivers as well. The bottom line is that a resource is not quite an immutable value, so the position needs to be restored between reads to get consistent data.
Clearly, the resource is being used and not rewound to its original position, so the second use fails. There is absolutely nothing fancy here, I'm just retrieving an entity with two one-to-many lazy load collections. However, similar behavior is seen with a write operation, where the first join succeeds and the second shows a foreign-key violation on the empty binary string data.
I tracked the resource pointer movement to the PDOStatement->execute call in doctrine\dbal\src\Driver\PDO\statement.php. Looking further into the PHP sources (https://github.com/php/php-src/blob/master/ext/pdo/pdo_stmt.c) shows
Which reads the stream but does not restore it. So, short of changing the PDOStatement behavior in PHP, the resources either need to be copied before the call (during parameter binding) or reset to their prior position (usually 0, but not guaranteed) after the call. There is little point in copying resources since the copy also moves the original pointer, which leads to a solution of simply restoring the position after the execute call.
I'll make a pull request with the suggested changes to PDO\statement.php, but I'm not claiming this is the only driver that experiences this issue. The collaborator team will need to decide if this is a dbal issue or needs to be handled higher in the stack (BasicEntityPersister, UnitOfWork->loadCollection, etc). There are obviously a ton of both read and write scenarios that funnel down into PDOStatement->execute, so fixing it higher up the stack would be significantly more code (and might be unnecessary for statement implementations associated with other drivers, TBD).
I'm currently using the 2.8 bundle installed with Symfony.
Summary
Binary identifier in entity can be used once as a parameter to PDOStatement->execute but fails on all subsequent operations.
Current behaviour
How to reproduce
Expected behaviour
This should work the same as any other identifier and allow multiple internal uses.
The text was updated successfully, but these errors were encountered: