Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary entity identifier resource not reset, fails to populate second collection, PDOStatement #5895

Open
mcurland opened this issue Feb 2, 2023 · 3 comments
Labels

Comments

@mcurland
Copy link

mcurland commented Feb 2, 2023

Bug Report

If an entity has a binary identifier field then the id value is managed as a PHP resource. Of course, a resource is only meaningful if it has a reasonable current position. Specifically, if the value is read more than once then the stream position needs to be restored in between reads.

In my case (Postgres or Oracle over PDO) PDOStatement->execute moves the resource pointer, but this likely fails for other drivers as well. The bottom line is that a resource is not quite an immutable value, so the position needs to be restored between reads to get consistent data.

  1. When the entity is populated the id resource is at position 0.
  2. When the first lazy load collection (likely eager as well, I just haven't tested) the correct id is produced, so the sql log (postgres, but this is not a postgres-specific issue) looks something like $1='\xabcdef'.
  3. When the second lazy load collection is queried the sql log shows $1='\x'. This should be the same as the first value. It obviously never returns data.

Clearly, the resource is being used and not rewound to its original position, so the second use fails. There is absolutely nothing fancy here, I'm just retrieving an entity with two one-to-many lazy load collections. However, similar behavior is seen with a write operation, where the first join succeeds and the second shows a foreign-key violation on the empty binary string data.

I tracked the resource pointer movement to the PDOStatement->execute call in doctrine\dbal\src\Driver\PDO\statement.php. Looking further into the PHP sources (https://github.com/php/php-src/blob/master/ext/pdo/pdo_stmt.c) shows

			case PDO_PARAM_LOB:
				if (Z_TYPE_P(dest) == IS_STRING) {
					php_stream *stream =
						php_stream_memory_open(TEMP_STREAM_READONLY, Z_STR_P(dest));
					zval_ptr_dtor_str(dest);
					php_stream_to_zval(stream, dest);
				}
				break;

Which reads the stream but does not restore it. So, short of changing the PDOStatement behavior in PHP, the resources either need to be copied before the call (during parameter binding) or reset to their prior position (usually 0, but not guaranteed) after the call. There is little point in copying resources since the copy also moves the original pointer, which leads to a solution of simply restoring the position after the execute call.

I'll make a pull request with the suggested changes to PDO\statement.php, but I'm not claiming this is the only driver that experiences this issue. The collaborator team will need to decide if this is a dbal issue or needs to be handled higher in the stack (BasicEntityPersister, UnitOfWork->loadCollection, etc). There are obviously a ton of both read and write scenarios that funnel down into PDOStatement->execute, so fixing it higher up the stack would be significantly more code (and might be unnecessary for statement implementations associated with other drivers, TBD).

I'm currently using the 2.8 bundle installed with Symfony.

Summary

Binary identifier in entity can be used once as a parameter to PDOStatement->execute but fails on all subsequent operations.

Current behaviour

  • The first OneToMany or ManyToMany collection populated after retrieving an entity identified by a binary field succeeds, all subsequent collections fail to populate.
  • Similarly, the first joined write operation that references the entity succeeds, subsequent writes fail.

How to reproduce

  1. Define an entity with a binary identifier column
  2. Define second and third entities that references the first (OneToMany is sufficient, ManyToMany will fail as well).
  3. Add some data to the db (it won't work through Doctrine until this is fixed, so do it manually).
  4. Fetch the first entity and call the two lazy load collections==>The first collection populates, the second does not.

Expected behaviour

This should work the same as any other identifier and allow multiple internal uses.

@derrabus
Copy link
Member

derrabus commented Feb 2, 2023

I'll make a pull request with the suggested changes to PDO\statement.php, but I'm not claiming this is the only driver that experiences this issue.

If you include a functional test that reproduces your problem, we shall see what other drivers are affected.

Anyway, I'm unsure if this is really a bug in the DBAL or if it's the ORM that uses the DBAL in a wrong way. I don't think those resources produced by the DBAL for the binary field are meant to be read more than once.

mcurland added a commit to mcurland/doctrine-dbal that referenced this issue Feb 2, 2023
Entity identifiers that are PHP resource streams need to work for all
references, not just the first one. PDOStatement->execute is reading
the binary resources but not restoring the original offsets. No other
code is restoring these streams to their original position so that
they can be reused.

Examples of failures include empty collections on read (the first lazy
load collection on an entity populates correctly, the second is empty)
and foreign-key violations while persisting changes (the first entity
join produces the correct SQL, the second has no data to read and the
FK is violated with the missing binary data).

Making this change as close as possible to the external code that
moves the stream pointer eliminates the need to do this in calling
code.

This might also be an issue with other driver's Statement
implementations. This request just handles the PDO driver. I have also
not attempted to fix the deprecated bindParam code path. I do not
believe this is called by the current Doctrine code, and is regardless
much harder to patch because of the binding to variables that may or
may not be populated when they are bound.

doctrine#5895
@mcurland
Copy link
Author

mcurland commented Feb 3, 2023

Hi Alexander, thank you for the quick response.

I think it's clear that the ORM is assuming the data passed to the DBAL layer is immutable. Even the parameter names ($value for a resource) indicate the expectation of an immutable value. Apart from initializing a resource (including the rewind when the data is in) the ORM layer doesn't have any handling for individual data types. So, I'm not sure that this is truly a DBAL bug either, but asking for the ORM layer to switch from treating column values as inherently immutable is a huge ask and will produce a major code change.

Basically, by propagating binary data as a resource (instead of, for example, a byte string or byte array) you're in a situation where reading the data modifies it. I don't necessarily disagree with this choice (stream data can be large, so you don't want to copy it arbitrarily into a less stateful structure), but it clearly causes problems. Any time I read it the line before return is a call to rewind.

For my codebase, the DDL, Doctrine classes, and an I/O layer to read client data (change or query) and translate it to the O/RM requests is all generated. There is no custom interaction with the Doctrine entities and the queries and changes are JSON-driven, so the idea that I can't retrieve the population of two collections with one request is really bad (like target another framework bad). In this case, the generated I/O layer can be re-created (through a settings file) to handle UUID values as native/character/binary. The settings are meant to match the expected back end uuid support. All I'm doing is swapping out the type of the identifier. It shouldn't break the code.

I wrote a stripped-down model (pasted below) and generated some basic entity classes for you to work with, but having burned several days on this already I just don't have the bandwidth to spend any more getting the full repository building and testable. Hopefully this helps and will let you generate and run a db to any driver.

<?php
namespace App\Test\BinRef;
use Doctrine\Common\Collections\ArrayCollection; {
	/**
	* @Entity
	* @Table(name="FirstType", uniqueConstraints={@UniqueConstraint(name="FirstType_PK", columns={"firstTypeId"})})
	*/
	class FirstType {
		/**
		* @Column(name="firstTypeId", type="integer", nullable=false)
		* @Id
		* @GeneratedValue
		*/
		protected $firstTypeId;
		/**
		* @Column(name="data", type="string", length=48, nullable=true)
		*/
		protected $data;
		/**
		* @ManyToOne(targetEntity="TargetType", inversedBy="firstTypeCollection")
		* @JoinColumn(name="targetTypeUuid", referencedColumnName="targetTypeUuid")
		*/
		protected $targetType;
		public function getFirstTypeId() {
			return $this->firstTypeId;
		}
		public function setFirstTypeId($value) {
			$this->firstTypeId = $value;
		}
		public function getData() {
			return $this->data;
		}
		public function setData($value) {
			$this->data = $value;
		}
		public function getTargetType() {
			return $this->targetType;
		}
		public function setTargetType($value) {
			$current = &$this->targetType;
			if ($current !== $value) {
				if ($current !== null) {
					$current->_syncFirstTypeCollection($this, true);
				}
				if ($value !== null) {
					$value->_syncFirstTypeCollection($this, false);
				}
			}
			unset($current);
			$this->targetType = $value;
		}
		public function __toString() {
			return spl_object_hash($this);
		}
	}
	/**
	* @Entity
	* @Table(name="SecondType", uniqueConstraints={@UniqueConstraint(name="SecondType_PK", columns={"secondTypeId"})})
	*/
	class SecondType {
		/**
		* @Column(name="secondTypeId", type="integer", nullable=false)
		* @Id
		* @GeneratedValue
		*/
		protected $secondTypeId;
		/**
		* @Column(name="data", type="string", length=48, nullable=true)
		*/
		protected $data;
		/**
		* @ManyToOne(targetEntity="TargetType", inversedBy="secondTypeCollection")
		* @JoinColumn(name="targetTypeUuid", referencedColumnName="targetTypeUuid")
		*/
		protected $targetType;
		public function getSecondTypeId() {
			return $this->secondTypeId;
		}
		public function setSecondTypeId($value) {
			$this->secondTypeId = $value;
		}
		public function getData() {
			return $this->data;
		}
		public function setData($value) {
			$this->data = $value;
		}
		public function getTargetType() {
			return $this->targetType;
		}
		public function setTargetType($value) {
			$current = &$this->targetType;
			if ($current !== $value) {
				if ($current !== null) {
					$current->_syncSecondTypeCollection($this, true);
				}
				if ($value !== null) {
					$value->_syncSecondTypeCollection($this, false);
				}
			}
			unset($current);
			$this->targetType = $value;
		}
		public function __toString() {
			return spl_object_hash($this);
		}
	}
	/**
	* @Entity
	* @Table(name="TargetType", uniqueConstraints={@UniqueConstraint(name="TargetType_PK", columns={"targetTypeUuid"})})
	*/
	class TargetType {
		/**
		* @Column(name="targetTypeUuid", type="binary", length=16, nullable=false)
		* @Id
		*/
		protected $targetTypeUuid;
		/**
		* @Column(name="targetTypeName", type="string", length=48, nullable=true)
		*/
		protected $name;
		/**
		* @OneToMany(targetEntity="FirstType", mappedBy="targetType")
		*/
		protected $firstTypeCollection;
		/**
		* @OneToMany(targetEntity="SecondType", mappedBy="targetType")
		*/
		protected $secondTypeCollection;
		public function getTargetTypeUuid() {
			return $this->targetTypeUuid;
		}
		public function setTargetTypeUuid($value) {
			$this->targetTypeUuid = $value;
		}
		public function getName() {
			return $this->name;
		}
		public function setName($value) {
			$this->name = $value;
		}
		public function addToFirstTypeCollection($value) {
			$value->setTargetType($this);
		}
		public function removeFromFirstTypeCollection($value) {
			$value->setTargetType(null);
		}
		public function getFirstTypeCollection() {
			return $this->ensureFirstTypeCollection()->toArray();
		}
		public function _syncFirstTypeCollection($value, $remove) {
			$this->ensureFirstTypeCollection()->{$remove ? 'removeElement' : 'add'}($value);
		}
		public function addToSecondTypeCollection($value) {
			$value->setTargetType($this);
		}
		public function removeFromSecondTypeCollection($value) {
			$value->setTargetType(null);
		}
		public function getSecondTypeCollection() {
			return $this->ensureSecondTypeCollection()->toArray();
		}
		public function _syncSecondTypeCollection($value, $remove) {
			$this->ensureSecondTypeCollection()->{$remove ? 'removeElement' : 'add'}($value);
		}
		private function ensureFirstTypeCollection() {
			if (null === ($coll = $this->firstTypeCollection)) {
				$this->firstTypeCollection = $coll = new ArrayCollection();
			}
			return $coll;
		}
		private function ensureSecondTypeCollection() {
			if (null === ($coll = $this->secondTypeCollection)) {
				$this->secondTypeCollection = $coll = new ArrayCollection();
			}
			return $coll;
		}
		public function __toString() {
			return spl_object_hash($this);
		}
	}
}
?>

If you don't have a UUID creation function handy this set will do (obviously put in a class, expected string format is like 'af8034e4-c954-40a6-b02f-4b224364c3a2'):

		private static function createUuid() {
			$data = openssl_random_pseudo_bytes(16);
			// Set version to 0100
			$data[6] = chr(ord($data[6]) & 0x0f | 0x40);
			// Set bits 6-7 to 100
			$data[8] = chr(ord($data[8]) & 0x3f | 0x80);
			return self::resourceFromByteString($data);
		}
		private static function uuidResourceFromString($uuidString) {
			return $uuidString ? self::resourceFromByteString(hex2bin(str_replace('-', '', $uuidString))) : null;
		}
		private static function resourceFromByteString($byteString) {
			$res = fopen('php://temp', 'rb+');
			fwrite($res, $byteString);
			rewind($res);
			return $res;
		}
		private static function uuidStringFromResource($uuidResource) {
			if ($uuidResource) {
				$bytes = stream_get_contents($uuidResource);
				rewind($uuidResource);
				return self::uuidStringFromByteString($bytes);
			}
			return null;
		}
		private static function uuidStringFromByteString($bytes) {
			if ($bytes && strlen($bytes) === 16) {
				$hexBytes = bin2hex($bytes);
				return substr($hexBytes, 0, 8) . '-' . substr($hexBytes, 8, 4) . '-' . substr($hexBytes, 12, 4) . '-' . substr($hexBytes, 16, 4) . '-' . substr($hexBytes, 20, 12);
			}
			return null;
		}
  1. Use the doctrine classes to generate a db to any driver.
  2. Populate the db with data (one row in each of the 3 tables will do, with FirstType and SecondType referencing Target). The extra 'data' and 'name' fields are just to make it easier to distinguish items.
  3. Attach an entity manager to the db through a pdo connection.
  4. Get a TargetType entity with $em->getRepository('App\\Test\\BinRef\\TargetType')->findOneBy(['targetTypeUuid' => self::uuidResourceFromString($id)]);
  5. Now call getFirstTypeCollection on the returned entity to trigger the first SQL call (successful) and getSecondTypeCollection to trigger the second (not succesful).

You'll also see failures persisting if you create the objects and try to persist them--you'll get an FK violation for the second serialization ref to the same TargetType entity.

@derrabus derrabus added the Bug label Mar 6, 2023
mcurland added a commit to mcurland/doctrine-dbal that referenced this issue Jul 22, 2023
Entity identifiers that are PHP resource streams need to work for all
references, not just the first one. PDOStatement->execute is reading
the binary resources but not restoring the original offsets. No other
code is restoring these streams to their original position so that
they can be reused.

Examples of failures include empty collections on read (the first lazy
load collection on an entity populates correctly, the second is empty)
and foreign-key violations while persisting changes (the first entity
join produces the correct SQL, the second has no data to read and the
FK is violated with the missing binary data).

Making this change as close as possible to the external code that
moves the stream pointer eliminates the need to do this in calling
code. Resource offsets are retrieved immediately before execute in
case they change between the bindValue and execute calls.

The request was originally for the PDO driver but IBMDB2, Mysql, and
PgSQL drivers are also covers. Other drivers will likely also need
work. No attempt has been made to fix the deprecated bindParam code
path. I do not believe this is called by the current Doctrine code,
and is regardless much harder to patch because the reference variables
can be replaced during execute, so the original resources may no
longer be available to restore after the call.

A functional test was added for bindValue and a resource with a
seekable position.

doctrine#5895
@cbichis
Copy link

cbichis commented Aug 18, 2023

Havent tested yet your commit but I am having a similar issue with the secondary entities.

mcurland added a commit to mcurland/doctrine-dbal that referenced this issue Dec 8, 2023
Entity identifiers that are PHP resource streams need to work for all
references, not just the first one. PDOStatement->execute is reading
the binary resources but not restoring the original offsets. No other
code is restoring these streams to their original position so that
they can be reused.

Examples of failures include empty collections on read (the first lazy
load collection on an entity populates correctly, the second is empty)
and foreign-key violations while persisting changes (the first entity
join produces the correct SQL, the second has no data to read and the
FK is violated with the missing binary data).

Making this change as close as possible to the external code that
moves the stream pointer eliminates the need to do this in calling
code. Resource offsets are retrieved immediately before execute in
case they change between the bindValue and execute calls.

The request was originally for the PDO driver but IBMDB2, Mysql, and
PgSQL drivers are also covered. Other drivers will likely also need
work. No attempt has been made to fix the deprecated bindParam code
path. I do not believe this is called by the current Doctrine code,
and is regardless much harder to patch because the reference variables
can be replaced during execute, so the original resources may no
longer be available to restore after the call.

A functional test was added for bindValue and a resource with a
seekable position.

doctrine#5895
mcurland added a commit to mcurland/doctrine-dbal that referenced this issue Aug 16, 2024
Entity identifiers that are PHP resource streams need to work for all
references, not just the first one. PDOStatement->execute is reading
the binary resources but not restoring the original offsets. No other
code is restoring these streams to their original position so that
they can be reused.

Examples of failures include empty collections on read (the first lazy
load collection on an entity populates correctly, the second is empty)
and foreign-key violations while persisting changes (the first entity
join produces the correct SQL, the second has no data to read and the
FK is violated with the missing binary data).

Making this change as close as possible to the external code that
moves the stream pointer eliminates the need to do this in calling
code. Resource offsets are retrieved immediately before execute in
case they change between the bindValue and execute calls.

The request was originally for the PDO driver but IBMDB2, Mysql, and
PgSQL drivers are also covered. Other drivers will likely also need
work. No attempt has been made to fix the deprecated bindParam code
path. I do not believe this is called by the current Doctrine code,
and is regardless much harder to patch because the reference variables
can be replaced during execute, so the original resources may no
longer be available to restore after the call.

A functional test was added for bindValue and a resource with a
seekable position.

doctrine#5895
derrabus pushed a commit to mcurland/doctrine-dbal that referenced this issue Nov 24, 2024
Entity identifiers that are PHP resource streams need to work for all
references, not just the first one. PDOStatement->execute is reading
the binary resources but not restoring the original offsets. No other
code is restoring these streams to their original position so that
they can be reused.

Examples of failures include empty collections on read (the first lazy
load collection on an entity populates correctly, the second is empty)
and foreign-key violations while persisting changes (the first entity
join produces the correct SQL, the second has no data to read and the
FK is violated with the missing binary data).

Making this change as close as possible to the external code that
moves the stream pointer eliminates the need to do this in calling
code. Resource offsets are retrieved immediately before execute in
case they change between the bindValue and execute calls.

The request was originally for the PDO driver but IBMDB2, Mysql, and
PgSQL drivers are also covered. Other drivers will likely also need
work. No attempt has been made to fix the deprecated bindParam code
path. I do not believe this is called by the current Doctrine code,
and is regardless much harder to patch because the reference variables
can be replaced during execute, so the original resources may no
longer be available to restore after the call.

A functional test was added for bindValue and a resource with a
seekable position.

doctrine#5895
derrabus pushed a commit to mcurland/doctrine-dbal that referenced this issue Nov 24, 2024
Entity identifiers that are PHP resource streams need to work for all
references, not just the first one. PDOStatement->execute is reading
the binary resources but not restoring the original offsets. No other
code is restoring these streams to their original position so that
they can be reused.

Examples of failures include empty collections on read (the first lazy
load collection on an entity populates correctly, the second is empty)
and foreign-key violations while persisting changes (the first entity
join produces the correct SQL, the second has no data to read and the
FK is violated with the missing binary data).

Making this change as close as possible to the external code that
moves the stream pointer eliminates the need to do this in calling
code. Resource offsets are retrieved immediately before execute in
case they change between the bindValue and execute calls.

The request was originally for the PDO driver but IBMDB2, Mysql, and
PgSQL drivers are also covered. Other drivers will likely also need
work. No attempt has been made to fix the deprecated bindParam code
path. I do not believe this is called by the current Doctrine code,
and is regardless much harder to patch because the reference variables
can be replaced during execute, so the original resources may no
longer be available to restore after the call.

A functional test was added for bindValue and a resource with a
seekable position.

doctrine#5895
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants