
Fix issue with reading 32 bit blobs #995

Merged: 22 commits, Feb 18, 2022

Conversation

@A-Baji A-Baji (Collaborator) commented Feb 15, 2022

No description provided.

@A-Baji A-Baji linked an issue Feb 15, 2022 that may be closed by this pull request
dtype = 'uint32' if self.is_32_bit else 'uint64'
try:
    data = np.frombuffer(self._blob, dtype=dtype, count=count, offset=self._pos)
except ValueError:
Member
This may not be a reliable mechanism for determining the decoding. The serialization should be explicit about how the encoding is performed.

Collaborator
@dimitri-yatsenko Yes, we discussed that, but the issue is that including the precision in the serialization scheme (e.g. dj064 or mym32) is an update that isn't backward compatible. It would work going forward but wouldn't help us with prior blobs, which is precisely where the issue is likely to arise.

Member
The Python implementation is deterministic. The serialization does not depend on whether the platform is 32- or 64-bit.

The problem is that the above solution will not catch all errors: some cases will not raise the ValueError. We should devise a solution that catches those cases as well.
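The silent-failure case is easy to reproduce with a hypothetical byte layout (the values below are illustrative, not taken from a real blob): when a 32-bit writer's dimension words happen to fill a whole number of 64-bit words, `np.frombuffer` succeeds with the wrong dtype and the `ValueError` fallback never fires:

```python
import numpy as np

# Hypothetical dimension header: a 32-bit writer stores dims (3, 1)
# as two little-endian uint32 words -- 8 bytes total.
buf = np.array([3, 1], dtype="<u4").tobytes()

# A 64-bit reader asking for one uint64 finds exactly 8 bytes, so no
# ValueError is raised -- it silently decodes a garbage dimension.
wrong = np.frombuffer(buf, dtype="<u8", count=1)[0]  # 3 + 2**32, not 3

# The intended 32-bit read recovers the real dimensions.
right = np.frombuffer(buf, dtype="<u4", count=2)     # array([3, 1])
```

Only byte counts that don't divide evenly by the element size raise `ValueError`, so the try/except in the diff above can only catch a subset of mismatches.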

Member
The real problem is that the mYm implementation depended on the platform. We should explicitly catch this case.

@guzman-raphael guzman-raphael (Collaborator) commented Feb 15, 2022
We are trying to find an approach with a fallback mechanism that can capture the vast majority of cases without introducing performance overhead for normal unpacking. This approach was chosen because one of our customers has an enormous amount of data (~100M records) that would otherwise need to be migrated.

Member

As far as I understand, for all numeric arrays the datatype is fully determined. mYm underspecified the integer length, so it fell back to the platform default. That was a problem with the MATLAB implementation only, correct? If so, we can apply a more targeted fix.

@dimitri-yatsenko (Member) commented

We should not rely on the ValueError to determine the encoding of array dimensions. This issue is limited to the specific case of old blobs created by a version of mYm that DataJoint itself never used. Therefore, please make this flag explicit rather than automatic. When working with these outdated blobs, the user will need to set

dj.blob.use_32bit_dims = True

@@ -43,6 +43,7 @@
}

bypass_serialization = False # runtime setting to bypass blob (en|de)code
use_32bit_dims = False # runtime setting to read data as 32-bit
@dimitri-yatsenko dimitri-yatsenko (Member) commented Feb 17, 2022
Suggested change
use_32bit_dims = False # runtime setting to read data as 32-bit
# runtime setting to read integers as 32-bit to read blobs created by the 32-bit
# version of the mYm library for MATLAB
use_32bit_dims = False

@A-Baji A-Baji (Collaborator, Author) commented Feb 17, 2022

Is there a better way of wording the change log that you would prefer? @dimitri-yatsenko

CHANGELOG.md Outdated
@@ -1,7 +1,7 @@
## Release notes

### 0.13.4 -- TBA
* Bugfix - Fix error when fetching data that was inserted as 32-bit
* Bugfix - Added a module flag for reading 32-bit data
@dimitri-yatsenko dimitri-yatsenko (Member) commented Feb 17, 2022

Suggested change
* Bugfix - Added a module flag for reading 32-bit data
* Bugfix - Allow reading blobs produced by legacy 32-bit compiled mYm library for matlab. PR #995

@@ -1,6 +1,6 @@
0.13.4 -- TBA
----------------------
* Bugfix - Fix error when fetching data that was inserted as 32-bit
* Bugfix - Added a module flag for reading 32-bit data
Member

Suggested change
* Bugfix - Added a module flag for reading 32-bit data
* Bugfix - Allow reading blobs produced by legacy 32-bit compiled mYm library for matlab. PR #995

@guzman-raphael guzman-raphael (Collaborator) left a comment

@A-Baji 👏 Fantastic job on a tricky one! 👏

@dimitri-yatsenko dimitri-yatsenko merged commit 5364981 into datajoint:master Feb 18, 2022
Successfully merging this pull request may close these issues:

* Error when fetching 32-bit mym-serialized blob data

3 participants