-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with reading 32 bit blobs #995
Conversation
datajoint/blob.py
Outdated
dtype = 'uint32' if self.is_32_bit else 'uint64' | ||
try: | ||
data = np.frombuffer(self._blob, dtype=dtype, count=count, offset=self._pos) | ||
except ValueError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not be a reliable mechanisms for determining the decoding. The serialization should be explicit with respect to how the encoding is performed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimitri-yatsenko Yes, we discussed that but the issue is that including the precision into the serialization scheme (e.g. dj064
or mym32
) is an update that isn't backward compatible. It would work point forward but wouldn't help us for prior blobs which is precisely where there is likely to be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Python implementation is deterministic. The serialization does not depend on whether the platform is 32- or 64-bit.
The problem is that the above solution will not catch all errors. Some cases will not raise the ValueError
. We should devise the solution to catch those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real problem is the mym
implementation depended on the platform. We should explicitly catch this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are trying to find an approach with a 'fall-back' mechanism that can capture the vast majority of cases without introducing performance overhead for normal unpacking. This was chosen since one of our customers has an enormous amount of data that would need to be migrated otherwise (~100M records).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, for all numeric arrays, the datatype is fully determined. mym
underspecified the integer length and it defaulted to the platform default. That was a problem with the matlab implementation only. Correct? If so, we can have a more targeted fix.
We should not rely on the value error to determine the encoding of array dimensions. This issue is limited to a specific case of old blobs created by a version of mYm that datajoint itself never used. Therefore, please make this flag explicit rather than automatic. When working with these outdated blobs, the user will need to set dj.blob.use_32bit_dims = True |
datajoint/blob.py
Outdated
@@ -43,6 +43,7 @@ | |||
} | |||
|
|||
bypass_serialization = False # runtime setting to bypass blob (en|de)code | |||
use_32bit_dims = False # runtime setting to read data as 32-bit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use_32bit_dims = False # runtime setting to read data as 32-bit | |
# runtime setting to read integers as 32-bit to read blobs created by the 32-bit | |
# version of the mYm library for MATLAB | |
use_32bit_dims = False |
Is there a better way of wording the change log that you would prefer? @dimitri-yatsenko |
CHANGELOG.md
Outdated
@@ -1,7 +1,7 @@ | |||
## Release notes | |||
|
|||
### 0.13.4 -- TBA | |||
* Bugfid - Fix error when fetching data that was inserted as 32-bit | |||
* Bugfix - Added a module flag for reading 32-bit data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Bugfix - Added a module flag for reading 32-bit data | |
* Bugfix - Allow reading blobs produced by legacy 32-bit compiled mYm library for matlab. PR #995 |
docs-parts/intro/Releases_lang1.rst
Outdated
@@ -1,6 +1,6 @@ | |||
0.13.4 -- TBA | |||
---------------------- | |||
* Bugfix - Fix error when fetching data that was inserted as 32-bit | |||
* Bugfix - Added a module flag for reading 32-bit data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Bugfix - Added a module flag for reading 32-bit data | |
* Bugfix - Allow reading blobs produced by legacy 32-bit compiled mYm library for matlab. PR #995 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@A-Baji 👏 Fanstastic job on a tricky one! 👏
No description provided.