How to add custom compression / decompression? #1784

dotphoton-ziad · 2022-07-15T13:20:58Z

Hi all! I have all my data stored in an S3 bucket and I would like to use Hub to load my data from S3, yet my data on the cloud is compressed using Jetraw by Dotphoton and I would like it to decompress the images as I am pulling them from the cloud. I would be ready to write code to make this happen, but as I am new to the code base I would like to know where this would fit in the most and where I should start. Thank you all in advance!

davidbuniat · 2022-07-16T22:18:31Z

Hey @ziadomalik would love to accept the contribution to support your compression method. Most of our compression code is written here https://github.com/activeloopai/Hub/blob/main/hub/core/compression.py.

Feel free to join our Slack community at https://slack.activeloop.ai (#develop channel) to discuss in more detail how to complete the contribution.

Looking forward to it.

FayazRahman · 2022-07-17T03:38:13Z

Hey @ziadomalik. The list of supported compressions can be found in hub/compression.py. Make sure to add your new format there. The decompression code can be found at hub/core/compression.py. Import the required libraries and write your decompression function (something like _decompress_jetraw). Also see the decompress_array function in hub/core/compression.py - that's where your function will be called. Let me know if you need more help.
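To make the steps above concrete, here is a minimal, self-contained sketch of the pattern: add the format name to a list of supported compressions, write a `_decompress_jetraw` function, and have a dispatcher call it. Everything in it is an assumption for illustration. `SUPPORTED_COMPRESSIONS`, `register_compression`, and the simplified `decompress_array` signature are hypothetical stand-ins and not Hub's actual code, and `zlib` stands in for the Jetraw decoder, whose Python API is not shown in this thread.

```python
import zlib
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for the list of formats kept in hub/compression.py.
SUPPORTED_COMPRESSIONS = ["png", "jpeg", "lz4"]

# Hypothetical lookup table from format name to decompression function.
_DECOMPRESSORS: Dict[str, Callable[[bytes, Tuple[int, ...]], bytes]] = {}


def register_compression(name: str, fn: Callable[[bytes, Tuple[int, ...]], bytes]) -> None:
    """Add a format to the supported list and remember its decompressor."""
    if name not in SUPPORTED_COMPRESSIONS:
        SUPPORTED_COMPRESSIONS.append(name)
    _DECOMPRESSORS[name] = fn


def _decompress_jetraw(buffer: bytes, shape: Tuple[int, ...]) -> bytes:
    """Placeholder decoder: zlib is used here instead of the real Jetraw library."""
    raw = zlib.decompress(buffer)
    # Assume one byte per pixel (uint8) and check the decoded size against the shape.
    expected = 1
    for dim in shape:
        expected *= dim
    if len(raw) != expected:
        raise ValueError(f"decoded {len(raw)} bytes, expected {expected} for shape {shape}")
    return raw


def decompress_array(buffer: bytes, shape: Tuple[int, ...], compression: Optional[str]) -> bytes:
    """Simplified dispatcher: route the buffer to the decoder registered for the format."""
    if compression is None:
        return bytes(buffer)
    if compression not in _DECOMPRESSORS:
        raise ValueError(f"unsupported compression: {compression!r}")
    return _DECOMPRESSORS[compression](buffer, shape)


register_compression("jetraw", _decompress_jetraw)
```

In this sketch, a call such as `decompress_array(blob, (3, 4), "jetraw")` returns the raw pixel bytes for a 3x4 uint8 image. In the real code base the equivalent changes would go into hub/compression.py and hub/core/compression.py as described above.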

h20200051 · 2022-09-12T09:50:09Z

@ziadomalik I would like to work on this. Kindly assign it to me.

mikayelh · 2022-09-12T11:00:55Z

Hey @h20200051, we can only assign one issue per contributor. Which one would you like to take on?

Hussain0520 · 2022-09-15T19:29:24Z

Hey @mikayelh @davidbuniat is this issue still open? I would like to try and contribute to this issue.

dotphoton-ziad · 2022-09-15T20:35:30Z

Hi all, I've been assigned to work on other projects, so this issue is on hold for now. It's still something we are actively discussing, and if it's appropriate, we could close this issue and reopen it once it becomes relevant again. I still need the green light from my Project Manager. I hope you understand, and thank you for your patience.

mikayelh · 2022-09-15T20:48:46Z

@ziadomalik if you'd like, we can assign this issue to @Hussain0520 to work on it in the meantime, but if you want to be the one who writes this particular code, I'm not against putting this on hold. Maybe the best solution could be allowing someone else to take a stab and then improving on their contribution later on?

dotphoton-ziad · 2022-09-15T20:54:43Z

@mikayelh Sounds like a plan! As a starting point, you guys could check out our Python documentation here and learn about the technology itself here. We also have a C++ API. Whenever you have questions, code review requests or anything I could help with, feel free to ping me!
cc: @Hussain0520

mikayelh · 2022-09-15T20:58:26Z

That's awesome!

hey @Hussain0520! I've just assigned you this issue - feel free to check in with us and @ziadomalik in case you need any help! Thanks for following up, @ziadomalik :)

dotphoton-ziad · 2022-09-16T07:02:07Z

Hi, so I spoke with my project manager. Normally, we wanted to postpone this all the way to January because internally, we are still experimenting with the cloud and how our compression fits best into that context. If you guys would like to discuss, we could hop on a call so we could figure out the best way we can integrate Jetraw into the Activeloop Hub.
cc: @mikayelh

Hussain0520 · 2022-09-16T10:40:55Z

Thank you @mikayelh @ziadomalik . I'll surely contact you guys for help.

St3V0Bay · 2022-11-22T08:41:46Z

Please allow me to sneak in here... I was looking for a way to compress my 1 million nifti stacks. On one hand, "nifti" is not yet supported by deeplake (but dicom is). On the other hand, I was looking for a way to use the general dtype and add a custom compression on top. I was even more surprised to see someone from dotphoton here (@ziadomalik). You guys have been on my list for more than a year. The stars seem to align :-)

istranic · 2022-11-22T22:59:16Z

Hi @St3V0Bay. Thx for following up on this thread! Adding custom compression is quite tricky, because even if it's implemented in Deep Lake OSS, it won't work in our visualizer or the optimized C++ dataloader, because they are not in the OSS repo.

We're also happy to add support for your nifti data directly. Are you working with dicom files that are combined into nifti stacks? If you're able to provide us with example data, we can implement support for it across our stacks.

Regarding dotphoton, are you using any of their compressions currently, or is this something you're excited about for future work?

St3V0Bay · 2022-11-25T15:00:40Z

Hi @istranic,
thanks for the swift response. I see - so custom compressions are a bit tricky to handle.

Re nifti: In the medical imaging domain most open-source data is offered as nifti (bioimaging has their own preference however). That's the best format for data scientists to get started. However, DICOM is the true standard that is actually used in the clinic (you have that already integrated, which is great). To pool DICOM data with open-sourced nifti files, the dicom files are converted (e.g. https://github.com/rordenlab/dcm2niix). The other way (from nifti > dicom) is a lot more complicated.

Exemplary nifti files can be pulled using this repo (https://github.com/neheller/kits19). After installation it is just a one liner. You can look at the data using ITKSnap (for example; http://www.itksnap.org/pmwiki/pmwiki.php) and it can be opened in Python with the PyLib called nibabel (https://nipy.org/nibabel/). Another huge nifti repository is here: http://medicaldecathlon.com/

Re dotphoton: we are not using it. But their value proposition is really charming: lower storage costs and faster data transfer. In projects of a certain size this really starts to matter, because things add up quickly if you have literally millions of data points.

istranic · 2022-11-26T14:26:05Z

Thanks for the info @St3V0Bay. We'll keep you in the loop regarding our decision making around nifti support.

davidbuniat assigned farizrahman4u Jul 15, 2022

mikayelh assigned Hussain0520 Sep 15, 2022

Hussain0520 removed their assignment Feb 11, 2023
