ZFP cuda produces 0 decompressed data #105

sheltongeosx · 2020-08-31T20:06:12Z

Dear ZFP developers,

I tested zfp with the cuda option to compress and decompress a dataset of around 5 GB, but the decompressed dataset contains all zeros. Here are the commands I used:

zfp  -i  inputdata.dat -z  output.comp -r 16  -x cuda -f -3 150 5850 1601
zfp  -z output.comp -o output.decomp -r 16  -x cuda -f -3 150 5850 1601

output.decomp contains all zeros. It produced the same result on both an IBM Power8 node (P100 GPU) and a Dell x86_64 node (V100 GPU). However, it runs correctly without the "-x cuda" option (that is, when running on the CPU). Here is my environment:

compiler: gcc/7.3.0
CUDA:  10.1 
zfp version: 0.5.5

Is there anything missing in my case?
Thanks in advance!

Best
Shelton Ma

lindstro · 2020-08-31T21:14:09Z

Dear Shelton,

This is a rather large data set. The uncompressed data is 5.2 GB and the compressed data is another 2.6 GB. zfp should definitely report an error if there is not enough GPU memory, but I think there may be sections of the CUDA implementation that assume that the uncompressed data can be addressed using only 32 bits (4 GB) and that can cause silent errors. We will revisit the CUDA implementation in October to address any such issues.

For now, can you try compressing the data in two or more pieces and see if that works? The easiest would be to partition the data along z into slabs that are 800+801 or even 400+400+400+401 elements wide. You can perform such partitioning using the Unix dd command and pipe the output of dd to the input of zfp to avoid temporary files, e.g.,

dd if=inputdata.dat bs=3510000 count=401 skip=1200 | zfp -i - -z output.comp -r 16 -x cuda -f -3 150 5850 401

would compress the last 401 "layers" of elements.

Best,
Peter
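The sizes and dd parameters quoted above can be reproduced with a little shell arithmetic. The following is a minimal sketch, not part of zfp: it assumes single-precision values (4 bytes, from the -f flag), a fixed rate of 16 bits per value, and a shell with 64-bit arithmetic. The slab_commands helper and the output$i.comp file names are hypothetical and only print the commands rather than run them.

```shell
#!/bin/sh
# Dimensions as given in the original command line (x varies fastest).
nx=150; ny=5850; nz=1601
bytes_per_value=4     # -f: single-precision floats
rate_bits=16          # -r 16: 16 compressed bits per value

values=$((nx * ny * nz))
raw_bytes=$((values * bytes_per_value))
comp_bytes=$((values * rate_bits / 8))
limit=$((1 << 32))    # largest byte count addressable with 32 bits

echo "uncompressed: $raw_bytes bytes"
echo "compressed:   $comp_bytes bytes"
if [ "$raw_bytes" -gt "$limit" ]; then
    echo "uncompressed data exceeds the 32-bit range ($limit bytes)"
fi

# Print one dd | zfp pipeline per z slab (hypothetical helper).
# $1 = number of slabs; the last slab absorbs any remainder.
slab_commands() {
    slabs=$1
    layer_bytes=$((nx * ny * bytes_per_value))   # one full z layer
    base=$((nz / slabs))
    i=0
    while [ "$i" -lt "$slabs" ]; do
        skip=$((i * base))
        count=$base
        if [ "$i" -eq $((slabs - 1)) ]; then count=$((nz - skip)); fi
        echo "dd if=inputdata.dat bs=$layer_bytes count=$count skip=$skip | zfp -i - -z output$i.comp -r 16 -x cuda -f -3 $nx $ny $count"
        i=$((i + 1))
    done
}

slab_commands 4
```

With four slabs, the last printed command uses the same bs, count, and skip values as the dd example above, because bs is the byte size of one full z layer (150 x 5850 values of 4 bytes each).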

sheltongeosx · 2020-09-01T20:05:58Z

Dear Peter,

Thank you very much for your suggestions.
I actually split the input data set into 4 parts, and running with one of the pieces still gives an all-zero decompressed volume:

zfp  -i  inputdata.dat -z  output.comp -r 16  -x cuda -f -3 38 5850 1601
zfp  -z output.comp -o output.decomp -r 16  -x cuda -f -3 38 5850 1601

The compressed and decompressed data sizes are now 750 MB and 1.4 GB, respectively.

Best,
Shelton

lindstro · 2020-09-01T21:16:45Z

Maybe a dumb question, but are you certain that the input data actually has nonzero values?
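One quick way to answer that question is to test whether a file contains anything other than zero bytes. This is a minimal sketch using standard Unix tools only; is_all_zero is a hypothetical helper, not a zfp feature, and it assumes a head that supports the -c option.

```shell
#!/bin/sh
# Succeed (exit status 0) if the file consists solely of zero bytes.
is_all_zero() {
    # Delete every NUL byte, then count at most one surviving byte.
    remaining=$(tr -d '\000' < "$1" | head -c 1 | wc -c)
    [ "$remaining" -eq 0 ]
}

# Check the input and the decompressed output, if they are present.
for f in inputdata.dat output.decomp; do
    if [ -f "$f" ]; then
        if is_all_zero "$f"; then
            echo "$f: all zeros"
        else
            echo "$f: contains nonzero bytes"
        fi
    fi
done
```

The check is byte-wise, so it reports any nonzero bit pattern; it does not say whether the values are meaningful floats.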

Note that zfp assumes that the leftmost index varies fastest (aka. Fortran order). To partition the data along x like you've done, you would have had to piece together noncontiguous chunks of data. Partitioning along z (as in the example I gave) would be far easier. And given your choice of partitioning, I suspect that you may have transposed the dimensions (see this discussion). Such accidental transposition can lead to a nearly random sequence of values that is difficult to compress. That shouldn't result in all-zeros, but could still lead to unusually large errors in the reconstructed field.
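To make the layout concrete, here is a minimal sketch of the offset arithmetic implied by "leftmost index varies fastest", using the dimensions from the original command. The offset helper is hypothetical and only illustrates the indexing convention; it is not taken from the zfp source.

```shell
#!/bin/sh
nx=150; ny=5850; nz=1601

# Offset (in values, not bytes) of element (x, y, z) when x varies fastest.
offset() {
    echo $(( $1 + nx * ($2 + ny * $3) ))
}

echo "step in x: $(( $(offset 1 0 0) - $(offset 0 0 0) ))"
echo "step in y: $(( $(offset 0 1 0) - $(offset 0 0 0) ))"
echo "step in z: $(( $(offset 0 0 1) - $(offset 0 0 0) ))"

# A slab of whole z layers is a single contiguous run of values.
echo "values per z layer: $((nx * ny))"

# A slab that is narrower in x is split into one short run per (y, z) pair.
echo "separate runs in an x slab: $((ny * nz))"
```

Because a step in z skips a whole layer of nx * ny values, consecutive z layers sit back to back in the file and can be cut out with a single dd call, whereas a slab along x would have to be stitched together from millions of short runs.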

Before we speculate any further on what's causing this issue, may I suggest that you check out the develop branch and run the CUDA tests just to make sure that the CUDA implementation is working correctly on smaller data:

git clone https://github.com/LLNL/zfp.git
cd zfp
git checkout develop
mkdir build
cd build
cmake .. -DZFP_WITH_CUDA=ON -DBUILD_TESTING=ON
make
ctest

sheltongeosx · 2020-09-02T20:41:40Z

Dear Peter,

Thank you very much for pointing out the order in which the data dimensions must be specified. The following commands

 zfp  -i  inputdata.dat -z  output.comp -r 16  -x cuda -f -3 1061 5850 38
 zfp  -z output.comp -o output.decomp -r 16  -x cuda -f -3 1061 5850 38

now produce correct results. As you mentioned earlier, zfp with CUDA could not handle my full 5 GB data set.

Best
Shelton

lindstro · 2020-09-02T23:12:56Z

I'm glad to hear this is working, though we need to look into what's causing the failure for the larger data set and why zfp is not reporting an error. I will keep this issue open until we've had time to take a closer look.

lindstro · 2021-02-02T15:21:00Z

@sheltongeosx Sorry for taking so long to get back to you regarding this issue. We're finally at a point where we have time to go over the CUDA implementation to make sure it's bug free.

We fixed a related issue (#121) on the develop branch that might also address the one you reported. Would you mind rerunning your example (on the whole 1061x5850x38 volume) to see if it works now?

GarrettDMorrison · 2023-02-09T19:42:13Z

@sheltongeosx was this fixed for you? I've run some recent tests against our staging branch that seem to show this issue has been solved but it would be good to hear from your end if the issue remains or was indeed solved by the #121 fix.

GarrettDMorrison · 2023-06-14T22:35:58Z

Going to close this for now, feel free to re-open if you are still seeing issues.

lindstro added the bug label Sep 2, 2020

data-panda mentioned this issue Feb 1, 2021: CUDA ZFP produces 0 for arrays beyond a size #121 (Closed)

tpwrules mentioned this issue Aug 27, 2022: zfp binary CUDA support broken? #178 (Open)

GarrettDMorrison closed this as completed Jun 14, 2023
