ZFP cuda produces 0 decompressed data #105

Closed
sheltongeosx opened this issue Aug 31, 2020 · 8 comments

@sheltongeosx

Dear ZFP developers,

I tested zfp with the CUDA option to compress and decompress a dataset of around 5 GB, but the decompressed dataset contains all zeros. Here are the commands I used:

zfp  -i  inputdata.dat -z  output.comp -r 16  -x cuda -f -3 150 5850 1601
zfp  -z output.comp -o output.decomp -r 16  -x cuda -f -3 150 5850 1601

output.decomp contains all zeros. I get the same result on both an IBM Power8 node (P100 GPU) and a Dell x86_64 node (V100 GPU). However, it runs correctly without the "-x cuda" option (that is, when running on the CPU). Here is my environment:

compiler: gcc/7.3.0
CUDA:  10.1 
zfp version: 0.5.5

Is there anything missing in my case?
Thanks in advance!

Best
Shelton Ma

@lindstro
Member

lindstro commented Aug 31, 2020 via email

@sheltongeosx
Author

Dear Peter,

Thank you very much for your suggestions.
I actually split the input data set into 4 parts, and running with just one of the pieces still gives an all-zero decompressed volume:

zfp  -i  inputdata.dat -z  output.comp -r 16  -x cuda -f -3 38 5850 1601
zfp  -z output.comp -o output.decomp -r 16  -x cuda -f -3 38 5850 1601

The compressed and decompressed data sizes are now 750 MB and 1.4 GB, respectively.

Best,
Shelton
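
The reported sizes can be sanity-checked with a little arithmetic. The sketch below assumes that the file holds single-precision floats (the -f flag), that fixed-rate mode spends exactly rate x 4^d bits on every 4x4x4 block, that partial blocks are padded to full size, and that no header is written. The function names are hypothetical helpers, not part of zfp.

```python
import math

def fixed_rate_bytes(dims, rate_bits):
    """Estimated size of a fixed-rate stream without a header (assumption:
    every block of 4^d values costs exactly rate_bits * 4^d bits)."""
    blocks = math.prod((n + 3) // 4 for n in dims)  # partial blocks are padded
    bits_per_block = rate_bits * 4 ** len(dims)
    return blocks * bits_per_block // 8

def raw_bytes(dims, bytes_per_value=4):
    """Size of the uncompressed array (4 bytes per value for float32)."""
    return math.prod(dims) * bytes_per_value

dims = (38, 5850, 1601)
print(fixed_rate_bytes(dims, 16))  # 750928640, about 750 MB
print(raw_bytes(dims))             # 1423609200, about 1.4 GB
```

Under these assumptions both file sizes are what a rate of 16 should produce, so the sizes alone say nothing about whether the decompressed values are correct.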

@lindstro
Member

lindstro commented Sep 1, 2020

Maybe a dumb question, but are you certain that the input data actually has nonzero values?
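
One quick way to answer this is to count the nonzero values in the raw file directly. This is a minimal sketch, assuming the file is a headerless dump of native-endian single-precision floats (as the -f flag implies); `count_nonzero_float32` is a hypothetical helper, not a zfp utility.

```python
import array

def count_nonzero_float32(path, chunk_values=1 << 20):
    """Return (total, nonzero) counts for a raw file of native-endian float32.

    The file is read in chunks so that multi-gigabyte inputs do not have to
    fit in memory at once.
    """
    total = nonzero = 0
    with open(path, "rb") as f:
        while True:
            raw = f.read(4 * chunk_values)
            if not raw:
                break
            usable = len(raw) - len(raw) % 4  # ignore a trailing partial value
            values = array.array("f")
            values.frombytes(raw[:usable])
            total += len(values)
            nonzero += sum(1 for v in values if v != 0.0)
    return total, nonzero
```

Running the same check on both inputdata.dat and output.decomp shows whether the zeros are already present in the input or only appear after the round trip.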

Note that zfp assumes that the leftmost index varies fastest (aka. Fortran order). To partition the data along x like you've done, you would have had to piece together noncontiguous chunks of data. Partitioning along z (as in the example I gave) would be far easier. And given your choice of partitioning, I suspect that you may have transposed the dimensions (see this discussion). Such accidental transposition can lead to a nearly random sequence of values that is difficult to compress. That shouldn't result in all-zeros, but could still lead to unusually large errors in the reconstructed field.
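
To illustrate the layout described above, here is a small sketch of how an element (x, y, z) maps to a file offset when the leftmost index varies fastest, and why a slab along z is one contiguous range while a slab along x is not. The helper names and the tiny 4x3x2 grid are made up for illustration.

```python
def linear_index(x, y, z, nx, ny):
    """Element offset of (x, y, z) when x varies fastest (Fortran order)."""
    return x + nx * (y + ny * z)

def z_slab_range(z0, z1, nx, ny):
    """A slab z0 <= z < z1 is a single contiguous range [start, stop)."""
    return linear_index(0, 0, z0, nx, ny), linear_index(0, 0, z1, nx, ny)

def x_slab_runs(x0, x1, nx, ny, nz):
    """A slab x0 <= x < x1 is scattered: one short run per (y, z) pair.
    Each entry is (start offset, run length)."""
    return [(linear_index(x0, y, z, nx, ny), x1 - x0)
            for z in range(nz) for y in range(ny)]

nx, ny, nz = 4, 3, 2
print(z_slab_range(0, 1, nx, ny))          # (0, 12): one block of 12 values
print(len(x_slab_runs(0, 2, nx, ny, nz)))  # 6 separate runs of 2 values
```

So cutting a file into equal contiguous pieces partitions the data along the slowest-varying dimension, which is the last number given after -3.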

Before we speculate any further on what's causing this issue, may I suggest that you check out the develop branch and run the CUDA tests just to make sure that the CUDA implementation is working correctly on smaller data:

git clone https://github.com/LLNL/zfp.git
cd zfp
git checkout develop
mkdir build
cd build
cmake .. -DZFP_WITH_CUDA=ON -DBUILD_TESTING=ON
make
ctest

@sheltongeosx
Author

Dear Peter,

Thank you very much for pointing out the order in which the data dimensions must be specified. The following commands

 zfp  -i  inputdata.dat -z  output.comp -r 16  -x cuda -f -3 1061 5850 38
 zfp  -z output.comp -o output.decomp -r 16  -x cuda -f -3 1061 5850 38

now produce correct results. As you mentioned earlier, zfp could not handle my 5 GB data example.

Best
Shelton

@lindstro
Member

lindstro commented Sep 2, 2020

I'm glad to hear this is working, though we need to look into what's causing the failure for the larger data set and why zfp is not reporting an error. I will keep this issue open until we've had time to take a closer look.

@lindstro
Member

lindstro commented Feb 2, 2021

@sheltongeosx Sorry for taking so long to get back to you regarding this issue. We're finally at a point where we have time to go over the CUDA implementation to make sure it's bug free.

We fixed a related issue (#121) on the develop branch that might also address the one you reported. Would you mind rerunning your example (on the whole 1061x5850x38 volume) to see if it works now?

@GarrettDMorrison
Member

@sheltongeosx was this fixed for you? I've run some recent tests against our staging branch that seem to show this issue has been solved but it would be good to hear from your end if the issue remains or was indeed solved by the #121 fix.

@GarrettDMorrison
Member

Going to close this for now, feel free to re-open if you are still seeing issues.
