Stats has incomplete output under certain circumstances #3870

Closed
lwellerastro opened this issue May 14, 2020 · 3 comments
Labels: bug (Something isn't working), Products (Issues which are impacting the products group)

@lwellerastro
Contributor

ISIS version(s) affected: 3.10.2 and prior

Description
The stats program does not fill in all available fields in its flat-file or pvl output if it cannot calculate basic DN statistics (min, max, mean, etc.) for an image. This typically happens because the input image is 100% special pixels.

This is a problem when many thousands of images are read in and the output is saved either by way of a batchlist or by concatenating individual output csv files. It is impossible to use csv output as input to a database table when 20+ columns are expected for each row but some rows end up with only half that.

The header row in the output csv file is affected as well. When running in batch mode and appending to an existing output flat file, if the first image stats works on has valid DNs, there are 20+ column names. If the first image in the list has no valid DNs, there are headings only for the 10 or so things it does report.
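As a rough illustration of how the short rows could be found after the fact, here is a minimal Python sketch that flags rows whose field count differs from the widest row in a concatenated csv. The function name, the file names, and the sample rows are hypothetical and heavily shortened; they are not actual stats output.

```python
import csv
import io


def find_ragged_rows(csv_text, expected_columns=None):
    """Return (row_number, field_count) for rows whose width differs from the reference.

    If expected_columns is None, the widest row is used as the reference
    (an assumption: the widest row is the complete one).
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return []
    if expected_columns is None:
        expected_columns = max(len(row) for row in rows)
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected_columns
    ]


# Hypothetical, shortened rows: a short header written for an image with only
# special pixels, followed by a wider row for an image with valid DNs.
sample = (
    "From,Band,TotalPixels,NullPixels\n"
    "all_special.cub,1,100,100\n"
    "valid.cub,1,0.5,0.1,0.0,1.0,100,0\n"
)
print(find_ragged_rows(sample))
```

A check like this only locates the problem rows; it cannot tell which columns are missing from them, which is why a fix in stats itself is still needed.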

How to reproduce
Data under /work/users/lweller/Isis3Tests/Stats/

There are two images to demonstrate the problem:
N1459928352_1.cal.cub has valid DNs
N1733673166_7.cal.cub contains only special pixel values

Run either image individually with format=flat and format=pvl, or run them in batch mode as in the examples below. The order of the images is swapped between list1.lis and list2.lis so you can see the problem with the headers in the output flat file. The order has no effect on the output pvl, so there is only one run for that.

stats from=$1 to=Stats_list1.csv format=flat append=true -batchlist=list1.lis
stats from=$1 to=Stats_list2.csv format=flat append=true -batchlist=list2.lis

stats from=$1 to=Stats_list1.pvl format=pvl append=true -batchlist=list1.lis

Possible Solution
If a statistical value cannot be calculated for an image, there needs to be a placeholder in the output csv (especially) and pvl files. Maybe "N/A".
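To make the placeholder idea concrete, here is a minimal Python sketch (not the stats implementation) that writes every row against a fixed column list and fills anything missing with "N/A". The column names are the keywords stats reports for an image with valid DNs; the function name and the partial record are hypothetical.

```python
import csv
import io

# Keywords from a complete stats Results group, used as the fixed column list.
FIELDNAMES = [
    "From", "Band", "Average", "StandardDeviation", "Variance", "Median",
    "Mode", "Skew", "Minimum", "Maximum", "Sum", "TotalPixels", "ValidPixels",
    "OverValidMaximumPixels", "UnderValidMinimumPixels", "NullPixels",
    "LisPixels", "LrsPixels", "HisPixels", "HrsPixels",
]


def write_padded_csv(records, placeholder="N/A"):
    """Return CSV text in which every missing column holds the placeholder."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDNAMES,
        restval=placeholder,  # value written for keys absent from a record
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical record for an image with no valid pixels: only counts are known.
partial = {
    "From": "all_special.cub",
    "Band": 1,
    "TotalPixels": 100,
    "ValidPixels": 0,
    "NullPixels": 100,
}
print(write_padded_csv([partial]))
```

Because the header always comes from the fixed list, the column count no longer depends on which image happens to be first.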

Additional context
Clearly there are problems if images are all NULL, etc., and that is exactly what I need the statistics for. I am processing many thousands of images and I need to find problems after every processing step, and this is about the only way I can think of to do that for so much data. Having this information in a database is essential for what I am doing. I'm not sure I can even work around this problem.

@lwellerastro lwellerastro added the Products Issues which are impacting the products group label May 14, 2020
@lwellerastro
Contributor Author

Note: The program caminfo also provides DN statistics, but not exactly the same set as stats (the difference is in which keywords are reported, not in the values). I have never understood why this is the case. When caminfo comes across an image that has no valid pixels, it reports something like this:

Object = Statistics
MeanValue = -1.79769313486231e+308
StandardDeviation = -1.79769313486231e+308
MinimumValue = -1.79769313486231e+308
MaximumValue = -1.79769313486231e+308
PercentHIS = 0.0
PercentHRS = 0.0
PercentLIS = 0.0
PercentLRS = 0.0
PercentNull = 100.0
TotalPixels = 262144.0

I honestly don't know what the flat/csv output might look like since I don't use it. I'm not loving the values reported for Mean, Min, Max, etc., but it's better than not reporting any information.
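For anyone loading caminfo-style output into a database, a minimal Python sketch of swapping those extreme values for a placeholder might look like this. The threshold is an assumption, chosen only because the values above sit at the extreme negative end of the double-precision range; the function name is hypothetical.

```python
import math

# Assumption: anything at or below this cutoff is a "no data" sentinel rather
# than a real DN statistic. The cutoff itself is hypothetical.
SENTINEL_THRESHOLD = -1.0e308


def clean_statistic(raw, placeholder="N/A"):
    """Convert a raw statistic string to float, or to the placeholder if it is a sentinel."""
    value = float(raw)
    if math.isnan(value) or math.isinf(value) or value <= SENTINEL_THRESHOLD:
        return placeholder
    return value


print(clean_statistic("-1.79769313486231e+308"))  # prints N/A
print(clean_statistic("0.0013781489303329"))      # prints 0.0013781489303329
```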

Just as a comparison, this is what stats reports (pvl style) for an image with valid DNs:

Group = Results
From = N1459928352_1.cal.cub
Band = 1
Average = 0.0013781489303329
StandardDeviation = 0.00214298959146
Variance = 4.59240438910597e-06
Median = 0.0013503109095907
Mode = 0.0013629601921162
Skew = 0.038970820278095
Minimum = 0.0
Maximum = 0.20724268257618
Sum = 1445.0925146239
TotalPixels = 1048576
ValidPixels = 1048575
OverValidMaximumPixels = 0
UnderValidMinimumPixels = 0
NullPixels = 0
LisPixels = 0
LrsPixels = 0
HisPixels = 0
HrsPixels = 1
End_Group

Sure would be nice to have all of the above reported from one program. Anyway, just something to consider if "N/A" is not appropriate.
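To compare which keywords a given Results group actually contains, a minimal Python sketch can read the group into a dictionary. This is not a real PVL parser: it ignores nesting, units, comments, and multi-line values. The function name is hypothetical, and the sample is an excerpt of the group above.

```python
def parse_pvl_group(text, group_name="Results"):
    """Collect 'Keyword = Value' pairs between 'Group = <name>' and 'End_Group'."""
    keywords = {}
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        key, sep, value = stripped.partition("=")
        if not inside:
            # Look for the opening line of the requested group.
            if sep and key.strip() == "Group" and value.strip() == group_name:
                inside = True
            continue
        if stripped == "End_Group":
            break
        if sep:
            keywords[key.strip()] = value.strip()
    return keywords


sample = """Group = Results
From = N1459928352_1.cal.cub
Band = 1
ValidPixels = 1048575
HrsPixels = 1
End_Group"""

results = parse_pvl_group(sample)
print(sorted(results))
```

All values come back as strings; comparing the key set of each group against the full keyword list shows which statistics were dropped for a given image.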

@jessemapel jessemapel added the bug Something isn't working label May 15, 2020
@jessemapel
Contributor

Labeling this as a bug because it causes issues when appending to a CSV. For the output when nothing is available, we can replace the really weird numbers with just none, n/a, null, or something else.

@lwellerastro
Contributor Author

Cool. I'm not sure what is appropriate for the empties, but does it make sense to avoid NULL since that means something in ISIS? I'm not sure, because I think we use NULL in geometry output when an intersected point has no information. Whatever makes sense for DN data, since that's what this program reports.
