Calibration routine evaluation
The sections below compare the calibrated, geocoded Sentinel-1 products generated by ALUs with the corresponding SNAP-generated products.
The benchmarking exercise was performed on an AWS p3.8xlarge instance.
The exact commands and story are available here - https://github.com/cgi-estonia-space/ALUs-benchmark/blob/main/stories/run_maharashtra.sh
The benchmarking exercise uses ALUs version 1.6, available here - https://github.com/cgi-estonia-space/ALUs/releases/tag/v1.6.0
Input product - S1A_IW_SLC__1SDV_20210722T005537_20210722T005604_038883_049695_2E58
A single-subswath benchmark was performed on a Sentinel-1 SLC image acquired on 22.07.2021 (the input product named above). The subswath (IW2) covers only land, so no area is masked out during the terrain correction step, which makes it a good case for measuring processing speed.
The two tables below compare pixel value statistics of the calibration routine outputs produced with the SRTM3 and Copernicus 30 m DEMs, respectively. Processing time is given in seconds. In the Difference row, the statistics columns hold the absolute difference between the SNAP and ALUs values, while the processing time column holds the ratio of SNAP time to ALUs time.
1sw_srtm3 | min | max | mean | std dev | valid percent | processing time |
---|---|---|---|---|---|---|
SNAP | 4.36E-09 | 139.63372802734 | 0.041358601134968 | 0.067379523830514 | 67.67 | 25.319 |
ALUs | 4.36E-09 | 139.63372802734 | 0.041358601134643 | 0.067379523766886 | 67.67 | 9.51 |
Difference | 0 | 0 | 3.25003912671207E-13 | 6.36279917642923E-11 | 0 | 2.66235541535226 |
1sw_copdem | min | max | mean | std dev | valid percent | processing time |
---|---|---|---|---|---|---|
SNAP | 1.38E-09 | 96.673011779785 | 0.043002644066854 | 0.068440809284713 | 67.68 | 74.042 |
ALUs | 1.38E-09 | 96.673011779785 | 0.043002644066667 | 0.068440809284715 | 67.68 | 9.524 |
Difference | 0 | 0 | 1.86996251816396E-13 | 1.99840144432528E-15 | 0 | 7.7742545149097 |
The clearest result is the difference in processing speed relative to SNAP when SRTM3 and COPDEM 30m are used. COPDEM 30m has a higher resolution, so more pixel calculations are needed to interpolate values for the raster. On the GPU the higher computing intensity makes no practical difference and the processing time stays the same (9.51 s versus 9.524 s), while the CPU-based solution slows down markedly when the higher resolution DEM is used (25.319 s versus 74.042 s). In this processing routine only terrain correction is affected by the choice of DEM.
The results of a pixel-by-pixel comparison of the SNAP and ALUs outputs, made with a tool called rastcomp, are shown below.
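The processing time entries in the Difference rows can be reproduced as a simple ratio of SNAP time to ALUs time. The following is a minimal sketch using the single-subswath timings from the tables above; the `speedup` helper and the `timings` dictionary are illustrative names, not part of ALUs or SNAP.

```python
# Processing times in seconds, copied from the single-subswath tables.
timings = {
    "SRTM3": {"SNAP": 25.319, "ALUs": 9.51},
    "COPDEM 30m": {"SNAP": 74.042, "ALUs": 9.524},
}


def speedup(snap_seconds, alus_seconds):
    """Return how many times faster ALUs finished compared to SNAP."""
    if alus_seconds <= 0:
        raise ValueError("ALUs time must be positive")
    return snap_seconds / alus_seconds


for dem, t in timings.items():
    # Print the ratio rounded to two decimals for each DEM.
    print(f"{dem}: {speedup(t['SNAP'], t['ALUs']):.2f}x")
```

The ALUs times are nearly identical for both DEMs, so the ratio grows almost entirely because the SNAP time increases.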
DEM single subswath | bad pixels % | pixels different % | avg relative difference PPM |
---|---|---|---|
SRTM3 | 0 | 1.615699 | 2.689596141238527 |
COPDEM 30m | 0 | 0.288575 | 0.087527928773048 |
The match between pixels is very high. There are tiny differences in floating point calculations between different ICs, but for calibration the agreement is especially good. Again, the COPDEM 30m results are very good, since the higher spatial resolution leaves less room for approximation error when interpolating pixel values.
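To illustrate what such a pixel-by-pixel comparison measures, here is a minimal sketch in plain Python. It is not the rastcomp implementation; the metric definitions (share of compared pixels that differ, and mean relative difference in parts per million over the compared pixels) and the `compare_rasters` name are assumptions made for this example.

```python
def compare_rasters(reference, candidate, nodata=0.0):
    """Compare two equally sized pixel sequences.

    Returns (pixels_different_percent, avg_relative_difference_ppm).
    Pixels that are nodata in both inputs are skipped.
    These definitions are assumptions, not rastcomp's documented ones.
    """
    if len(reference) != len(candidate):
        raise ValueError("rasters must have the same number of pixels")

    compared = 0
    different = 0
    relative_sum = 0.0
    for ref, cand in zip(reference, candidate):
        if ref == nodata and cand == nodata:
            continue  # masked out in both products
        compared += 1
        if ref != cand:
            different += 1
            # A relative difference is undefined for a zero reference,
            # so such pixels only count towards the "different" share.
            if ref != 0.0:
                relative_sum += abs(cand - ref) / abs(ref)

    if compared == 0:
        return 0.0, 0.0
    different_percent = 100.0 * different / compared
    avg_relative_ppm = 1e6 * relative_sum / compared
    return different_percent, avg_relative_ppm


# Tiny made-up example: one of three valid pixels differs by 25%.
reference = [0.0, 1.0, 2.0, 4.0]
candidate = [0.0, 1.0, 2.5, 4.0]
percent, ppm = compare_rasters(reference, candidate)
print(percent, ppm)
```

With definitions like these, a few large relative errors can raise the average PPM value even when the share of differing pixels is small.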
Below are colored relative difference rasters (differing pixels are hard to spot).
SRTM3
COPDEM 30m
The full scene, with all three subswaths, contains about 15% sea, which is masked out during processing.
3sw_srtm3 | min | max | mean | std dev | valid percent | processing time |
---|---|---|---|---|---|---|
SNAP | 3.55E-09 | 403.77536010742 | 0.038183092419879 | 0.07526116163104 | 59.56 | 47.257 |
ALUs | 3.55E-09 | 403.77536010742 | 0.038183064136637 | 0.075261126198311 | 59.56 | 30.214 |
Difference | 0 | 0 | 2.82832419984391E-08 | 3.54327290008616E-08 | 0 | 1.56407625604025 |
3sw_copdem | min | max | mean | std dev | valid percent | processing time |
---|---|---|---|---|---|---|
SNAP | 8.20E-09 | 523.56628417969 | 0.039470374739396 | 0.091372051397192 | 59.49 | 170.027 |
ALUs | 8.20E-09 | 523.56628417969 | 0.03947034975309 | 0.09137200920301 | 59.49 | 29.991 |
Difference | 0 | 0 | 2.49863060033939E-08 | 4.21941820094585E-08 | 0 | 5.66926744690074 |
Again, the same conclusions can be drawn as before. However, since the amount of data is three times that of the single subswath, I/O speed does influence the difference between the CPU- and GPU-based solutions. Also, because the calibration routine is not as math heavy as, for example, coherence estimation, the pure gain over the CPU solution is not that high.
Pixel-by-pixel comparisons show a slightly larger disparity. This is because the sea is masked, and coastline pixels are more affected by floating point differences. There are also three times more pixels, so small per-subswath errors accumulate over the area. Below the statistics, a resulting raster (produced with COPDEM 30m) is shown instead of the pixel-scarce relative difference picture.
DEM three subswaths | bad pixels % | pixels different % | avg relative difference PPM |
---|---|---|---|
SRTM3 | 0.000074 | 3.181084 | 11.108893449756106 |
COPDEM 30m | 0.000073 | 0.225636 | 93.170354827889327 |
Below is a screenshot from one of the NVIDIA-provided profilers. It shows the kernels that were run (on the left) along with the percentage of total GPU time consumed by each. On the right is the total time (longer than usual because of the profiling overhead) and GPU-specific metrics such as "Compute Utilization", which is marked in red. From this we can conclude that the current implementation could process 3 subswaths around 88% faster if it kept the GPU busy all the time; the main timeline view shows many gaps. Hence the ~30 second processing time for a Sentinel-1 scene could be ~3.5 seconds.

In addition, on a GPU with 4GB of memory, subswaths could be processed at the same time, since the listed kernels use only a small percentage of the floating point compute resources (2nd picture), which leaves room to fill the memory with data. This was measured on a personal computer with an NVIDIA GeForce GTX 1660 Ti GPU. On a cloud/grid level GPU, where floating point throughput is not capped, this would mean an even smaller "Compute Utilization" and hence an even faster theoretical processing time (processing subswaths or scenes at the same time).
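The arithmetic behind the ~3.5 second estimate can be sketched as follows. This assumes the 88% figure means the share of run time during which the GPU was not computing, and it applies that share to the 30.214 second ALUs time from the three-subswath SRTM3 table; `ideal_time` is a hypothetical helper, and the result is a rough upper bound on the gain, not a measurement.

```python
def ideal_time(measured_seconds, idle_fraction):
    """Estimate run time if the GPU never sat idle.

    idle_fraction is the assumed share of the measured time during
    which the GPU was not computing (0.88 for the 88% quoted above).
    """
    if not 0.0 <= idle_fraction < 1.0:
        raise ValueError("idle_fraction must be in [0, 1)")
    return measured_seconds * (1.0 - idle_fraction)


# 30.214 s is the ALUs three-subswath SRTM3 time from the table.
estimate = ideal_time(30.214, 0.88)
print(f"{estimate:.1f} s")
```

The estimate ignores I/O and host-side work that cannot overlap with kernels, so real gains would likely be smaller.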