
Calibration routine evaluation


Benchmark

The sections below compare ALUs-generated Sentinel-1 calibrated, geocoded products with the respective SNAP-generated products.

The benchmarking exercise was performed on an AWS p3.8xlarge instance.
The exact commands and workflow are available here - https://github.com/cgi-estonia-space/ALUs-benchmark/blob/main/stories/run_maharashtra.sh
The benchmarking exercise uses ALUs version 1.6, available here - https://github.com/cgi-estonia-space/ALUs/releases/tag/v1.6.0

Input product - S1A_IW_SLC__1SDV_20210722T005537_20210722T005604_038883_049695_2E58

Single subswath

A single-subswath benchmark was performed on a Sentinel-1 SLC image from 22.07.2021 (the product name is given above). The subswath (IW2) covers landmass entirely, which makes it a good example for validating processing speed, since no areas are masked out during the terrain correction step.

The two tables below show the pixel value comparison of the Calibration Routine outputs produced with the SRTM3 and Copernicus 30m DEMs, respectively. In the Difference rows the pixel statistics are absolute differences between SNAP and ALUs, while the processing time entry is the SNAP/ALUs speedup factor.

| 1sw_srtm3 | min | max | mean | std dev | valid percent | processing time |
|---|---|---|---|---|---|---|
| SNAP | 4.36E-09 | 139.63372802734 | 0.041358601134968 | 0.067379523830514 | 67.67 | 25.319 |
| ALUs | 4.36E-09 | 139.63372802734 | 0.041358601134643 | 0.067379523766886 | 67.67 | 9.51 |
| Difference | 0 | 0 | 3.25003912671207E-13 | 6.36279917642923E-11 | 0 | 2.66235541535226 |

| 1sw_copdem | min | max | mean | std dev | valid percent | processing time |
|---|---|---|---|---|---|---|
| SNAP | 1.38E-09 | 96.673011779785 | 0.043002644066854 | 0.068440809284713 | 67.68 | 74.042 |
| ALUs | 1.38E-09 | 96.673011779785 | 0.043002644066667 | 0.068440809284715 | 67.68 | 9.524 |
| Difference | 0 | 0 | 1.86996251816396E-13 | 1.99840144432528E-15 | 0 | 7.7742545149097 |

The clearest takeaway is the processing speed difference compared to SNAP when SRTM3 and COPDEM 30m are used. COPDEM 30m has a higher resolution, so more pixel calculations are required to interpolate values for the raster. For the GPU this higher computational intensity does not matter and the processing time stays essentially the same, while the CPU-based solution clearly slows down when a higher-resolution DEM is used. For this processing routine only terrain correction is affected.
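The pixel statistics in the tables above can be reproduced from the output rasters with a short script. The sketch below is only illustrative - it assumes single-band GeoTIFF outputs where masked-out pixels carry the no-data value (or 0 if none is set), and the file names are hypothetical.

```python
# Illustrative sketch - reproduce min/max/mean/std dev/valid percent for an output raster.
# Assumes a single-band GeoTIFF where masked-out pixels carry the no-data value (or 0).
import numpy as np
from osgeo import gdal


def band_stats(path):
    band = gdal.Open(path).GetRasterBand(1)
    data = band.ReadAsArray().astype(np.float64)
    nodata = band.GetNoDataValue()
    valid = data != (0.0 if nodata is None else nodata)
    values = data[valid]
    return {
        "min": values.min(),
        "max": values.max(),
        "mean": values.mean(),
        "std dev": values.std(),
        "valid percent": 100.0 * valid.sum() / data.size,
    }


snap = band_stats("snap_1sw_srtm3.tif")   # hypothetical file names
alus = band_stats("alus_1sw_srtm3.tif")
print({k: abs(snap[k] - alus[k]) for k in snap})  # absolute differences, as in the tables
```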

A pixel-by-pixel comparison of the SNAP and ALUs outputs was performed with a tool called rastcomp; the results are shown below.

| DEM | bad pixels % | pixels different % | avg relative difference PPM |
|---|---|---|---|
| SRTM3 | 0 | 1.615699 | 2.689596141238527 |
| COPDEM 30m | 0 | 0.288575 | 0.087527928773048 |

The match between pixels is very high. Diminutive differences in floating point calculations between different processing hardware are expected, but for calibration the agreement is especially good. Again, the COPDEM 30m results stand out, since the higher spatial resolution leaves less room for approximation error when interpolating pixel values.
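The exact metric definitions used by rastcomp are not reproduced on this page, but a rough equivalent of the "pixels different %" and "avg relative difference PPM" columns can be sketched as follows (the no-data handling and file names are assumptions for illustration):

```python
# Illustrative sketch - an approximation of a rastcomp-style per-pixel comparison.
# "Different" here means any non-identical float value; the relative difference
# is expressed in parts per million. These definitions are assumptions.
import numpy as np
from osgeo import gdal


def compare(path_a, path_b):
    a = gdal.Open(path_a).ReadAsArray().astype(np.float64)
    b = gdal.Open(path_b).ReadAsArray().astype(np.float64)
    valid = (a != 0) & (b != 0)        # skip masked-out pixels (assumed no-data = 0)
    differing = valid & (a != b)
    rel_diff_ppm = np.abs(a[differing] - b[differing]) / np.abs(a[differing]) * 1e6
    return {
        "pixels different %": 100.0 * differing.sum() / valid.sum(),
        "avg relative difference PPM": rel_diff_ppm.mean() if rel_diff_ppm.size else 0.0,
    }


print(compare("snap_1sw_copdem.tif", "alus_1sw_copdem.tif"))  # hypothetical file names
```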

Below are colored relative difference rasters (differing pixels are hard to spot).

[Figure: rastcomp relative error legend]

SRTM3

[Figure: relative difference raster, SRTM3]

COPDEM 30m

[Figure: relative difference raster, COPDEM 30m]

Whole scene (3 subswaths)

The scene contains about 15% sea, which is masked out during processing. The same comparison as above was repeated for the whole scene:

| 3sw_srtm3 | min | max | mean | std dev | valid percent | processing time |
|---|---|---|---|---|---|---|
| SNAP | 3.55E-09 | 403.77536010742 | 0.038183092419879 | 0.07526116163104 | 59.56 | 47.257 |
| ALUs | 3.55E-09 | 403.77536010742 | 0.038183064136637 | 0.075261126198311 | 59.56 | 30.214 |
| Difference | 0 | 0 | 2.82832419984391E-08 | 3.54327290008616E-08 | 0 | 1.56407625604025 |

| 3sw_copdem | min | max | mean | std dev | valid percent | processing time |
|---|---|---|---|---|---|---|
| SNAP | 8.20E-09 | 523.56628417969 | 0.039470374739396 | 0.091372051397192 | 59.49 | 170.027 |
| ALUs | 8.20E-09 | 523.56628417969 | 0.03947034975309 | 0.09137200920301 | 59.49 | 29.991 |
| Difference | 0 | 0 | 2.49863060033939E-08 | 4.21941820094585E-08 | 0 | 5.66926744690074 |

Again, the same conclusions can be drawn as before, although since the data amount is three times that of a single subswath, I/O speed does influence the difference between the CPU- and GPU-based solutions. Also, the calibration routine is not as math-heavy as, for example, coherence estimation, so the pure gain over the CPU solution is not as high.

The pixel-by-pixel comparison shows a slightly larger disparity. This comes from the fact that the sea is masked out and coastline pixels are more sensitive to floating point calculation differences; there are also three times more pixels, so small per-subswath errors accumulate over the area. Below the statistics, the resulting raster is displayed instead of the relative difference picture (which contains few differing pixels); it was produced with COPDEM 30m.

| DEM | bad pixels % | pixels different % | avg relative difference PPM |
|---|---|---|---|
| SRTM3 | 0.000074 | 3.181084 | 11.108893449756106 |
| COPDEM 30m | 0.000073 | 0.225636 | 93.170354827889327 |

[Figure: whole-scene resulting raster (COPDEM 30m)]

Practical GPU gains

Below is a screenshot from one of the NVIDIA profilers. It shows the kernels that were run (on the left) along with the percentage of total GPU time consumed by each calculation. On the right side is the total time (longer than usual because of the profiling overhead) and GPU-specific metrics such as "Compute Utilization", which is marked in red.

From this we can conclude that the current implementation could process 3 subswaths around 88% faster if it could keep the GPU busy all the time; in the main timeline view a lot of gaps can be seen. Hence a ~30 second processing time for a Sentinel-1 scene could become ~3.5 seconds. Even on a GPU with 4 GB of memory, subswaths could be processed at the same time, since the listed kernels utilize only a small percentage of the floating point compute resources (second picture) while leaving room to fill the memory with data. This was measured on a personal computer with an NVIDIA GeForce GTX 1660 Ti GPU. On a cloud/grid level GPU, where floating point throughput is not capped, the "Compute Utilization" would be even smaller and the theoretical processing time even shorter (processing subswaths or scenes at the same time).

For better resolution, click the picture source.
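The ~88% figure follows from simple arithmetic: if the kernels occupy only a small fraction of the timeline, a fully utilized GPU could in principle finish in roughly that fraction of the measured time. A back-of-the-envelope check with assumed numbers (the exact GPU busy fraction is not stated on this page):

```python
# Back-of-the-envelope check of the theoretical speedup (assumed numbers).
wall_time_s = 30.0        # measured 3-subswath processing time, roughly
gpu_busy_fraction = 0.12  # assumed share of the timeline actually occupied by kernels
ideal_time_s = wall_time_s * gpu_busy_fraction
print(f"ideal time ~{ideal_time_s:.1f} s, "
      f"potential reduction ~{(1 - gpu_busy_fraction) * 100:.0f}%")
# -> ideal time ~3.6 s, potential reduction ~88%
```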