Hi, I experienced some RT performance issues when working with .CR2 and .CR3 files and had to tune the OMP_NUM_THREADS and OMP_THREAD_LIMIT environment variables on a high-core-count, workstation-class machine. By contrast, performance was adequate (for me) on a Latte Panda D featuring four cores and four threads, with OpenMP compiled into RT, running under Linux (Kubuntu). I am unclear what values of OMP_NUM_THREADS you used for each server other than the last one in the table above. For me, OMP_NUM_THREADS between 4 and 8 (the number of threads to allocate per parallel region) and an OMP_THREAD_LIMIT of 32 work well for .CR2 and .CR3 files; 100 MB .RAF files are a work in progress. You could try profiling RT under your installation to see what might be causing the issues identified. NS
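A minimal sketch of how the two variables mentioned above could be set for a single run. The `-t -b16` flags are taken from the original post; the function name `run_rt`, the `RT_BIN` override and the directory arguments are hypothetical, and the defaults of 4 and 32 are simply the values reported in this reply, not a general recommendation.

```shell
#!/bin/sh
# Sketch only: run_rt, RT_BIN and the directory arguments are hypothetical.
# The defaults 4 and 32 are the values reported in this reply.

RT_BIN="${RT_BIN:-rawtherapee-cli}"

# run_rt INPUT_DIR OUTPUT_DIR
# Runs one 16-bit TIFF conversion with OpenMP capped via the environment.
# Values already set in the calling shell take precedence over the defaults.
run_rt() {
    in_dir=$1
    out_dir=$2
    env OMP_NUM_THREADS="${OMP_NUM_THREADS:-4}" \
        OMP_THREAD_LIMIT="${OMP_THREAD_LIMIT:-32}" \
        "$RT_BIN" -o "$out_dir" -t -b16 -c "$in_dir"
}
```

For example, `run_rt ./raw ./tiff` would use the defaults, while setting `OMP_NUM_THREADS=8` beforehand would try the upper end of the range.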
Disclaimer: I tried my best to find an answer in issues + web before posting here, but wasn't successful.
Summary: I experience slow batch processing when using more cores.
I'm running a batch conversion of about 240k CR2 raw images, in batches of 5-9 images at a time. The profile I'm running is fairly minimal, and it outputs to 16-bit TIFF (`rawtherapee-cli -t -b16`) for subsequent HDR stacking.

Now, the hardware I'm running this on consists of multiple powerful Linux servers that are interconnected with (SSD-cached) NFS.
Doing some testing with limiting the number of threads, I get the following results. I am able to limit it to the cores that are on one socket via `/sys/devices/system/node/node{0,1}/cpulist` and `docker --cpuset-cpus`.

[Results table not recoverable; the only surviving entry is `OMP_NUM_THREADS=8`.]
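Pinning a container to the cores of one socket, as described above, might look roughly like the following sketch. Only the `cpulist` path and the `--cpuset-cpus` flag come from the post; the helper names `node_cpus` and `pinned_docker_cmd`, the `SYSFS_NODES` override and the image name in the usage note are hypothetical. The second helper only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: node_cpus, pinned_docker_cmd and SYSFS_NODES are hypothetical;
# only the cpulist path and --cpuset-cpus come from the post.

SYSFS_NODES="${SYSFS_NODES:-/sys/devices/system/node}"

# node_cpus NODE_INDEX
# Prints the kernel's CPU list for one NUMA node (a range list such as "0-7").
node_cpus() {
    cat "$SYSFS_NODES/node$1/cpulist"
}

# pinned_docker_cmd NODE_INDEX IMAGE [ARGS...]
# Prints (does not run) a docker command restricted to that node's CPUs.
pinned_docker_cmd() {
    node=$1
    image=$2
    shift 2
    cpus=$(node_cpus "$node") || return 1
    echo "docker run --rm --cpuset-cpus=$cpus $image $*"
}
```

Calling `pinned_docker_cmd 0 my-rt-image rawtherapee-cli -t -b16 -c /data` (with `my-rt-image` standing in for whatever image is actually used) prints a command line that can be inspected before it is executed.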
I/O considerations

The raw files are read from NFS, and the output is written to a RAM disk at `/run/user/<userid>/`. Even if I run it twice on the same images (such that the input would be RAM-cached by the Linux kernel), the results don't change.

OpenMP
Glancing through the code, I see that parallelism is implemented here using OpenMP. In my own programming, I have had rather bad experiences with OpenMP in terms of performance (on GCC/Linux), to the point where running single-threaded (`OMP_NUM_THREADS=1`) was sometimes even faster.

I hope you find this evaluation somewhat useful. I would like to see the developers comment on this from their experience, and possibly have the documentation extended on the topic of multicore machines.
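The thread-count comparison described in this post could be reproduced with a small timing loop along these lines. This is a sketch under assumptions, not the procedure actually used for the measurements: `time_threads`, the `RT_BIN` override and the directory arguments are hypothetical, and timing uses `date +%s`, so it only has whole-second resolution.

```shell
#!/bin/sh
# Sketch only: time_threads, RT_BIN and the directory arguments are
# hypothetical; the thread counts passed in are arbitrary sample points.

RT_BIN="${RT_BIN:-rawtherapee-cli}"

# time_threads INPUT_DIR OUTPUT_DIR COUNT...
# Runs the same conversion once per thread count and prints
# "<threads> <elapsed seconds>" for each run.
time_threads() {
    in_dir=$1
    out_dir=$2
    shift 2
    for n in "$@"; do
        start=$(date +%s)
        # -Y overwrites existing output so every run does the same work.
        env OMP_NUM_THREADS="$n" "$RT_BIN" -Y -o "$out_dir" -t -b16 -c "$in_dir" \
            >/dev/null 2>&1 || echo "run with $n threads failed" >&2
        end=$(date +%s)
        echo "$n $((end - start))"
    done
}
```

Something like `time_threads ./sample ./out 1 4 8 16 32` on a fixed set of sample images would give one line per setting, which makes it easy to see where adding threads stops helping.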
Tags: multicore, many threads, many core, HPC, parallelization, OpenMP, slow