[INVESTIGATION] ImageBuffer iterators based on Space Filling Curves #4554
My first suggestion would be to double check what the access pattern is in the current code. For instance, if the data is stored in scanline order and you're reading in scanline order, then you are already optimal. On the other hand, if your algorithm needs to randomly read a 3x3 window, then the space filling curve is likely to help. If you have a 3x3 moving window, then you're reading from 3 scanlines in a predictable fashion, and each time you move the window to the right by one pixel, the 3 new pixels are all likely to already be in your L2 and often in 3 L1 cachelines.
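To make that concrete, here is a minimal sketch (plain C++, not the OIIO API; the function name and flat-buffer layout are assumptions for illustration) of a 3x3 moving-window filter over a scanline-ordered buffer:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: 3x3 box filter over a scanline-ordered buffer.
// As the window slides right by one pixel, only the 3 rightmost samples
// are new; the other 6 were just read and are almost certainly still in
// L1/L2 cache, which is why plain scanline traversal is already
// near-optimal for this access pattern.
std::vector<float> box3x3(const std::vector<float>& src, int w, int h)
{
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)      // 3 adjacent scanlines
                for (int dx = -1; dx <= 1; ++dx)  // contiguous in memory
                    sum += src[(y + dy) * w + (x + dx)];
            dst[y * w + x] = sum / 9.0f;
        }
    }
    return dst;
}
```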
Hey @ThiagoIze, thanks for the note, that definitely makes sense!
Sorry, I don't have anything in mind that could benefit from space filling curves. I imagine most image processing algorithms already operate on all the pixels in relatively cache-friendly ways. Having said that, there are large parts of OIIO I am not familiar with, so maybe there's something that could benefit from it?
Maybe if you were operating on huge, ImageCache-backed, tiled images that would exceed the tile cache size if read in a suboptimal order, you could reduce I/O significantly if you had a pixel traversal order that was tile-size-aware and made sure it never had to revisit a tile after moving to another? For in-core images (or cached images where the cache size is more than the total of the images you need in memory -- in which case you're still reading each tile exactly once), I'm not sure that you could do measurably better than we do currently in scanline order. (Detail: most IBA functions will use the thread pool, so actually it's breaking the image into horizontal strips of adjacent scanlines and working on those simultaneously. So I'm less sure about how that complicates predictions of how either the RAM cache or IC cache performance would be affected by pixel traversal order.)

Maybe before investing much work, one thing you could try is to experiment with doing it as badly as possible -- like if you made an alternate IB::Iterator that visits every pixel within the region, but as incoherently as possible (it's the blue noise of iterators!), and benchmarked a bunch of IBA functions against the current boring scanline traversal iterator, then surely that tells you something about whether this is likely to be helpful. I mean, if you can barely measure the perf difference between "as incoherent as possible" and "scanline", you are not going to get much benefit from "slightly better space-filling order". But if there is a huge difference in this experiment, maybe it's worth the next step of the investigation?
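A rough sketch of that worst-case experiment (illustrative names, not the OIIO API): traverse a scanline-ordered buffer in an arbitrary visit order, then compare timings between the identity (scanline) order and a uniformly shuffled one -- the "as incoherent as possible" baseline:

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Sum all pixels of a scanline-ordered buffer, visiting them in the
// arbitrary order given by `order`. The reduction is a stand-in for
// real per-pixel work; only the memory access pattern varies.
double visit_sum(const std::vector<float>& img,
                 const std::vector<uint32_t>& order)
{
    double sum = 0.0;
    for (uint32_t i : order)
        sum += img[i];
    return sum;
}

// Wall-clock time (microseconds) of one full traversal in the given order.
long long time_traversal(const std::vector<float>& img,
                         const std::vector<uint32_t>& order, double& sum_out)
{
    auto t0 = std::chrono::steady_clock::now();
    sum_out = visit_sum(img, order);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0)
        .count();
}
```

To run the comparison, build the identity order with `std::iota`, time it, then shuffle it with `std::shuffle` and a `std::mt19937` and time it again on a large image. If the two timings are nearly identical, a smarter space-filling order has little headroom for that workload.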
Another thought: if this did seem promising in benchmarks, I still think it's likely that there is no single best order, but that different orders might be better in different situations. I definitely wouldn't want to add conditional evaluations to the IB::Iterator traversal -- you don't want every ++ to be doing "if traversal order A then... else if traversal order B then...". It seems self-defeating to add expense to the traversal itself. A design I might propose is to change IB::Iterator, which is currently templated on the data type of the buffer and the data type that you'd like the values to APPEAR to be (usually float), and add a third template parameter for "IteratorTraits", which among other possible things can be the home of the code that is currently IB::Iterator::operator++, and which implicitly determines the traversal order. So the default one can be as it currently is, but if you defined
then you could have
and it would do the alternate traversal order. Anyway, something like that could allow you to experiment with different traversal patterns without adding any overhead or complexity to the default behavior.
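As a rough illustration of that traits idea (all names here are hypothetical, not OIIO's actual API), the traits class could own the advance logic so the traversal order is chosen at compile time, with zero per-increment branching on the order:

```cpp
// Hypothetical sketch: traversal order as a compile-time traits parameter.
struct ScanlineTraits {
    // Advance (x, y) within [xbegin,xend) in plain scanline order.
    static void advance(int& x, int& y, int xbegin, int xend)
    {
        if (++x >= xend) { x = xbegin; ++y; }
    }
};

struct SerpentineTraits {
    // Example alternate order: boustrophedon (left-to-right, then
    // right-to-left on the next scanline).
    static void advance(int& x, int& y, int xbegin, int xend)
    {
        if (y & 1) {  // odd rows run right-to-left
            if (--x < xbegin) { x = xbegin; ++y; }
        } else {      // even rows run left-to-right
            if (++x >= xend) { x = xend - 1; ++y; }
        }
    }
};

template<typename Traits = ScanlineTraits>
class PixelIterator {
public:
    PixelIterator(int xbegin, int xend, int ybegin)
        : m_x(xbegin), m_y(ybegin), m_xbegin(xbegin), m_xend(xend) {}
    PixelIterator& operator++()
    {
        Traits::advance(m_x, m_y, m_xbegin, m_xend);
        return *this;
    }
    int x() const { return m_x; }
    int y() const { return m_y; }
private:
    int m_x, m_y, m_xbegin, m_xend;
};
```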
Definitely a good idea, I'll start with that 😅 |
Sounds like a good way to go!
I just want to emphasize something: I absolutely do not want to "force push" a new feature into OIIO. If the investigation does not show any real gain from implementing space filling curves, or if the gain only appears in some particular situation that no one is ever going to run into, then let's just not do it; that would create additional code complexity for nothing, and I tend to hate that 😅
Recently I've been looking into space filling curves, in particular the Morton curve (see https://en.wikipedia.org/wiki/Z-order_curve) and the Hilbert curve (see https://en.wikipedia.org/wiki/Hilbert_curve). I've read that these curves can be used to iterate through the pixels of an image in a "cache-friendly" way, as they preserve spatial locality fairly well, and thus could improve the performance of some image processing algorithms.
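For reference, a common way to compute the Morton (Z-order) index is to interleave the coordinate bits. Here is a minimal sketch for 16-bit coordinates (helper names are mine, not from any particular library):

```cpp
#include <cstdint>

// Spread the low 16 bits of v so that bit i lands at bit 2i
// (the classic "part1by1" bit-interleaving trick).
static uint32_t part1by1(uint32_t v)
{
    v &= 0x0000ffff;
    v = (v | (v << 8)) & 0x00ff00ff;
    v = (v | (v << 4)) & 0x0f0f0f0f;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Morton/Z-order index: x bits in even positions, y bits in odd positions,
// so pixels close in 2D tend to be close in the resulting 1D index.
uint32_t morton_encode(uint32_t x, uint32_t y)
{
    return part1by1(x) | (part1by1(y) << 1);
}
```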
I've started doing some investigation work; however, the results I got so far do not suggest any performance improvement when using the Morton curve to iterate: I'm applying a simple convolution with a Gaussian kernel to a 4096x4096 image (on a single thread), and iterating with `ImageBuffer::Iterator` vs. with my custom Morton curve implementation both take, on average, the exact same time. If someone has some knowledge of space filling curves or performance engineering, or simply wants to continue the investigation on their side and provide more insight, your help is definitely welcome!
Here is the code I wrote for this initial investigation: