
Fuzz testing version 3 #242

Closed
Numpsy opened this issue Nov 19, 2024 · 6 comments · Fixed by #247
Labels
performance issues or enhancements related to the performance characteristics of the project
Milestone

Comments

@Numpsy
Contributor

Numpsy commented Nov 19, 2024

I tried running the current version 3 through SharpFuzz using the 'english.presets.doc' test file as a base file, and it generated the attached file, which seems to cause an infinite loop of some sort when trying to open it (just opening it in Structured Storage Explorer makes it hang).

For reference, v2.4 has the same issue, but the Windows native functions report that the file is corrupt without any real delay.

timeout.zip

@jeremy-visionaid
Collaborator

jeremy-visionaid commented Nov 20, 2024

Thanks for testing it out against SharpFuzz! It actually isn't an infinite loop - it will throw an exception once the maximum chain length is reached. However, we can make that determination a lot sooner by comparing to the sector count implied by the stream length rather than the maximum sector value constant.

In the general case though, if the chain has a big loop then we'd need to commit some time and memory to detect it sooner. We could do something similar to v2.4 and store a hash set of previously encountered values, but since streams could potentially be terabytes long, we'd also want an approach that avoids blowing out time and memory when checking for previously encountered entries. My initial thought would be something like binary exponential backoff, where only the values at indexes that are powers of two are retained.
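The power-of-two retention idea can be sketched as follows (a minimal Python sketch for illustration only; the library itself is C#, and `END_OF_CHAIN`, `detect_cycle`, and the dict-based FAT here are assumptions, not the actual implementation):

```python
END_OF_CHAIN = 0xFFFFFFFE  # CFB ENDOFCHAIN sector marker

def detect_cycle(fat, start, max_steps=1_000_000):
    """Walk a FAT chain from 'start', retaining only the sector IDs seen
    at power-of-two indexes. Every visited sector is checked against that
    set, so a loop is caught shortly after the walk passes a retained
    entry, while the set only grows by O(log n) insertions."""
    seen = set()
    index = 0
    current = start
    while current != END_OF_CHAIN and index < max_steps:
        if current in seen:
            return True  # revisited a retained sector: the chain loops
        if index & (index - 1) == 0:  # index is 0 or a power of two
            seen.add(current)
        current = fat[current]
        index += 1
    return False
```

A loop of length L that is entered at index s gets caught within roughly 2s + L steps (the first retained index inside the loop, plus one trip around it), so detection cost tracks where the loop sits rather than the declared stream length.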

@jeremy-visionaid jeremy-visionaid added the performance issues or enhancements related to the performance characteristics of the project label Nov 20, 2024
@jeremy-visionaid jeremy-visionaid added this to the 3.0 milestone Nov 20, 2024
jeremy-visionaid added a commit that referenced this issue Nov 20, 2024
Improve loop detection time, but limit memory usage and HashSet
operations with binary exponential backoff.

Fixes: #242
@jeremy-visionaid
Collaborator

jeremy-visionaid commented Nov 20, 2024

It's worth mentioning that the loop detection I was proposing above was more of a guard against DoS, and that validation of the payload itself would need to be handled by the client app. Though this particular issue only affects the directory entry chain anyway, since the streams declare their length in their respective directory entry and the implementation wouldn't try to read beyond the stream length (and an exception is already thrown if the chain is too short).

In v4 the detection time could also be improved since the header declares the directory sector count, so the largest valid FAT value would be the minimum of the base stream sector count and the directory sector count.

@jeremy-visionaid
Collaborator

Oh, haha, I probably shouldn't try to fix things like this late in the evening. The loop might occur anywhere, so we'd need to do something like Floyd/Brent... I wonder if the reference implementation actually does anything more sophisticated than checking the chain length against the sector count.

@jeremy-visionaid
Collaborator

I need to add something similar for the mini FAT, but it's late so I'll leave it till tomorrow when I can look at it with fresh eyes!

@Numpsy
Contributor Author

Numpsy commented Nov 20, 2024

There was a mention of Floyd's algorithm in one of the previous 'corrupted document' issues, but it didn't go anywhere.

A tangentially related thought on APIs: there was a discussion about async functions that got put off, but is it worth considering adding cancellation token support to non-async functions, which could still be used to abort potentially long-lived operations? (A caller could have one that cancels on a timer, for example.)

I just remembered that we raised something similar against the Open XML SDK, though that's basically a no-go because lower-level bits like System.IO.Packaging don't support it.

@jeremy-visionaid
Collaborator

Yeah, I think Floyd's is the intuitively obvious approach, but it would have been hard to add to 2.4 since that worked by constructing lists, so performance/memory would be worse than just using a simple HashSet. Since 3.0 works on enumerators/iterators, it's fairly straightforward to do cycle detection with just an extra uint for the "slow" path.
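For comparison, a Floyd-style walk over the same hypothetical dict-based FAT (again a Python sketch for illustration; the actual 3.0 code is C# and iterator-based) needs only the single extra "slow" value:

```python
END_OF_CHAIN = 0xFFFFFFFE  # CFB ENDOFCHAIN sector marker

def has_cycle(fat, start):
    """Floyd's tortoise-and-hare: 'fast' advances two sectors per
    iteration, 'slow' advances one. If 'fast' reaches the end marker the
    chain terminates; if the two pointers ever meet, the chain loops.
    Uses O(1) extra memory regardless of chain length."""
    slow = fast = start
    while True:
        fast = fat.get(fast, END_OF_CHAIN)
        if fast == END_OF_CHAIN:
            return False
        fast = fat.get(fast, END_OF_CHAIN)
        if fast == END_OF_CHAIN:
            return False
        slow = fat[slow]
        if slow == fast:
            return True
```

The trade-off versus the power-of-two HashSet approach is constant memory in exchange for the second (fast) traversal of the chain.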

As for cancellation tokens, my answer is similar to the Open XML SDK folks': we can't add cancellation tokens without async (an operation is only long-lived if it's blocking on I/O for some reason, and sync streams can't be cancelled). Otherwise the caller can abort whenever they like. i.e. there's just no point in adding cancellation tokens to sync stream functions IMHO - there's nothing I can add to the API that you can't already do for yourself as a caller (hence it's beyond the scope of what the API should do). Fortunately, we're also not blocked like they are - if we want to add async functions then we can, and cancellation tokens would inherently be part of that. You're welcome to create a separate ticket for implementing the async APIs.
