Use FastHash64 or FNV1a as a 64-bit hash algorithm. #6138
Now that f799a29 fixes the demo hash collision I see no reason not to use FNV1a hash algorithm and incompatibility commented in point 7 of my original post So I'm force-pushing a version of this PR with this FNV1a instead of FastHash64, the code is simpler, faster and properly checks for the Anybody feel free to comment on either version of the PR, of course. |
Thank you, I'll try to review next week.
#ifdef _MSC_VER
{ sizeof(ImS64), "S64", "%I64d","%I64d" }, // ImGuiDataType_S64
{ sizeof(ImU64), "U64", "%I64u","%I64u" },
#else
{ sizeof(ImS64), "S64", "%lld", "%lld" }, // ImGuiDataType_S64
{ sizeof(ImU64), "U64", "%llu", "%llu" },
#endif
BUT would need to check history/blame about this. As we raised our build requirement to a subset of C++11 (VS2012?) in 1.87, it might be easier now. I'll be able to check this later if you don't have millions of compilers installed (I do).
Obviously the slight yet biggest annoyance are the Don't act on this idea yet, but I am wondering if it would make sense adding a helper relying on constructor/destructor to optimally fill a static buffer (with a small number of slots) and return a |
There is a test suite?! I see it now, I'll try and play with it.
I don't have so many, I just tested g++ and clang++, they seem to work fine. Personally I would prefer to define
That would make things simpler. I did the option because I thought maybe there are embedded or old systems out there without proper support for 64 bit integers.
Why a static buffer? You can use an automatic one and avoid the destructor altogether. Something like this:
This is risky because that macro returns a pointer to a temporary, but that is the beauty of C++, I guess:
Didn’t think of a macro wrapper; that’s better, since the risk of a dangling pointer is too high with this technique. I think PRIX64 will work just fine on any 64-bit type.
Minor: as part of e816bc6 I merged the signature fix for ImHashXXX functions using |
I just saw this PR and thought I would share my (very similar) patch against the docking branch.
CRC has the advantage of having intrinsic implementations available. (It would be nice if ImGuiID were 64-bit, though.) You would think GitHub of all places could properly handle displaying code; it has an add-code button, but apparently it does not work properly. I gave up and added it as a text file.
The code button does
Here's your CRC example properly formatted:
// 32/64-bit test (MSVC and Intel); 32/64-bit test (clang, gcc, Intel on Linux).
// MSVC doesn't define __SSE4_2__, so use the next available macro, __AVX__, which implies SSE4.2 support.
#if (defined(_M_IX86) || defined(_M_X64) || defined(__i386__) || defined(__x86_64__)) && (defined(__SSE4_2__) || defined(__AVX__))
#include <nmmintrin.h> // SSE4.2 CRC32 intrinsics: _mm_crc32_u8/_u32/_u64
#include <stdint.h>    // uint64_t
ImGuiID ImHashData(const void* data_p, size_t data_size, ImU32 seed)
{
    const unsigned char* data = (const unsigned char*)data_p;
    const unsigned char* data_end = data + data_size;
    ImU32 crc = ~seed;
#if defined(_M_X64) || defined(__x86_64__) // 32-bit mode doesn't have the 64-bit crc32 instruction
    // Consume 8 bytes at a time while at least 8 bytes remain.
    uint64_t crc64 = crc;
    for (; (data + 8) <= data_end; data += 8)
        crc64 = _mm_crc32_u64(crc64, *(const uint64_t*)data);
    crc = (ImU32)crc64;
#endif
    // Then 4 bytes at a time, then the remaining tail byte by byte.
    for (; (data + 4) <= data_end; data += 4)
        crc = _mm_crc32_u32(crc, *(const ImU32*)data);
    for (; data != data_end; ++data)
        crc = _mm_crc32_u8(crc, *data);
    return ~crc;
}
#else
// original imgui impl here
#endif
I wonder if it might be easier to just add an advanced config option that leaves FNV1a would be an improvement over CRC32, but it's hardly state of the art anymore. The 32-bit variant of xxHash is supposedly a much higher quality hash with significantly better performance while still being 32-bits. xxHash is probably too big to distribute alongside Dear ImGui (although the main library is single header), but an add-on like |
That CPU-assisted CRC32 change seems surprisingly viable to consider as a temporary step.
While not aiming to hard-prevent it, I'd be in favor of not encouraging people to swap that hashing function, nor adding too many compile-time features that will make their way into package managers and bindings and complicate things. If we settle on a nicer hash function, we'd be good enough. Intuitively I imagine that nicer hash may be 64-bit, but I haven't deeply investigated the trade-offs. Also note that our use cases are: very small data, very frequent calls (so code size/prologue/epilogue may have a meaningful effect), and things may be not benchmarked enough for that favor.
That intrinsic looks great, but note that for Also, AFAIK |
This PR implements an alternative 64-bit hash algorithm instead of CRC32, to make it more resilient to hash collisions.
It has already been discussed a few times such as in #4612.
According to my calculations, using the Birthday problem approximation formula, and assuming a perfect hash, with a 32-bit hash the probability of collisions grows as follows:
With a 64-bit hash the probabilities are:
Which, IMO, makes a non-malicious collision practically impossible.
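For readers who want to reproduce this kind of estimate, here is a small sketch of the usual birthday-problem approximation, p ≈ 1 − exp(−n(n−1) / (2·2^bits)), assuming a perfectly uniform hash. The function name `CollisionProbability` is hypothetical and is not part of the PR; the figures it produces are not claimed to be the ones from the original tables.

```cpp
#include <cmath>
#include <cstdint>

// Approximate probability that at least two of n uniformly random hashes
// collide in a space of 2^bits values (birthday-problem approximation).
// Hypothetical helper for illustration only.
double CollisionProbability(std::uint64_t n, int bits)
{
    if (n < 2)
        return 0.0; // fewer than two items cannot collide

    const double space = std::ldexp(1.0, bits); // 2^bits as a double
    const double pairs = 0.5 * static_cast<double>(n) * static_cast<double>(n - 1);

    // 1 - exp(-x), written with expm1 so tiny probabilities (the 64-bit case)
    // do not vanish into floating-point rounding.
    return -std::expm1(-pairs / space);
}
```

With this approximation, roughly 77,000 IDs are enough for about a 50% collision chance with a 32-bit hash, while a million IDs under a 64-bit hash stay well below one in ten million.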
A few additional comments:
CXXFLAGS += -DIMGUI_USE_HASH64 to your Makefile.
ImGuiID typedef to be a 64-bit integer instead of 32 bits. It is an ABI change, obviously. And I would feel more comfortable doing typedef ImU64 ImGuiID; instead of using unsigned long long, but that would change the order of existing typedefs.
%08X. With a 64-bit integer that is not correct, so I'm defining a macro IMGUI_HASH_FMT, <inttypes.h> style, to do the proper formatting. I think I have changed all of them, but third-party code that does something similar would need to be updated.
ShowDemoWindow() suffered from the same collision as CRC32, as commented by @opadin in the linked issue. I'm opening a separate PR for this collision, as it is not just random chance, but this shows how a better hashing may avoid nasty surprises.
ImHashData/ImHashStr must return the unmodified seed if they receive zero bytes. CRC32 and FNV1a do that, but FastHash64 does not, so I'm doing that check manually. It is required for some things such as BeginMenuEx() calling Selectable("", ...), or DragScalarN() calling DragScalar("", ...) to reuse the same ID.
### and the ending NUL cannot be done inline. So I'm calling strlen() and strstr() beforehand. It is incompatible with the previous behavior, as I'm only checking for ### if data_size==0. This is because if the string length is passed explicitly we cannot be sure if there is a NUL byte at the end, so strstr cannot be used. And AFAIK, memmem() or strnstr() are not standard C functions. If desired, we can implement ImMemMem/ImStrNStr easily, of course. Or use a different hashing algorithm that can chomp bytes one by one, such as FNV1a. I think ImGui itself never calls those functions with data_size != 0 so there is no harm here. With third-party libraries or other language bindings this will be an issue, so please consider this point a request for comments.
str* calls make the runtime of ImHashStr about the same as CRC32. I expect that 32-bit CPUs will be slightly slower.
ImHashData/ImHashStr so that the seed argument is of type ImGuiID instead of ImU32 so they keep the same signature with both hashes.
With all that said, I've been using it for a while in a project of mine with no issues at all.