It doesn't work with large files (~40M in my case) #17

peske · 2020-12-13T01:29:27Z

Thanks for the effort, but it looks like it doesn't work... At least not optimally. Here's my setup:

I have two binary files (DLLs), which have the exact same size (about 40M), and which differ very slightly. Here's the screenshot from BeyondCompare:

As you can see, the files differ only in very few bytes.

But when I did the following (example from the documentation):

// Create a fingerprint of the file
fingerprint := NewFingerprint("/path/foo_v1.binary", 1024)

// Say the file was updated.
// Let's generate the diff.
diff := NewDiff("/path/foo_v2.binary", *fingerprint)

I found that the resulting diff has more than 725,000 blocks (Block). Serialized as JSON, the diff is about 9M. I also tried a smaller block size (64) and ended up with a diff of 150M in JSON.

Sadly, I cannot share the actual DLLs (company secret), but I believe you can reproduce this with any DLL of a similar size: make a copy with a few bytes changed here and there, and try.

peske · 2020-12-13T01:41:17Z

Btw, the vast majority (99.999% in my case) of returned blocks have RawData=nil and HasData=false. Are they really needed? I see that they contain some checksums; maybe those are there to check the input file before patching? If so, wouldn't it be better to ensure the input file's integrity in a cheaper way, such as including a whole-file checksum in the output?

monmohan · 2020-12-13T02:51:12Z

I have test cases with different binary files (including large files), so I know that diff generation and patching do work. But I do understand your point about the patch file size not being optimal. I need to look into more efficient ways of serializing the patch information.
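
One possible direction, sketched here purely as an illustration, is to collapse runs of consecutive unchanged blocks into a single "copy" instruction before serializing. The `block` and `op` types below are hypothetical stand-ins: only the `HasData` and `RawData` field names come from this thread, and the sketch assumes that an unchanged block at index i refers to block i of the old file, which may not match how the library actually represents matches.

```go
package main

import "fmt"

// block is a hypothetical stand-in for the library's Block type.
// Only the HasData and RawData field names are taken from this thread.
type block struct {
	HasData bool
	RawData []byte
}

// op is a hypothetical compact patch instruction: either copy Count
// consecutive blocks from the old file starting at block index Start,
// or emit Data literally.
type op struct {
	Copy  bool
	Start int
	Count int
	Data  []byte
}

// coalesce merges adjacent unchanged blocks into one copy op each.
// Assumption: an unchanged block at index i maps to old-file block i.
func coalesce(blocks []block) []op {
	var ops []op
	for i, b := range blocks {
		if b.HasData {
			ops = append(ops, op{Data: b.RawData})
			continue
		}
		// Extend the previous copy op if this block directly follows it.
		if n := len(ops); n > 0 && ops[n-1].Copy && ops[n-1].Start+ops[n-1].Count == i {
			ops[n-1].Count++
			continue
		}
		ops = append(ops, op{Copy: true, Start: i, Count: 1})
	}
	return ops
}

func main() {
	// 1000 blocks with a single changed block in the middle.
	blocks := make([]block, 1000)
	blocks[500] = block{HasData: true, RawData: []byte{0x42}}

	ops := coalesce(blocks)
	fmt.Println(len(blocks), "blocks ->", len(ops), "ops") // 1000 blocks -> 3 ops
}
```

Under these assumptions, the serialized size grows with the number of changed regions rather than with the file size divided by the block size.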
