Speed up detection of binary files #34

epage · 2019-07-14T04:42:28Z

Currently we load the entire file into memory and search all of it for a null byte.

See #29 for other implementations that show how to speed this up.
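To make the cost concrete, here is a minimal sketch of one possible way to bound the work: read only a prefix of the file and scan that for a null byte. This is not the project's code. `SNIFF_LEN`, `has_null_byte`, and `looks_binary` are hypothetical names, and the 8 KiB cap is an assumed value chosen only for illustration.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical cap on how many bytes to inspect (assumed, not from the project).
const SNIFF_LEN: u64 = 8 * 1024;

/// The check described in the issue: look for a null byte anywhere in the buffer.
fn has_null_byte(buf: &[u8]) -> bool {
    buf.contains(&0)
}

/// Read at most `SNIFF_LEN` bytes from the file and scan only that prefix,
/// instead of loading the whole file into memory first.
fn looks_binary(path: &Path) -> io::Result<bool> {
    let mut prefix = Vec::new();
    File::open(path)?.take(SNIFF_LEN).read_to_end(&mut prefix)?;
    Ok(has_null_byte(&prefix))
}
```

The trade-off is that a null byte located past the cap goes unnoticed, so a file that only looks binary late in its contents would be treated as text under this sketch.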

epage · 2019-10-25T21:03:32Z

So there are two optimizations:

- Speed up text files
- Speed up binary files; less important, but some repos do have them

We reduce how much of the buffer we walk twice, which should speed up large files. We still load the entire file into memory, which will still hurt binary files. This is part of #34.

This switches us from a homegrown implementation to `content_inspector`.

- Adds some optimizations by looking for the BOM.
- We used the same algorithm for finding null bytes.
- `content_inspector` caps how much of the buffer is searched, though.

Besides performance, `content_inspector` also has some known-binary magic numbers to avoid bad detections.

Fixes #34
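The order of checks described in that commit message can be sketched roughly as follows. This is a standalone illustration using only the standard library, not `content_inspector`'s actual code or API. `classify`, `Kind`, and `MAX_SCAN` are hypothetical names, the cap value is assumed, and the BOM and magic-number lists are deliberately incomplete.

```rust
/// Hypothetical cap on the null-byte scan (assumed value, not taken from the crate).
const MAX_SCAN: usize = 1024;

#[derive(Debug, PartialEq)]
enum Kind {
    Text,
    Binary,
}

fn classify(buf: &[u8]) -> Kind {
    // 1. A byte-order mark marks the file as text right away. This matters for
    //    UTF-16, where ordinary text contains null bytes.
    const BOMS: [&[u8]; 3] = [&[0xEF, 0xBB, 0xBF], &[0xFF, 0xFE], &[0xFE, 0xFF]];
    if BOMS.iter().any(|bom| buf.starts_with(bom)) {
        return Kind::Text;
    }

    // 2. Known-binary magic numbers catch formats whose first bytes may not
    //    contain a null byte (illustrative subset only).
    const MAGIC: [&[u8]; 2] = [b"%PDF", b"\x89PNG"];
    if MAGIC.iter().any(|magic| buf.starts_with(magic)) {
        return Kind::Binary;
    }

    // 3. Otherwise, fall back to the null-byte search, but only over a capped prefix.
    let end = buf.len().min(MAX_SCAN);
    if buf[..end].contains(&0) {
        Kind::Binary
    } else {
        Kind::Text
    }
}
```

With this ordering, `classify(&[0xFF, 0xFE, b'h', 0x00])` is treated as text despite the null byte, while a buffer starting with `%PDF` is treated as binary even if its first bytes are all printable.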

epage added the enhancement label Jul 14, 2019

epage referenced this issue in epage/typos Oct 25, 2019

perf: Speed up detection of text files

c20e8f6

epage mentioned this issue Aug 21, 2020

perf: Faster binary-file detection #135

Merged

epage closed this as completed in #135 Aug 22, 2020