Speed up detection of binary files #34

Closed
epage opened this issue Jul 14, 2019 · 1 comment · Fixed by #135
Labels
enhancement Improve the expected

Comments

@epage
Collaborator

epage commented Jul 14, 2019

Currently we load the entire file into memory and scan all of it for a null byte.

See #29 for other implementations that show how to speed this up.
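
One common way to avoid reading a whole file just to classify it is to inspect only a bounded prefix. The following is a minimal sketch of that idea using only the standard library; it is not the project's code, and the names `SNIFF_LEN`, `looks_binary`, and `file_looks_binary`, as well as the 1024-byte cap, are assumptions chosen for illustration.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical cap on how many bytes are inspected (not a value from the project).
const SNIFF_LEN: u64 = 1024;

/// Returns `true` if a null byte appears within the first `SNIFF_LEN` bytes.
/// Bytes past the cap are never read, so large files stay cheap to classify.
fn looks_binary<R: Read>(reader: R) -> io::Result<bool> {
    let mut prefix = Vec::with_capacity(SNIFF_LEN as usize);
    // `take` limits the reader, so `read_to_end` stops at the cap or at EOF.
    reader.take(SNIFF_LEN).read_to_end(&mut prefix)?;
    Ok(prefix.contains(&0))
}

/// Convenience wrapper that opens `path` and sniffs only its prefix.
fn file_looks_binary(path: &Path) -> io::Result<bool> {
    looks_binary(File::open(path)?)
}
```

The trade-off is that a file whose first null byte sits beyond the cap is treated as text, so the cap is a balance between speed and accuracy.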

@epage epage added the enhancement Improve the expected label Jul 14, 2019
@epage
Collaborator Author

epage commented Oct 25, 2019

So there are two optimizations:

  • Speed up text files
  • Speed up binary files, less important but some repos do have them

epage referenced this issue in epage/typos Oct 25, 2019
We reduce how much of the buffer we walk twice, which should speed up
large files.  We still load the entire file into memory, which will still
hurt binary files.

This is part of #34.
epage referenced this issue in epage/typos Aug 21, 2020
This switches us from a homegrown implementation to `content_inspector`
- Adds some optimizations by looking for the BOM.
- We used the same algorithm for finding null bytes
- `content_inspector` caps how much of the buffer is searched, though

Besides performance, `content_inspector` also has some known-binary
magic numbers to avoid bad detections.

Fixes #34
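
The three ideas in this commit message (a BOM check, known-binary magic numbers, and a capped null-byte scan) can be sketched roughly as follows. This is a standard-library illustration only, not `content_inspector`'s actual implementation or API: `Kind`, `classify`, and `MAX_SCAN` are hypothetical names, the 1024-byte cap is an assumed value, and `%PDF` is used as a single illustrative magic number.

```rust
/// Hypothetical cap on the scanned window; the crate's real limit is not stated here.
const MAX_SCAN: usize = 1024;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Text,
    Binary,
}

/// Classifies a buffer as text or binary (illustrative sketch).
fn classify(buf: &[u8]) -> Kind {
    // Byte-order marks: a match means text, even though UTF-16/UTF-32
    // content routinely contains null bytes.
    const BOMS: [&[u8]; 5] = [
        &[0xEF, 0xBB, 0xBF],       // UTF-8
        &[0x00, 0x00, 0xFE, 0xFF], // UTF-32 BE
        &[0xFF, 0xFE, 0x00, 0x00], // UTF-32 LE
        &[0xFE, 0xFF],             // UTF-16 BE
        &[0xFF, 0xFE],             // UTF-16 LE
    ];
    if BOMS.iter().any(|bom| buf.starts_with(bom)) {
        return Kind::Text;
    }

    // Known-binary magic numbers: formats that may have no null byte near
    // the start. Only one example is listed; this list is an assumption.
    const MAGIC: [&[u8]; 1] = [b"%PDF"];
    if MAGIC.iter().any(|magic| buf.starts_with(magic)) {
        return Kind::Binary;
    }

    // Same null-byte test as before, but limited to the first MAX_SCAN bytes.
    let window = &buf[..buf.len().min(MAX_SCAN)];
    if window.contains(&0) {
        Kind::Binary
    } else {
        Kind::Text
    }
}
```

Checking the BOM first is what keeps UTF-16 text from being misread as binary, and the magic-number check covers the opposite mistake for binary formats that begin with printable bytes.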