Add commits range option for scan in Git repositories #29
Comments
Hi @Coruscant11. This is a good use case and a feature that would be nice to have. One challenge with implementing this in Nosey Parker is that Git repos are not scanned commit-by-commit, but instead, all blobs found in the repository are scanned. (This Git scanning technique uncovers more things than going commit-by-commit.) To add this feature to Nosey Parker, we would need to add some alternative Git enumeration mechanism that would walk commit-by-commit and only select blobs reachable from the desired set of commits. The current source for Git repo enumeration is here. Another thing to consider is the CLI for this added feature.
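To illustrate the alternative enumeration described above, here is a minimal Python sketch that asks Git itself for the objects in a commit range and keeps only the blobs. It is not Nosey Parker's implementation (which this thread does not show); the function names and the `since_commit` / `until_commit` parameters are hypothetical. It relies only on the standard `git rev-list --objects` and `git cat-file --batch-check` commands, and it treats the range `since..until` as roughly "objects needed for `until` that are not already implied by `since`".

```python
import subprocess


def revision_range(since_commit, until_commit="HEAD"):
    """Build a Git revision range: commits reachable from until but not since."""
    return f"{since_commit}..{until_commit}"


def parse_blob_ids(batch_check_output):
    """Keep only blob IDs from `git cat-file --batch-check` output lines."""
    blob_ids = []
    for line in batch_check_output.splitlines():
        parts = line.split()
        # The default batch-check format is "<oid> <type> <size>".
        if len(parts) == 3 and parts[1] == "blob":
            blob_ids.append(parts[0])
    return blob_ids


def blobs_in_range(repo_path, since_commit, until_commit="HEAD"):
    """List blob IDs for a commit range (hypothetical helper, needs git on PATH)."""
    # rev-list --objects prints "<oid>" or "<oid> <path>" for each object.
    listing = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--objects",
         revision_range(since_commit, until_commit)],
        check=True, capture_output=True, text=True,
    ).stdout
    object_ids = [line.split(" ", 1)[0] for line in listing.splitlines() if line]
    # Ask Git for each object's type so commits and trees can be dropped.
    checked = subprocess.run(
        ["git", "-C", repo_path, "cat-file", "--batch-check"],
        input="\n".join(object_ids) + "\n",
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_blob_ids(checked)
```

A caller could pass the result of `blobs_in_range(".", "<some commit>")` to whatever scans blob contents. Note the trade-off mentioned in the comment above: blobs that are not reachable from the chosen commits are never seen this way.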
Another related change that I'd like to make in Nosey Parker is to keep track of which inputs have already been scanned, and avoid rescanning them if possible. I'm going to make a separate issue for this.
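As a rough illustration of the "avoid rescanning" idea, the sketch below keeps a set of already-scanned blob IDs in a JSON file and filters new candidates against it. This is an assumption about how such tracking could look, not a description of Nosey Parker's datastore; the file format and all function names are hypothetical.

```python
import json
from pathlib import Path


def load_seen(path):
    """Load the set of previously scanned IDs, or an empty set if no file exists."""
    state_file = Path(path)
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text()))


def save_seen(path, seen):
    """Persist the scanned IDs as a sorted JSON list."""
    Path(path).write_text(json.dumps(sorted(seen)))


def select_unscanned(candidate_ids, seen):
    """Return IDs not scanned before, in input order and without duplicates."""
    fresh = []
    pending = set()
    for object_id in candidate_ids:
        if object_id not in seen and object_id not in pending:
            fresh.append(object_id)
            pending.add(object_id)
    return fresh
```

Because Git blob IDs are content hashes, an ID that was scanned once identifies the same content in any later run, which is what makes this kind of skip safe as long as the scanning rules have not changed.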
See also: #30
That is what I thought. With other scanners that go commit by commit, some repos can take a few hours to scan, while noseyparker took only 15 seconds. The purpose of this issue is to save time, but if the scanner is that fast, it is not necessarily worth implementing this very quickly. Even so, a feature to scan a specific revision would be very nice, I think! And why not specify a datastore, as you said, in order not to duplicate scans, for the rare people who work on insanely huge repositories 😄 For the Git revision scan, here is my personal use case:
Datastores are nice, but I also think that in some cases you do not want to rely on them too much, for example in CI/CD, when you do not know where your program will run. That is what I am doing at work: I have a very tiny API whose only role is to save the commit scan history, but not secrets. But this scanner seems so fast that this becomes a much smaller problem. I had a question: in some repositories, the scanner finds the same number of distinct matches at every run, but not the same number of total matches. Do you know why? I do not think it is an issue, but I was wondering why. Either way, the scan method of noseyparker seems awesome. Very fast and, as you said, it discovers way more things. 😄
I think you're talking about the summary table? For example, from scanning Nosey Parker's repo itself, you get something like this:
The numbers here for each rule indicate how many times that rule matched across all the scanned inputs.
@Coruscant11 That is surprising. If you run
Yeah, that doesn't look right! Thanks for reporting that. A separate issue would be perfect.
@Coruscant11 I created a new issue for the strange behavior you see: #32
I've heard it would also be useful to have an option to skip digging into Git history altogether. Noting that here.
Hi 👋
A great option in a secret scanner is the ability to scan a range of commits, for example by adding an option to scan.
In my case, we use scanners for very large repositories. Once findings are reported, future runs have no need to scan previously scanned commits. Only new commits are relevant. It saves a lot of time in large repositories.
Gitleaks has this feature, and Trufflehog too.
For example, a since_commits option, scanning between a specific commit and HEAD. And why not an until_commits option.
Do you see any blocking issues for this enhancement?
😄