
optimize various bits around scan profiles #4050

Open: wants to merge 20 commits into main


@underdarknl (Contributor) commented Jan 28, 2025

Changes

I noticed that we periodically run recalculate_scan_profiles, which on its own loops over the list of scan profiles three times.

QA notes

Scan profiles should still be handled exactly as before; the code is functionally unchanged.

Code Checklist

  • All the commits in this PR are properly PGP-signed and verified.
  • This PR only contains functionality relevant to the issue.
  • I have written unit tests for the changes or fixes I made.
  • I have checked the documentation and made changes where necessary.
  • I have performed a self-review of my code and refactored it to the best of my abilities.
  • Tickets have been created for newly discovered issues.
  • For any non-trivial functionality, I have added integration and/or end-to-end tests.
  • I have informed others of any required .env file changes and updated .env-dist accordingly.
  • I have included comments in the code to elaborate on what is not self-evident from the code itself, including references to issues and discussions online, or implicit behavior of an interface.

Checklist for code reviewers:

Copy-paste the checklist from the docs/source/templates folder into your comment.


Checklist for QA:

Copy-paste the checklist from the docs/source/templates folder into your comment.

@underdarknl added the octopoes and tech-debt labels Jan 28, 2025
@underdarknl requested a review from a team as a code owner January 28, 2025 09:18
@underdarknl (Contributor, Author)

Further improvement:
possibly only fetch the state of the scan levels if there have been any changes. Can we somehow select the highest transaction ID from XTDB?
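A minimal sketch of that idea: remember the last transaction ID seen and skip the full recalculation when it has not changed. ScanProfileRecalculator, fetch_latest_tx_id and recalculate are hypothetical names; whether and how XTDB exposes the latest transaction ID to Octopoes is exactly the open question above, so here it is simply injected as a callable.

```python
from typing import Callable, Optional


class ScanProfileRecalculator:
    """Hypothetical wrapper that guards an expensive recalculation."""

    def __init__(
        self,
        fetch_latest_tx_id: Callable[[], int],  # assumption: returns the store's highest tx ID
        recalculate: Callable[[], None],  # stands in for recalculate_scan_profiles
    ) -> None:
        self._fetch_latest_tx_id = fetch_latest_tx_id
        self._recalculate = recalculate
        self._last_seen_tx_id: Optional[int] = None

    def maybe_recalculate(self) -> bool:
        """Run the recalculation only if new transactions exist; return True if it ran."""
        latest = self._fetch_latest_tx_id()
        if self._last_seen_tx_id is not None and latest == self._last_seen_tx_id:
            return False  # nothing changed since the previous run
        self._recalculate()
        # Record the ID read *before* the run, so writes made during it are picked up next time.
        self._last_seen_tx_id = latest
        return True
```

Because the ID is read before recalculating, writes made during a run (including the run's own scan profile updates) cause one more pass on the next tick rather than being missed; after that, idle periods are skipped.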

@ammar92 (Contributor) previously approved these changes Jan 29, 2025

@ammar92 left a comment


Nice work! It's clever to compute all_declared_scan_profiles and assigned_scan_levels in one pass. One more small optimization would be to create source_scan_profile_references after the loop. Building a set from an existing list (or any sequence or iterable) in one go is a little faster than constructing it element by element via set.add (as the previous implementation did: source_scan_profile_references = {sp.reference for sp in all_declared_scan_profiles}).

Example benchmark code:

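```python
import timeit

# Variant 1: build a list and a set element by element in the same loop.
code1 = """
x = set()
l = []
for i in range(50000):
    l.append(i)
    x.add(i)
"""

print(timeit.timeit(code1, number=2000))  # 2.3204293749295175 (seconds, as reported)

# Variant 2: build the list with a comprehension first, then construct the set from it.
code2 = """
l = [i for i in range(50000)]
x = set([i for i in l])
"""

print(timeit.timeit(code2, number=2000))  # 2.087212207959965 (seconds, as reported)
```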

But the performance gain is negligible and the suggested implementation is already fast and understandable, so I'll leave it up to you.
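A minimal sketch of the suggestion: one pass collects the declared profiles and their levels, and the reference set is built once after the loop. ScanProfile here is a simplified, hypothetical stand-in with a declared flag (the real Octopoes models differ), and the assumption that assigned_scan_levels maps each declared reference to its level is ours; only the three collection names come from this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ScanProfile:
    reference: str
    level: int
    declared: bool  # assumption: True for user-declared profiles


def split_profiles(
    profiles: List[ScanProfile],
) -> Tuple[List[ScanProfile], Dict[str, int], Set[str]]:
    all_declared_scan_profiles: List[ScanProfile] = []
    assigned_scan_levels: Dict[str, int] = {}

    # Single pass over all profiles instead of one pass per derived collection.
    for sp in profiles:
        if sp.declared:
            all_declared_scan_profiles.append(sp)
            assigned_scan_levels[sp.reference] = sp.level

    # Build the set after the loop, in one expression, instead of set.add inside the loop.
    source_scan_profile_references = {sp.reference for sp in all_declared_scan_profiles}
    return all_declared_scan_profiles, assigned_scan_levels, source_scan_profile_references
```

In this sketch set(assigned_scan_levels) would yield the same references, since the dict's keys are exactly the declared references.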

@underdarknl (Contributor, Author)


Yes, I was pondering this too, as one-by-one set operations are slower than a bulk update or mass assignment. There are a bunch of other optimizations that could be done, but I'm focusing first on not running the entire loop when it isn't needed, as that's the main win and would save 90% or more for most installs.
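The reviewer's benchmark times list construction together with set construction. A sketch that isolates the set-building step compares one-by-one set.add, a bulk set.update, and the set() constructor on the same prebuilt list. The variant names are ours, and absolute timings depend on the machine and Python version, so none are claimed here.

```python
import timeit
from typing import Dict

SETUP = "data = list(range(50000))"  # list is built once per variant, outside the timing

ONE_BY_ONE = """
s = set()
for i in data:
    s.add(i)
"""

BULK_UPDATE = """
s = set()
s.update(data)
"""

CONSTRUCTOR = "s = set(data)"


def run(number: int = 200) -> Dict[str, float]:
    """Return total seconds per variant for `number` repetitions."""
    return {
        name: timeit.timeit(stmt, setup=SETUP, number=number)
        for name, stmt in [
            ("add", ONE_BY_ONE),
            ("update", BULK_UPDATE),
            ("set()", CONSTRUCTOR),
        ]
    }


if __name__ == "__main__":
    for name, seconds in run().items():
        print(f"{name:>7}: {seconds:.3f}s")
```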

Furthermore, I'm guessing all of this will be undone once the nibbles take care of propagation, as they do this taint tracking much more efficiently.

@underdarknl changed the title from "optimize loops in recalculate_scan_profiles" to "optimize various bits around scan profiles" Jan 30, 2025
@underdarknl (Contributor, Author)

The tests are now failing on ill-formatted test references.

@stephanie0x00 (Contributor)

Checklist for QA:

  • I have checked out this branch, and successfully ran a fresh make reset.
  • I confirmed that there are no unintended functional regressions in this branch:
    • I have managed to pass the onboarding flow
    • Objects and Findings are created properly
    • Tasks are created and completed properly
  • I confirmed that the PR's advertised feature or hotfix works as intended.
  • I checked the logs for errors and/or warnings and made issues where necessary

What works:

I think it works. I did identify some 'weird' things in the propagation, but those seem to already exist on main.

What doesn't work:

n/a

Bug or feature?:

n/a

@underdarknl added this to the OpenKAT v1.19 milestone Feb 6, 2025
Labels: octopoes, tech-debt
Projects: Status: Ready for merge
4 participants