Forbid failing incidents from being scheduled in aggregates #154

Open
wants to merge 16 commits into
base: master
from

Conversation

foursixnine
Member

@foursixnine foursixnine commented Jan 30, 2024

@codecov-commenter

codecov-commenter commented Jan 30, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (9872fbc) 66.84% compared to head (e551352) 67.49%.
Report is 6 commits behind head on master.

Files                          Patch %   Lines
openqabot/types/incident.py    96.29%    1 Missing ⚠️
openqabot/types/incidents.py   50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #154      +/-   ##
==========================================
+ Coverage   66.84%   67.49%   +0.64%     
==========================================
  Files          24       25       +1     
  Lines        1659     1692      +33     
==========================================
+ Hits         1109     1142      +33     
  Misses        550      550              


openqabot/types/incident.py Outdated Show resolved Hide resolved
openqabot/types/incident.py Outdated Show resolved Hide resolved
openqabot/types/aggregate.py Outdated Show resolved Hide resolved
@foursixnine foursixnine force-pushed the veteolvidamicaramicasaminombreypegalavuelta branch 2 times, most recently from 0d80e14 to 851afb1 Compare January 30, 2024 07:25
@foursixnine
Member Author

@okurz since you're already requesting changes, can you guide me through the test/code paths? In theory, the_demon_incident should not show up. Any thoughts?

@foursixnine
Member Author

@okurz can you please tag this pr as ai-assisted?

@Martchus
Contributor

We don't have that kind of label yet (in that repo). Are you saying you've been using AI here? If yes, why is that relevant and what would adding that tag change?

@foursixnine
Member Author

We don't have that kind of label yet (in that repo). Are you saying you've been using AI here? If yes, why is that relevant and what would adding that tag change?

It adds traceability for things that have been done with some sort of assistance. While that may not be important to you, SUSE-wise it is.

Contributor

@Martchus Martchus left a comment

Looks generally good.

tests/test_incident.py Outdated Show resolved Hide resolved
openqabot/types/incident.py Outdated Show resolved Hide resolved
tests/test_incident.py Show resolved Hide resolved
@foursixnine foursixnine force-pushed the veteolvidamicaramicasaminombreypegalavuelta branch from 51e7792 to 091b1cf Compare January 31, 2024 10:11
@okurz
Member

okurz commented Jan 31, 2024

@okurz can you please tag this pr as ai-assisted?

As I explained, I don't think it's a good idea, but as you insist, I created and added that label.

Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
openqabot/types/aggregate.py Show resolved Hide resolved
tests/test_aggregate.py Outdated Show resolved Hide resolved
@foursixnine foursixnine force-pushed the veteolvidamicaramicasaminombreypegalavuelta branch from ccb9ef0 to 5622827 Compare January 31, 2024 11:49
Contributor

mergify bot commented Jan 31, 2024

This pull request is now in conflicts. Could you fix it? 🙏

@foursixnine foursixnine force-pushed the veteolvidamicaramicasaminombreypegalavuelta branch from 8f2a737 to d436336 Compare January 31, 2024 12:37
Makefile Outdated
Comment on lines 44 to 51
# devel: environment
# maybe use Makefile.VENV instead to get a shell with virtualenv
# # we need to detect what shell we are using
# shell=$$(basename $$SHELL); \
# echo "Activating virtualenv for $$shell"; \
# . $(VENV) && \
# exec $($(SHELL))
Contributor

Maybe keep it on a different branch for now?

Member Author

Not really; I updated the comment, though. I'm not adding another TODO, to avoid giving @okurz a heart attack.

Member

Actually, this might be worse than a TODO, as it's dead/disabled code and nobody will know why it's not enabled. I recommend you just remove it.

Member Author

@okurz thank you for the recommendation; however, the same comment applies:

#154 (comment)

Contributor

It would make sense to at least state why this code has been disabled; e.g. why it is not good/useful enough for general use and in what situations it would make sense to use it nevertheless. The comments

# Developers have bad memory, so we need to remind them to activate the virtualenv
# maybe use Makefile.VENV instead to get a shell with virtualenv

don't make that clear to me at all.

Additionally, also if that code was not commented-out I'd frankly struggle to make sense of its intended use and purpose. So that should probably be clarified anyway.

Member Author

Additionally, also if that code was not commented-out I'd frankly struggle to make sense of its intended use and purpose. So that should probably be clarified anyway.

Good point, updated the comment

openqabot/types/incident.py Outdated Show resolved Hide resolved
@foursixnine foursixnine force-pushed the veteolvidamicaramicasaminombreypegalavuelta branch 2 times, most recently from ffb63a0 to 905101f Compare January 31, 2024 21:51
Comment on lines 171 to 177
{"status": "passed", "job_id": 1},
{"status": "failed", "job_id": 1777}, # Accept the turk
{
    "status": "softfailed",
    "job_id": 2020,
}, # 2020 is the genesys of dark fate
{"status": "failed", "job_id": 2042}, # This one has a dark fate
{"status": "passed", "job_id": 3},
Member Author

Python linting can be ugly at times 🗡️

@foursixnine foursixnine force-pushed the veteolvidamicaramicasaminombreypegalavuelta branch from 905101f to b194cd7 Compare January 31, 2024 21:56
# - remove almost duplicated code from Approver.is_job_marked_acceptable_for_incident
# as approver does not seem to operate over incidents
# about the TODO see discussion at https://github.com/openSUSE/qem-bot/pull/154#discussion_r1472721681
@staticmethod
Member

Static method of what?

Contributor

Of class Incident, I suppose. I'm not that familiar with Python, so I'm wondering what you are getting at. Can you provide a concrete suggestion?

Member

@mimi1vx mimi1vx Mar 5, 2024

Of which class? Look at where it is placed: has_ignored_comment is a function, not a method, and isn't part of any class.

In Perl a class is usually a whole file; in Python, indentation and placement matter :D
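
To illustrate the point about indentation and placement, here is a minimal sketch. The names `ACCEPTABLE_FOR` and `has_acceptable_comment` are hypothetical, and the regular expression is an assumption made for the example, not necessarily the one used in qem-bot.

```python
import re

# Assumed pattern, for illustration only; qem-bot's real expression may differ.
ACCEPTABLE_FOR = re.compile(r"@review:acceptable_for:incident_\d+:.+")


class Incident:
    @staticmethod
    def has_acceptable_comment(comments):
        # Indented inside the class body, so this is a static method of Incident.
        return any(ACCEPTABLE_FOR.search(c["text"]) for c in comments)


def has_acceptable_comment(comments):
    # Defined at module level: a plain function, no @staticmethod needed.
    return any(ACCEPTABLE_FOR.search(c["text"]) for c in comments)
```

As far as I know, a module-level function decorated with `@staticmethod` is not directly callable before Python 3.10; from 3.10 on it is callable, but the decorator is still redundant there.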

)

if not results:
    raise NoResultsError(
Member

Is this exception caught and handled anywhere? By the way, aggregates could be scheduled before any results are available.
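
One possible way to handle the question above, as a sketch only: `NoResultsError` is the name from the diff, while `fetch_results` and `has_failures` are hypothetical helpers. Treating "no results yet" as "nothing known to fail" is an assumption about the desired behaviour, not something the PR confirms.

```python
class NoResultsError(Exception):
    """Raised when no job results are available yet."""


def fetch_results(raw):
    # Hypothetical helper: refuse to continue without any results.
    if not raw:
        raise NoResultsError("no results available")
    return raw


def has_failures(raw):
    # Hypothetical caller: catch the exception so that a missing result set
    # does not abort the whole run, and report "no known failures" instead.
    try:
        results = fetch_results(raw)
    except NoResultsError:
        return False
    return any(r["status"] != "passed" for r in results)
```

With this shape, an aggregate scheduled before any results exist is not blocked, while any non-passed result still counts as a failure.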

Member

@mimi1vx mimi1vx left a comment

few questions ..

Contributor

mergify bot commented Feb 8, 2024

This pull request is now in conflicts. Could you fix it? 🙏

foursixnine added a commit to foursixnine/qem-bot that referenced this pull request Feb 27, 2024
The code used by approver doesn't seem to use Incidents class and a
rewrite at this point has less benefit than simply extracting the
duplicated regular expression. A TODO has been left in place to keep
track for subsequent PRs stemming from discussion in [1]

[1] openSUSE#154 (comment)
@foursixnine foursixnine force-pushed the veteolvidamicaramicasaminombreypegalavuelta branch from e551352 to 5c9d366 Compare February 27, 2024 16:20
@foursixnine
Member Author

foursixnine commented Feb 27, 2024

@okurz as agreed last week, here is the pull request to deploy, "Forbid failing incidents from being scheduled in aggregates" (#154), which would have helped with yesterday's sudo update.

Contributor

@Martchus Martchus left a comment

There are still pending questions. If this were opt-in, we could merge it more easily: not everything would need to be perfect, and we could try it out in production without changing the deployment.
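
A minimal sketch of what such an opt-in could look like, assuming a command-line switch. The flag name `--forbid-failing-incidents`, the helper `incidents_to_schedule`, and the `has_failures` key are all hypothetical and not part of bot-ng.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="bot-ng.py")
    # Hypothetical flag: off by default, so existing deployments keep
    # their current behaviour unless they opt in.
    parser.add_argument(
        "--forbid-failing-incidents",
        action="store_true",
        help="skip incidents with failed jobs when scheduling aggregates",
    )
    return parser


def incidents_to_schedule(incidents, forbid_failing):
    # incidents: list of dicts with a boolean "has_failures" key (assumed shape).
    if not forbid_failing:
        return list(incidents)
    return [inc for inc in incidents if not inc["has_failures"]]
```

Because the switch defaults to off, merging the change would leave scheduling untouched until someone passes the flag.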

Comment on lines +190 to +193
# TODO:
# - move to utils.py or a better place
# - remove almost duplicated code from Approver.is_job_marked_acceptable_for_incident
Contributor

These two points don't seem too hard to implement. Am I overlooking something or can we maybe just do them before merging this PR?

Member Author

can we maybe just do them before merging this PR?

I'll leave it for a follow-up when addressing the rest of the changes, as that would need a bigger refactor because the incidents class is not used in the approver.

for comment in ret:
    if regex.match(comment["text"]):
        # leave comment for future debugging purposes
        # log.debug("matched comment incident %s: with comment %s", inc, comment)
Contributor

In fact, after last changes, this is not necessary anymore so I dropped them

It looks like the current version on GitHub still has the disabled line.

@Martchus
Contributor

Martchus commented Mar 5, 2024

@Mergifyio rebase

foursixnine and others added 16 commits March 5, 2024 10:27
Let the build system die if any errors are found; this is intended for
local development only.
Leave early when filtering incidents to schedule, as incidents that have
failures don't need further processing. Adjust tests accordingly, fixing
that ugly off-by-one
- While incidents are less likely to have an exception comment, there are
  cases where a failing aggregate from the day before might impact an
  incident on its own to be scheduled

- We want to accept only passed results without questioning;
  anything else will need to have an acceptable_for, following the
  discussion in [1]

[1] https://github.com/openSUSE/qem-bot/pull/154/files#r1474042954
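
The rule described in the commit messages above (accept passed results as-is, require an acceptable_for for anything else, and leave early on the first blocker) can be sketched as follows. The function names are hypothetical, and modelling the acceptable_for markers as a set of job ids is an assumption made to keep the example self-contained.

```python
def is_acceptable(result, acceptable_job_ids):
    # Passed results are accepted without questioning; anything else
    # needs an acceptable_for marker (modelled here as a set of job ids).
    return result["status"] == "passed" or result["job_id"] in acceptable_job_ids


def incident_can_be_scheduled(results, acceptable_job_ids):
    for result in results:
        if not is_acceptable(result, acceptable_job_ids):
            # Leave early: one unaccepted failure is enough.
            return False
    return True
```

Returning on the first unaccepted result avoids processing the remaining jobs of an incident that is already excluded.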
The code used by approver doesn't seem to use Incidents class and a
rewrite at this point has less benefit than simply extracting the
duplicated regular expression. A TODO has been left in place to keep
track for subsequent PRs stemming from discussion in [1]

[1] openSUSE#154 (comment)
qem-bot's data is normalized, so either passed or failed.
Contributor

mergify bot commented Mar 5, 2024

rebase

✅ Branch has been successfully rebased

@Martchus Martchus force-pushed the veteolvidamicaramicasaminombreypegalavuelta branch from 5c9d366 to 36ed6db Compare March 5, 2024 10:27
@okurz
Member

okurz commented Mar 6, 2024

I also ran the application locally but found that no relevant logs are output:

./bot-ng.py --configs metadata --singlearch metadata/bot-ng/singlearch.yml -t 1234 --debug --dry updates-run

Possibly the relevant steps are not executed due to dry-run or something else preventing the evaluation of what products to trigger:

2024-03-06 12:34:58 INFO     Bot schedule starts now
2024-03-06 12:34:58 INFO     Project SUSE:Maintenance:17818 has empty channels - check incident in SMELT
2024-03-06 12:34:58 INFO     Project SUSE:Maintenance:17958 has empty channels - check incident in SMELT
…
2024-03-06 12:35:28 INFO     Project SUSE:Maintenance:18479 can't calculate repohash  .. skipping
2024-03-06 12:35:28 INFO     Project SUSE:Maintenance:18485 has empty channels - check incident in SMELT
…
2024-03-06 12:35:59 INFO     Project SUSE:Maintenance:19102 can't calculate repohash  .. skipping
2024-03-06 12:35:59 INFO     Project SUSE:Maintenance:24734 has empty channels - check incident in SMELT
…
2024-03-06 12:36:31 INFO     Project SUSE:Maintenance:28369 can't calculate repohash  .. skipping
…
2024-03-06 12:37:02 INFO     Project SUSE:Maintenance:28667 can't calculate repohash  .. skipping
…
2024-03-06 12:37:32 INFO     Project SUSE:Maintenance:28784 can't calculate repohash  .. skipping
2024-03-06 12:37:32 INFO     Project SUSE:Maintenance:29248 has empty packages - check incident in SMELT
2024-03-06 12:37:32 INFO     Project SUSE:Maintenance:30071 has empty channels - check incident in SMELT
2024-03-06 12:37:41 INFO     Project SUSE:Maintenance:31645 has empty channels - check incident in SMELT
2024-03-06 12:37:42 INFO     Project SUSE:Maintenance:32086 has empty channels - check incident in SMELT
2024-03-06 12:37:59 INFO     Project SUSE:Maintenance:32288 has empty channels - check incident in SMELT
2024-03-06 12:38:09 INFO     Project SUSE:Maintenance:32462 has empty channels - check incident in SMELT
2024-03-06 12:38:33 INFO     Project SUSE:Maintenance:32613 has empty channels - check incident in SMELT
2024-03-06 12:39:13 INFO     Project SUSE:Maintenance:32782 has empty channels - check incident in SMELT
2024-03-06 12:39:33 INFO     Project SUSE:Maintenance:32808 has empty channels - check incident in SMELT
2024-03-06 12:39:43 INFO     Project SUSE:Maintenance:32824 has empty channels - check incident in SMELT
2024-03-06 12:40:16 INFO     Project SUSE:Maintenance:32877 has empty channels - check incident in SMELT
…
2024-03-06 12:40:47 INFO     Project SUSE:Maintenance:32879 can't calculate repohash  .. skipping
2024-03-06 12:40:47 INFO     … incidents loaded from qem dashboard
2024-03-06 12:40:47 DEBUG    Skipping invalid config metadata/.gitlab-ci.yml
2024-03-06 12:40:47 DEBUG    Skipping invalid config metadata/products.yml
2024-03-06 12:40:47 INFO     Starting bot mainloop
2024-03-06 12:40:47 INFO     Would trigger 0 products in openQA
2024-03-06 12:40:47 INFO     End of bot run

EDIT: I also called

for i in full-run incidents-run updates-run inc-approve inc-sync-results aggr-sync-results; do
  echo "### $i" && ./bot-ng.py --configs metadata --singlearch metadata/bot-ng/singlearch.yml -t 1234 --debug --dry $i
done 2>&1 | tee qem_bot_dry_run-master-$(date -Is).log &&
hub pr checkout 154 &&
for i in full-run incidents-run updates-run inc-approve inc-sync-results aggr-sync-results; do
  echo "### $i" && ./bot-ng.py --configs metadata --singlearch metadata/bot-ng/singlearch.yml -t 1234 --debug --dry $i
done 2>&1 | tee qem_bot_dry_run-pr154-$(date -Is).log

and compared both output logs to see if there is any meaningful difference. "inc-sync-results" produces far too large a list of results to process, and inc-approve shows many differences, but those are due to changed real-time results and are unrelated to this pull request. The relevant commands (if any) are "full-run incidents-run updates-run", and their output shows no differences at all (except for timestamps), which supports what I stated before: possibly the relevant steps are not executed due to dry-run or something else preventing the evaluation of what products to trigger, and more significant reverse-engineering would be necessary to change that.
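
The log comparison described above, ignoring timestamps, could be done along these lines. This is a sketch under the assumption that every line starts with the `YYYY-MM-DD HH:MM:SS` prefix seen in the excerpt; `strip_timestamps` and `log_diff` are hypothetical helper names.

```python
import difflib
import re

# Matches the leading "YYYY-MM-DD HH:MM:SS " prefix seen in the log excerpt.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+")


def strip_timestamps(text):
    return [TIMESTAMP.sub("", line) for line in text.splitlines()]


def log_diff(master_log, pr_log):
    # Unified diff of the two logs with timestamps removed; empty if equal.
    return list(
        difflib.unified_diff(
            strip_timestamps(master_log),
            strip_timestamps(pr_log),
            fromfile="master",
            tofile="pr154",
            lineterm="",
        )
    )
```

An empty list from `log_diff` corresponds to the "no differences except for timestamps" observation.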

Contributor

mergify bot commented Apr 16, 2024

This pull request is now in conflicts. Could you fix it? 🙏

dzedro pushed a commit to dzedro/os-autoinst-distri-opensuse that referenced this pull request Jun 10, 2024