Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notifications: duplicate messages #11131

Closed
agjohnson opened this issue Feb 21, 2024 · 16 comments · Fixed by #11213
Closed

Notifications: duplicate messages #11131

agjohnson opened this issue Feb 21, 2024 · 16 comments · Fixed by #11213
Assignees
Labels
Accepted Accepted issue on our roadmap Bug A bug

Comments

@agjohnson
Copy link
Contributor

agjohnson commented Feb 21, 2024

Noticed here:
https://beta.readthedocs.org/projects/xclim/builds/23494705/

The error messages seem to stack here:

image

Also not sure why the string interpolation fails for these, I've noticed that in a few spots though.

On the current dashboard there are no notifications though, not sure why.

https://readthedocs.org/projects/xclim/builds/23494705/

@agjohnson agjohnson added Bug A bug Accepted Accepted issue on our roadmap labels Feb 21, 2024
@github-project-automation github-project-automation bot moved this to Planned in 📍Roadmap Feb 21, 2024
@agjohnson
Copy link
Contributor Author

cc @humitos I don't know if we've fixed this yet or not

@humitos
Copy link
Member

humitos commented Feb 21, 2024

Without jumping too much on this yet:

  1. for some reason the build is not being reset properly when it's retried. We have code that cleanup all the notifications:
    def reset(self):
    """
    Reset the build so it can be re-used when re-trying.
    Dates and states are usually overridden by the build,
    we care more about deleting the commands.
    """
    self.state = BUILD_STATE_TRIGGERED
    self.status = ''
    self.success = True
    self.output = ''
    self.error = ''
    self.exit_code = None
    self.builder = ''
    self.cold_storage = False
    self.commands.all().delete()
    self.notifications.all().delete()
    self.save()
  2. since the notification is exactly the same, it should be de-duplicated when it's added. However, when checking the APIv2 view (the one used by the builders), it doesn't seem we are using our custom .add() method that de-duplicates notifications:
    def perform_create(self, serializer):
    """Restrict creation to notifications attached to the project's builds from the api key."""
    attached_to = serializer.validated_data["attached_to"]
    build_api_key = self.request.build_api_key
    project_slug = None
    if isinstance(attached_to, Build):
    project_slug = attached_to.project.slug
    elif isinstance(attached_to, Project):
    project_slug = attached_to.slug
    # Limit the permissions to create a notification on this object only if the API key
    # is attached to the related project
    if not project_slug or build_api_key.project.slug != project_slug:
    raise PermissionDenied()
    return super().perform_create(serializer)

So, working on 2) to call .add() should be the first point to attack here. I suppose we should overwrite the api.v2.NotificationSerializer.save() method to call our custom .add().

@humitos
Copy link
Member

humitos commented Feb 21, 2024

On the current dashboard there are no notifications though, not sure why.
readthedocs.org/projects/xclim/builds/23494705

I think this is because we implemented permissions on notifications in the last deploy (see #11117) and there may be a bug on it because those notifications should be shown to all the users, since the build is public.

@humitos
Copy link
Member

humitos commented Feb 21, 2024

Also not sure why the string interpolation fails for these, I've noticed that in a few spots though.

In [12]: Notification.objects.filter(message_id=BuildMaxConcurrencyError.LIMIT_REACHED, format_values={}).count()
Out[12]: 4595

In [13]: Notification.objects.filter(message_id=BuildMaxConcurrencyError.LIMIT_REACHED).exclude(format_values={}).count()
Out[13]: 43746

It seems somewhere we are creating this notification without the proper format_values. 10% of them are broken. I suppose there should be a Celery workflow that we are not handling correctly here. I doubt it's this one in particular

if concurrency_limit_reached:
# By calling ``retry`` Celery will raise an exception and call ``on_retry``.
# NOTE: autoretry_for doesn't work with exceptions raised from before_start,
# it only works if they are raised from the run/execute method.
log.info("Concurrency limit reached, retrying task.")
self.retry(
exc=BuildMaxConcurrencyError(
BuildMaxConcurrencyError.LIMIT_REACHED,
format_values={
"limit": max_concurrent_builds,
},
)
)

@agjohnson
Copy link
Contributor Author

agjohnson commented Mar 2, 2024

This is probably worth a separate issue, but I've noticed these build concurrency notifications on a number of builds too. Shouldn't these notifications clear after the build is successfully started?

image

I've noticed these in a few spots, but it's hard for me to tell if this should be resolved now or not. The above error is on my local instance on a finished build -- same with the original errors I posted here.

I suppose also these should be warning level notifications? That would at least help make it clear that the build is not failed.

@humitos
Copy link
Member

humitos commented Mar 8, 2024

I suppose also these should be warning level notifications? That would at least help make it clear that the build is not failed.

I've done this in #11196

@humitos
Copy link
Member

humitos commented Mar 8, 2024

Shouldn't these notifications clear after the build is successfully started?

Yes, I've mentioned this in 1) from my previous comment #11131 (comment)

@humitos
Copy link
Member

humitos commented Mar 8, 2024

So, working on 2) to call .add() should be the first point to attack here. I suppose we should overwrite the api.v2.NotificationSerializer.save() method to call our custom .add().

This is done in #11197

@humitos
Copy link
Member

humitos commented Mar 8, 2024

Now, with the two PRs I've opened I think this issue shouldn't be an issue anymore. However, 1) could be still present sometimes due to network issues/web instance congestion or similar scenarios where the reset API call fails to perform. That said, I'd would move forward with what we have keeping the issue open for now in case we continue seeing it but I wouldn't invest too much time on digging further on debugging for now.

@humitos humitos moved this from Planned to Needs review in 📍Roadmap Mar 8, 2024
@agjohnson
Copy link
Contributor Author

Yes, I've mentioned this in 1) from my previous comment #11131 (comment)

That comment implied to me that the messages would only be deduplicated with the fix -- ie:

since the notification is exactly the same, it should be de-duplicated when it's added.

Are you saying that this fix is also what will dismiss the notification once the build has started?

To clarify, what I would expect here is that while the build is still queued that warning message should show, but once the build is no longer queued due to concurrency limits the message will disappear.

@humitos
Copy link
Member

humitos commented Mar 9, 2024

what I would expect here is that while the build is still queued that warning message should show, but once the build is no longer queued due to concurrency limits the message will disappear.

Yes, this is how it should work.

However, even with all these fixes... the scenario I mentioned before, may be still present on some edge cases 👇🏼

However, 1) could be still present sometimes due to network issues/web instance congestion or similar scenarios where the reset API call fails to perform.

humitos added a commit that referenced this issue Mar 11, 2024
* Notifications: de-duplicate them when using APIv2 from builders

Related #11131

* Update readthedocs/notifications/querysets.py

Co-authored-by: Santos Gallegos <[email protected]>

---------

Co-authored-by: Santos Gallegos <[email protected]>
@humitos
Copy link
Member

humitos commented Mar 12, 2024

I found this issue in https://readthedocs.org/projects/docs/builds/23722719/ as well. This was a recent build in our own documentation. I still don't understand why this is happening, but it's worth investigating it a little more.

@humitos
Copy link
Member

humitos commented Mar 12, 2024

We should probably run a small Python code from Django shell to remove these notifications from builds that are in finished state.

@agjohnson agjohnson moved this from Needs review to In progress in 📍Roadmap Mar 12, 2024
@humitos
Copy link
Member

humitos commented Mar 13, 2024

This would be Python code we could run in production to remove these invalid notifications from builds:

from readthedocs.builds.constants import (
    BUILD_STATE_FINISHED,
    BUILD_STATE_CANCELLED,
)
from readthedocs.doc_builder.exceptions import BuildMaxConcurrencyError


for n in Notification.objects.filter(message_id=BuildMaxConcurrencyError.LIMIT_REACHED).prefetch_related("attached_to"):
  if n.attached_to.state in (BUILD_STATE_FINISHED, BUILD_STATE_CANCELLED):
    n.delete()

@humitos
Copy link
Member

humitos commented Mar 13, 2024

I think I found why we are not resetting the build in some cases:

def _reset_build(self):
# Reset build only if it has some commands already.
if self.data.build.get("commands"):
log.info("Resetting build.")
self.data.api_client.build(self.data.build["id"]).reset.post()

We are only resetting it if has at least one command already. We should remove that restriction 👍🏼

humitos added a commit that referenced this issue Mar 13, 2024
@humitos humitos moved this from In progress to Needs review in 📍Roadmap Mar 13, 2024
humitos added a commit that referenced this issue Mar 18, 2024
* Build: always reset the build before building

Closes #11131

* Mock the APIv2 reset endpoint for testing

* Update tests
@github-project-automation github-project-automation bot moved this from Needs review to Done in 📍Roadmap Mar 18, 2024
@humitos
Copy link
Member

humitos commented Mar 19, 2024

The duplicated "concurrecy limit" notifications deleted and they should not appear again in new builds. As example, the old build linked in this issue doesn't show it anymore: https://beta.readthedocs.org/projects/xclim/builds/23494705/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Accepted issue on our roadmap Bug A bug
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants