notification improvements #1420

Open
fourtyplustwo opened this issue Jul 18, 2024 · 4 comments · May be fixed by #1748

Comments

@fourtyplustwo
Contributor

fourtyplustwo commented Jul 18, 2024

Various actions in data.all happen asynchronously, such as a share being processed, dataset and environment stacks being updated, and the health of shares being verified. Any of these actions can fail. Currently, when they fail, users do not find out until they check the logs or inspect the data.all UI.

I'd like to propose that all asynchronous actions in data.all that can fail send notifications and also integrate with the recent reminder service, which was implemented for shares to remind users that share approvals are pending.

A list of actions that should send notifications and to whom they should be sent:

a) An approved (or revoked) share fails to be processed successfully - both the dataset owner and the requester team should be notified and reminded. Optionally, DA admins should also be able to subscribe to these notifications.

b) A Dataset or Environment stack fails to update - the environment admin team should be notified, as well as the dataset owner team. Reminders should be sent. DA admins should be able to optionally subscribe as well.

c) Any background task, such as catalog reindexing or share health verification, fails (throws an exception) - DA admins should be notified ONCE.

d) A share becomes unhealthy - both the requester and the dataset owner should be notified and reminded. DA admins should be able to optionally subscribe as well.

e) An attempt to re-apply a share fails - both the requester and the dataset owner should be notified ONCE. DA admins should be able to optionally subscribe as well.
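
As a rough illustration, the list (a)-(e) could be captured as a small routing table. This is only a sketch: the event names, the `Recipient` enum, and `NotificationPolicy` are hypothetical and do not exist in data.all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class Recipient(Enum):
    DATASET_OWNER = "dataset_owner"
    REQUESTER = "requester"
    ENV_ADMINS = "environment_admins"
    DA_ADMINS = "da_admins"


@dataclass(frozen=True)
class NotificationPolicy:
    notify: FrozenSet[Recipient]    # always notified
    optional: FrozenSet[Recipient]  # may subscribe
    remind: bool                    # True: also use the reminder service; False: notify ONCE


_OWNER_AND_REQUESTER = frozenset({Recipient.DATASET_OWNER, Recipient.REQUESTER})
_ADMINS = frozenset({Recipient.DA_ADMINS})

# Hypothetical event names, one per item (a)-(e) above.
POLICIES: Dict[str, NotificationPolicy] = {
    "share_processing_failed": NotificationPolicy(_OWNER_AND_REQUESTER, _ADMINS, remind=True),
    "stack_update_failed": NotificationPolicy(
        frozenset({Recipient.ENV_ADMINS, Recipient.DATASET_OWNER}), _ADMINS, remind=True
    ),
    "background_task_failed": NotificationPolicy(_ADMINS, frozenset(), remind=False),
    "share_unhealthy": NotificationPolicy(_OWNER_AND_REQUESTER, _ADMINS, remind=True),
    "share_reapply_failed": NotificationPolicy(_OWNER_AND_REQUESTER, _ADMINS, remind=False),
}


def recipients_for(event: str, admins_subscribed: bool = False) -> FrozenSet[Recipient]:
    """Return who receives the event, adding optional subscribers when opted in."""
    policy = POLICIES[event]
    if admins_subscribed:
        return policy.notify | policy.optional
    return policy.notify
```

Keeping the "remind vs. notify ONCE" decision as data would make it easy to review against the list above.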

There are already a few tickets filed for this effort that we should combine into this major story:

#1251
#1299

There might be more that I was not able to find.

@noah-paige
Contributor

Hi @zsaltys - I also want to link these comments from an earlier discussion on the persistent reminders feature, which start to think through the issue described above

#1248 (comment)
#1248 (comment)

Ultimately I envision we would have a list of different "NotificationEvents" (e.g. StackFailedNotification, ShareUnHealthyNotification, CatalogIndexerFailedNotification, etc.) that each have their own generic notification groups and reminder schedules depending on the event specified

Also this list of events should be easily extendable if tomorrow we wanted to add a notification for NewTableMetricsNotification OR NewTeamInvitedNotification for example.
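
One possible shape for such an extendable list is a small registry, sketched below. The registry, the decorator, and the group names and reminder intervals are all assumptions made for illustration; only the event class names come from the comment above.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, Optional, Tuple

# Maps event class name -> event class. Hypothetical helper, not part of data.all.
EVENT_REGISTRY: Dict[str, type] = {}


def register_event(cls):
    """Class decorator: adding a new event type is a single decorated class."""
    EVENT_REGISTRY[cls.__name__] = cls
    return cls


@dataclass
class BaseNotificationEvent:
    resource_uri: str
    # Per-event defaults that subclasses override (illustrative values only).
    notification_groups: ClassVar[Tuple[str, ...]] = ()
    reminder_interval_days: ClassVar[Optional[int]] = None  # None means notify once


@register_event
class StackFailedNotification(BaseNotificationEvent):
    notification_groups = ("environment_admins", "dataset_owners")
    reminder_interval_days = 7


@register_event
class ShareUnHealthyNotification(BaseNotificationEvent):
    notification_groups = ("requesters", "dataset_owners")
    reminder_interval_days = 7


@register_event
class NewTableMetricsNotification(BaseNotificationEvent):
    notification_groups = ("dataset_owners",)
```

With this shape, adding NewTeamInvitedNotification tomorrow would not touch any existing class.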

I think it may be best to take some time and iron our a good generic and flexible design on how this notification system would look like in data.all. I think once we define it once it will be seamless to extend to more and more type of events or more and more type of delivery channels to send outbound notifications (on top of already in the UI and (optionally) via email)

Please let me know if you are thinking the same and let's use this issue here to discuss further on what that design could entail

@TejasRGitHub
Contributor

Another small issue - Currently, if the share is with a consumption role, the email notification body contains text like - share request for dataset <DATASET_NAME> for principal ggief66q.

The principal name is not readable; it is the internal id used by data.all for that principal. This id should be replaced with a user-friendly, readable consumption role name.
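
A minimal sketch of that substitution, assuming a lookup from principal id to consumption role name is available (the `role_names` mapping and both function names are hypothetical; in data.all the name would presumably come from the database):

```python
from typing import Mapping


def describe_principal(principal_id: str, role_names: Mapping[str, str]) -> str:
    """Return the readable consumption role name, falling back to the raw id."""
    return role_names.get(principal_id, principal_id)


def share_request_text(dataset_name: str, principal_id: str, role_names: Mapping[str, str]) -> str:
    principal = describe_principal(principal_id, role_names)
    return f"share request for dataset {dataset_name} for principal {principal}"
```

Falling back to the id keeps the message usable when the principal is not a consumption role or the lookup fails.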

@TejasRGitHub
Contributor

TejasRGitHub commented Dec 18, 2024

Proposal

Summarizing the notification improvements which need to be made

  1. Send notifications ( both UI and email ) when a share fails
  2. Send notifications ( both UI and email ) when the share verifier finds a share in an unhealthy state. Also send a notification when the reapply for the share fails.
    • As a sub-part of this, send an email notification when a share which was previously in an unhealthy state returns to a healthy state.
  3. Send weekly reminder notifications ( similar to the persistent email ) with a share info digest. This digest will contain
    • First section can be about the shares which are in submitted state and need some action to be taken by the data owner / stewards
    • Second section can be about shares which belong to the owner and which are in an unhealthy state.
    • Third section can be about the environments and datasets which are in an unhealthy state.
  4. Dataset notifications
    • When a dataset is created in an environment ? Send it to the env admin team ? Dataset owner and steward team
    • When a dataset gets into an unhealthy state ( CREATE_FAILED, UPDATE_ROLLBACK_COMPLETE, UPDATE_FAILED, etc ) then send email notifications to dataset owner and stewards ( also environment admin ? )
    • When tables are removed from a dataset ( i.e. after synchronizing ) , send email to the dataset owner ?
    • Notify dataset owner and stewards when the S3 bucket / KMS key registered with data.all for the bucket / Glue database changes / gets deleted , etc
  5. Environment notifications
    • When an environment is in an unhealthy state ( CREATE_FAILED, UPDATE_ROLLBACK_COMPLETE, etc ) send to env admin and org admins ?
  6. Notifications when any task fails
    • When any ECS task fails - scheduled ECS tasks only - then send a notification to the DAAdmins
  7. Notification Module Improvement - As highlighted by @noah-paige in this comment ( notification improvements #1420 (comment) ) , there is a need to refactor and design notifications such that there is a base template which other modules can leverage to create notifications seamlessly.
  8. Send email notifications when a glue table belonging to a dataset has any updates - newly added table, removed table, table updates. Send an email notification to the dataset stewards and the owner when there are changes. Also, if there are shares on that dataset, send emails to the share owners ( requestors ) indicating that their shares might be affected if the tables they requested are deleted.

Stage 1

There are instances in which someone approves a share, thinking it will succeed, only to find out later that the share failed. Also, since the bucket / KMS and IAM policies can be modified in an environment account outside of data.all, there are instances in which the verifier detects an unhealthy share but the share requestor and dataset owner are not aware of it. Apart from this, any scheduled ECS task can fail at some point, and data.all DAAdmins are not notified of the failure. Thus adding these notifications is very important.

Since these are share-related notifications and a share notifications code base already exists, additional notification support can be added in the first stage. The following will be covered as a part of Stage 1:

  1. Sending ( UI and Email ) notifications when a share fails.
    • Both approver and requestor will be notified
  2. Sending ( UI and Email ) notifications when share verifier finds a share in unhealthy state. Same for share re-apply
    • Requestor of the share will get notified
  3. Sending ( Email ) notifications when an ECS task fails due to some reason
    • Share Verifier ECS Task
    • Share Reapplier ECS Task
    • Catalog Service ECS Task
    • Stack Update ECS Task ( Unable to add admin notifications for stack update tasks since env modules is in core and notifications are in modules. @noah-paige , @dlpzx )
    • Persistent Email Reminder ECS Task
    • Table Syncer ECS Task
  4. Weekly ( configurable ) reminder digest email which combines all shares that are unhealthy or in submitted state and need action from the team, along with environments and datasets that are in an unhealthy state. This could leverage the existing persistent email reminders code, restructured to contain all of this. Since persistent email reminders send an email about each share to all the team members, a single user can get multiple emails about different shares, which can be annoying. With the digest, the number of emails a team member gets for unsubmitted shares / unhealthy shares, etc. will drop, and all the actionable items will be in one single email.
  5. Send email notifications when a glue table belonging to a dataset has any updates - newly added table, removed table, table updates. Send an email notification to the dataset stewards and the owner when there are changes. Also, if there are shares on that dataset, send emails to the share owners ( requestors ) indicating that their shares might be affected if the requested tables by them are deleted.
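
To make the digest idea in item 4 concrete, here is a minimal sketch of grouping per-resource reminders into one email body per recipient. `DigestItem`, `build_digests`, and the section titles are hypothetical names chosen for this example; the real task would presumably build HTML and reuse the persistent reminder code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class DigestItem(NamedTuple):
    recipient: str  # user or team receiving the digest
    section: str    # e.g. "Pending shares", "Unhealthy shares"
    summary: str    # one line describing the resource


def build_digests(items: Iterable[DigestItem]) -> Dict[str, str]:
    """Return one plain-text email body per recipient, grouped by section."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for item in items:
        grouped[item.recipient][item.section].append(item.summary)

    digests: Dict[str, str] = {}
    for recipient, sections in grouped.items():
        lines: List[str] = []
        for section, summaries in sections.items():
            lines.append(f"{section} ({len(summaries)})")
            lines.extend(f"  - {summary}" for summary in summaries)
        digests[recipient] = "\n".join(lines)
    return digests
```

Each recipient then receives a single email regardless of how many shares, datasets, or environments need attention.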

Stage 2 ( WIP )

In this stage the notifications system will be refactored so that each module can hook up its own notifications ( which will conform to standards set by the notification module ).

Currently notifications are only created for shares, in the ShareNotificationService file. To add new notifications ( let's say dataset notifications ), one has to create a file DatasetNotificationService, create functions for the different types of notifications, and write repeated code to create one notification for the UI and one for email. If another transport type ( e.g. slack ) is added to the notifications module, then every notification method - in both ShareNotificationService and DatasetNotificationService - has to be modified for the new transport type.

For notifications created for reminders - the persistent email reminders and weekly email reminders tasks - the code has to look up each data.all resource ( shares, datasets, environments, etc ) specifically and then create a custom email for it. For any additional data.all resource ( e.g. MLStudio ) which needs to be added to these notifications, new functions have to be added to the reminder tasks to fetch the details, and the email content has to be modified. To avoid doing this every time a new resource is added to these emails, the modules themselves should be responsible for fetching their resources and should send the data ( with an agreed contract ) to the reminder task, which can then construct the email.

To solve the above issues, there has to be

  1. A way to decouple notification transports ( i.e. UI , email , etc ) from the code which produces notification data
  2. A standard that every created notification conforms to ( e.g. a NotificationEvent class ) before it is passed on to the NotificationTransportManager, which will handle sending that notification to the appropriate destinations.
  3. For tasks like weekly reminders / persistent reminders, a base notification template ( NotificationsBase ) whose abstract methods each module implements.

Before diving into the design for notification improvement, the config.json has to be modified to accommodate new notifications ( dataset notifications, environment notifications , etc ). Also, any additional transport type should be added to the config.json.

  1. What should the configs look like ?
    "datasets_base": {
        "active": true,
        "features": {
            "dataset_notifications": {
                "email": {
                    "active": true,
                    "persistent_reminders": true
                }
            }
....
    "shares_base": {
        "active": true,
        "features": {
            "show_share_logs": "admin-only",
            "share_notifications": {
                "email": {
                    "active": true,
                    "persistent_reminders": true,
                    "parameters": {
                        "group_notifications": true
                    }
                }
            }
        },
        "notifications": {
            "email": true/false,
            "slack": true/false,
            "UI": true/false
        }
....

    "core": {
        "features": {
            "env_aws_actions": false,
            "cdk_pivot_role_multiple_environments_same_account": false,
            "enable_quicksight_monitoring": false,
            "show_stack_logs": "admin-only",
            "environment_notifications": {
                "email": {
                    "active": true,
                    "persistent_reminders": true
                }
            }
        },

OR

"notifications": {
    "active": "active/inactive",
    "shares_base": {
        "email": "active",
        "slack": "inactive",
        "persistent_reminders (reminders)": true
    },
    "dataset_base": {
        "email": "active",
        "slack": "inactive"
    }
...
}

OR

[UPDATE] - Noah : +1 on this

"notifications": {
    "email": "active",
    "slack": "active",
    "shares_base": {
        "active": "active/inactive",
        "persistent_reminders (reminders)": true,
        .....
    },
    "dataset_base": {
        "active": "active/inactive",
        "persistent_reminders (reminders)": true,
        .....
    }
    ....

}
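
For illustration, this is how code might resolve the enabled transports for a module under the third layout. It is a sketch under assumptions: the literal "active" / "inactive" string values, the `TRANSPORT_KEYS` tuple, and the `enabled_transports` helper are not defined anywhere in data.all, and the reminders key is simplified to "persistent_reminders".

```python
from typing import Any, Dict, List

# Hypothetical config following the third layout above.
CONFIG: Dict[str, Any] = {
    "notifications": {
        "email": "active",
        "slack": "inactive",
        "shares_base": {"active": "active", "persistent_reminders": True},
        "dataset_base": {"active": "inactive", "persistent_reminders": True},
    }
}

# Keys under "notifications" that name a transport rather than a module (assumption).
TRANSPORT_KEYS = ("email", "slack")


def enabled_transports(config: Dict[str, Any], module_name: str) -> List[str]:
    """Return the transports to use for a module, or [] if its notifications are off."""
    notifications = config.get("notifications", {})
    module = notifications.get(module_name, {})
    if module.get("active") != "active":
        return []
    return [t for t in TRANSPORT_KEYS if notifications.get(t) == "active"]
```

Transports are switched on once globally while each module only opts in or out, which seems to be the appeal of this layout.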

Q1. Can transport and notification be separated in a meaningful way ?

To decouple a notification from the transport that sends it, a NotificationTransport manager can be used, as described by @noah-paige in his comment.

"""
Notification transports will be registered while initializing the NotificationsModule
"""
from abc import ABC, abstractmethod
from typing import List, Optional


class NotificationTransport(ABC):
    # Each concrete transport ( e.g. email, slack, sms, UI notification, etc ) declares its type
    @property
    @abstractmethod
    def notification_transport_type(self) -> 'TransportTypes':
        raise NotImplementedError

    # This method has to be implemented by each concrete NotificationTransport
    @abstractmethod
    def send_notification(self, notification_event: 'NotificationEvent'):
        raise NotImplementedError

    def send_notification_async(self, notification_event: 'NotificationEvent'):
        raise NotImplementedError


class NotificationTransportManager:
    def __init__(self):
        self._registered_transports: List[NotificationTransport] = []

    def send_notifications_for_type(self, notification_event, transport_types: Optional[List['TransportTypes']] = None):
        # First, check if the notification module for the NotificationEvent is enabled

        # If transport types are specified as a parameter, send the notification only via those types
        notification_transports = self._registered_transports
        if transport_types is not None:
            notification_transports = [
                transport for transport in self._registered_transports
                if transport.notification_transport_type in transport_types
            ]

        # Send the notification via each selected transport
        # e.g. one way could be
        for transport in notification_transports:
            transport.send_notification(notification_event)

Concrete NotificationTransport implementations will be instantiated in the __init__.py of the notifications module when the module is initialized. Thus, at runtime, an instance of each NotificationTransport will be registered with the NotificationTransportManager.

TransportTypes will be an enum class describing the types of notification transport ( e.g. email, slack, UI , etc ). These transport types can later be used as a filter to send a notification only via certain transports, e.g. sending an email notification but not a UI notification.
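
A self-contained sketch of the registration and filtering described above. The `RecordingTransport` stand-in, the `register` method, and the dict-shaped event are assumptions made so the example runs without email or UI back ends.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import List, Optional


class TransportTypes(Enum):
    EMAIL = "email"
    UI = "ui"


class NotificationTransport(ABC):
    transport_type: TransportTypes

    @abstractmethod
    def send_notification(self, event: dict) -> None:
        ...


class RecordingTransport(NotificationTransport):
    """Stand-in transport that records events instead of sending them."""

    def __init__(self, transport_type: TransportTypes) -> None:
        self.transport_type = transport_type
        self.sent: List[dict] = []

    def send_notification(self, event: dict) -> None:
        self.sent.append(event)


class NotificationTransportManager:
    def __init__(self) -> None:
        self._registered: List[NotificationTransport] = []

    def register(self, transport: NotificationTransport) -> None:
        self._registered.append(transport)

    def send_notifications_for_type(
        self, event: dict, transport_types: Optional[List[TransportTypes]] = None
    ) -> int:
        """Send via all registered transports, or only those whose type is listed."""
        selected = [
            t for t in self._registered
            if transport_types is None or t.transport_type in transport_types
        ]
        for transport in selected:
            transport.send_notification(event)
        return len(selected)


# Usage: register both transports, then send via email only.
manager = NotificationTransportManager()
email = RecordingTransport(TransportTypes.EMAIL)
ui = RecordingTransport(TransportTypes.UI)
manager.register(email)
manager.register(ui)
manager.send_notifications_for_type({"title": "Share failed"}, [TransportTypes.EMAIL])
```

Adding a slack transport would then mean one new class and one `register` call, with no change to the code that produces events.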

Q2. What should the notification template look like ?
Every notification service created for a module should at least implement a method that returns the resources to be included in the reminder notifications.

e.g

from abc import ABC, abstractmethod
from typing import List


class NotificationsBase(ABC):

    @abstractmethod
    def get_resources_for_reminders(self) -> List['NotificationEvent']:
        raise NotImplementedError
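
As a sketch of how a module could plug into this template, the example below has a share module return its unhealthy shares and a reminder task collect events from every module. `ShareNotifications`, `collect_reminders`, the trimmed `NotificationEvent`, and the `(share_uri, owner_team)` input are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NotificationEvent:
    """Trimmed stand-in for the NotificationEvent contract."""
    title: str
    message: str
    receivers: List[str]
    notification_module: str


class NotificationsBase(ABC):
    @abstractmethod
    def get_resources_for_reminders(self) -> List[NotificationEvent]:
        ...


class ShareNotifications(NotificationsBase):
    def __init__(self, unhealthy_shares: List[Tuple[str, str]]) -> None:
        # Hypothetical input: (share_uri, owner_team) pairs fetched by the shares module.
        self._unhealthy_shares = unhealthy_shares

    def get_resources_for_reminders(self) -> List[NotificationEvent]:
        return [
            NotificationEvent(
                title="Unhealthy share",
                message=f"Share {share_uri} is unhealthy",
                receivers=[owner_team],
                notification_module="shares_base",
            )
            for share_uri, owner_team in self._unhealthy_shares
        ]


def collect_reminders(modules: List[NotificationsBase]) -> List[NotificationEvent]:
    """The reminder task only knows the contract, not how each module finds its resources."""
    events: List[NotificationEvent] = []
    for module in modules:
        events.extend(module.get_resources_for_reminders())
    return events
```

A new resource type ( e.g. MLStudio ) would add its own NotificationsBase subclass without changes to the reminder task.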

Q3. What are the most common and important things in the template for a notification event ?

Any event can be described by the following NotificationEvent class. In the future, this class can be extended if needed.

class NotificationEvent:
    def __init__(self, message, receivers, title, notification_module,
                 resource_identifier, uri_path, resource_status=None, async_process=False):
        self.message = message
        self.receivers = receivers
        self.title = title  # title / subject
        self.notification_module = notification_module  # module name, e.g. 'shares_base', 'dataset_base', 'environment', etc
        self.resource_status = resource_status  # any string / enum describing the notification ( optional for email notifications but required for UI notifications )
        self.resource_identifier = resource_identifier  # any identifier for that event ( e.g. {shareUri}|{datasetUri} )
        self.uri_path = uri_path  # path of the weblink ( e.g. /console/shares/{shareUri} )
        self.async_process = async_process  # determines whether the notification is sent synchronously or asynchronously

Q4. Can the share notification msg / email body and subject be simplified to avoid repeated templates ?
Basically, create a template where the email body / msg is structured based on the share ( resource ) notification type, so that the same email content is not repeated across messages.

As part of the refactoring,

  1. Move share_notifications from dataset_base to shares_base.
  2. Add SNS as another notification transport mechanism and refactor the code where SNS is used to create alarms when some service / ECS task fails
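
One way the shared template could look, using `string.Template` from the standard library. The notification type keys, wording, and `render_share_notification` helper are illustrative assumptions, not the existing data.all email text.

```python
from string import Template
from typing import Dict, Tuple

# One subject and one body template shared by all share notification types.
SUBJECT = Template("Share $state: $dataset")
BODY = Template("Status of share $share_uri for dataset $dataset: $state. $hint")

# Hypothetical notification types mapped to the only parts that differ.
SHARE_NOTIFICATION_TYPES: Dict[str, Tuple[str, str]] = {
    "SHARE_FAILED": ("failed", "Check the share logs for details."),
    "SHARE_UNHEALTHY": ("unhealthy", "A re-apply may be needed."),
    "SHARE_HEALTHY": ("healthy", "No action is needed."),
}


def render_share_notification(notification_type: str, share_uri: str, dataset: str) -> Tuple[str, str]:
    """Return (subject, body) for the given share notification type."""
    state, hint = SHARE_NOTIFICATION_TYPES[notification_type]
    fields = {"state": state, "hint": hint, "share_uri": share_uri, "dataset": dataset}
    return SUBJECT.substitute(fields), BODY.substitute(fields)
```

Only the per-type phrases vary, so a new share notification type becomes one new dictionary entry.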

@TejasRGitHub TejasRGitHub linked a pull request Jan 2, 2025 that will close this issue
@noah-paige noah-paige linked a pull request Jan 10, 2025 that will close this issue
@TejasRGitHub
Contributor

TejasRGitHub commented Jan 16, 2025

@dlpzx , @noah-paige , do you know why we converted notifications from core to modules ? I wanted to add admin notifications for stacks but I noticed that notifications is in modules and there should not be imports from modules to core

TejasRGitHub pushed a commit to TejasRGitHub/aws-dataall that referenced this issue Jan 28, 2025
Projects
Status: Backlog

3 participants