Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reliable Domain Change Notification #777

Merged
merged 26 commits into from
Jun 5, 2018
Merged

Reliable Domain Change Notification #777

merged 26 commits into from
Jun 5, 2018

Conversation

wxing1292
Copy link
Contributor

@wxing1292 wxing1292 commented May 25, 2018

  • add new domain v2 table, with additional 2 parameter: 1. domain change notification version, 2 failover change notification version
  • add domain notification version to shard to track the domain change
  • modify the domain cache and shard for reliable notification

solve #734

@wxing1292 wxing1292 force-pushed the domain-mirror branch 2 times, most recently from d8918a7 to b45d016 Compare May 25, 2018 20:53
@wxing1292 wxing1292 force-pushed the domain-mirror branch 3 times, most recently from 9f2a2c3 to f98bd44 Compare May 25, 2018 21:31
MetadataManager interface {
Closeable
CreateDomain(request *CreateDomainRequest) (*CreateDomainResponse, error)
GetDomain(request *GetDomainRequest) (*GetDomainResponse, error)
UpdateDomain(request *UpdateDomainRequest) error
DeleteDomain(request *DeleteDomainRequest) error
DeleteDomainByName(request *DeleteDomainByNameRequest) error
ListDomain(request *ListDomainRequest) (*ListDomainResponse, error)
GetMetadata() (int64, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only there to get domains_partition?
Why do we need this on the interface rather than just defined as a config knob for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not entirely sure what do you mean.

the list api will list all domains in the new table.
get metadata method will return the metadate (notification version) from the new table

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh got it. Then how about create a response struct for metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -77,6 +77,7 @@ type (
TaskMgr TaskManager
HistoryMgr HistoryManager
MetadataManager MetadataManager
MetadataManagerV2 MetadataManager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just have these 2 versions only at the persistence layer? I'm a little worried bubbling it up across all layers of application might cause problems and bugs. There are just too many changes because of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you see, i am using the same interface.
so, anywhere other than persistence (and domain frontend API and the domain cache) will not use the new metadata manager

is_global_domain boolean, -- indicating whether a domain is a global domain
config_version bigint, -- indicating the version of domain config, excluding the failover / change of active cluster name
failover_version bigint, -- indicating the version of active domain only, used for domain failover
failover_notification_version bigint, -- indicating the last change related to domain failover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need 2 separate notification versions?
Previously we used to have db_version too, why we don't need that anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the failover_notification_version is actually your idea, tracking the last change which related to domain failover

the previous db_version is now the notification_version, managed by the metadata record

@@ -88,6 +88,20 @@ func (s *Service) Start() {
}
metadata = persistence.NewMetadataPersistenceClient(metadata, base.GetMetricsClient())

metadataV2, err := persistence.NewCassandraMetadataPersistenceV2(p.CassandraConfig.Hosts,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly I want to avoid having 2 versions of metadata persistence passed through all layers of the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should talk

@@ -167,27 +167,13 @@ func (e *historyEngineImpl) Start() {
logging.LogHistoryEngineStartingEvent(e.logger)
defer logging.LogHistoryEngineStartedEvent(e.logger)

e.registerDomainFailoverCallback()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are setting up the callback before starting the processors. This could be problematic if the callback is received before the processors are even started.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the failover processor is independent of the normal processor

domainNotificationVersion := e.shard.GetDomainCache().GetDomainNotificationVersion()
shardDomainNotificationVersion := e.shard.GetDomainNotificationVersion()
if domainNotificationVersion > shardDomainNotificationVersion {
for _, domain := range e.shard.GetDomainCache().GetAllDomain() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic should live within domainCache. RegisterDomainCallback should take the last notification version and have logic to fire notifications since then for catch up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought the domain cache notification aims to be generic?
if so, then we should not put this domain failover specific logic in the domain cache.

move domain change catchup into domain cache
@wxing1292 wxing1292 force-pushed the domain-mirror branch 2 times, most recently from 92285f3 to 66c401a Compare June 5, 2018 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants