-
Notifications
You must be signed in to change notification settings - Fork 809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reliable Domain Change Notification #777
Conversation
…tion * add domain notification version to shard
d8918a7
to
b45d016
Compare
9f2a2c3
to
f98bd44
Compare
common/persistence/dataInterfaces.go
Outdated
MetadataManager interface { | ||
Closeable | ||
CreateDomain(request *CreateDomainRequest) (*CreateDomainResponse, error) | ||
GetDomain(request *GetDomainRequest) (*GetDomainResponse, error) | ||
UpdateDomain(request *UpdateDomainRequest) error | ||
DeleteDomain(request *DeleteDomainRequest) error | ||
DeleteDomainByName(request *DeleteDomainByNameRequest) error | ||
ListDomain(request *ListDomainRequest) (*ListDomainResponse, error) | ||
GetMetadata() (int64, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only there to get domains_partition?
Why do we need this on the interface rather than just defined as a config knob for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not entirely sure what do you mean.
the list api will list all domains in the new table.
get metadata method will return the metadate (notification version) from the new table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh got it. Then how about create a response struct for metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -77,6 +77,7 @@ type ( | |||
TaskMgr TaskManager | |||
HistoryMgr HistoryManager | |||
MetadataManager MetadataManager | |||
MetadataManagerV2 MetadataManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just have these 2 versions only at the persistence layer? I'm a little worried bubbling it up across all layers of application might cause problems and bugs. There are just too many changes because of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you see, i am using the same interface.
so, anywhere other than persistence (and domain frontend API and the domain cache) will not use the new metadata manager
is_global_domain boolean, -- indicating whether a domain is a global domain | ||
config_version bigint, -- indicating the version of domain config, excluding the failover / change of active cluster name | ||
failover_version bigint, -- indicating the version of active domain only, used for domain failover | ||
failover_notification_version bigint, -- indicating the last change related to domain failover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need 2 separate notification versions?
Previously we used to have db_version too, why we don't need that anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the failover_notification_version is actually your idea, tracking the last change which related to domain failover
the previous db_version is now the notification_version, managed by the metadata record
service/frontend/service.go
Outdated
@@ -88,6 +88,20 @@ func (s *Service) Start() { | |||
} | |||
metadata = persistence.NewMetadataPersistenceClient(metadata, base.GetMetricsClient()) | |||
|
|||
metadataV2, err := persistence.NewCassandraMetadataPersistenceV2(p.CassandraConfig.Hosts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly I want to avoid having 2 versions of metadata persistence passed through all layers of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should talk
@@ -167,27 +167,13 @@ func (e *historyEngineImpl) Start() { | |||
logging.LogHistoryEngineStartingEvent(e.logger) | |||
defer logging.LogHistoryEngineStartedEvent(e.logger) | |||
|
|||
e.registerDomainFailoverCallback() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are setting up the callback before starting the processors. This could be problematic if the callback is received before the processors are even started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the failover processor is independent of the normal processor
service/history/historyEngine.go
Outdated
domainNotificationVersion := e.shard.GetDomainCache().GetDomainNotificationVersion() | ||
shardDomainNotificationVersion := e.shard.GetDomainNotificationVersion() | ||
if domainNotificationVersion > shardDomainNotificationVersion { | ||
for _, domain := range e.shard.GetDomainCache().GetAllDomain() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic should live within domainCache. RegisterDomainCallback should take the last notification version and have logic to fire notifications since then for catch up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought the domain cache notification aims to be generic?
if so, then we should not put this domain failover specific logic in the domain cache.
move domain change catchup into domain cache
92285f3
to
66c401a
Compare
solve #734