Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIP-144: Making SchemaRegistry implementation configurable #14102

Open
wants to merge 141 commits into
base: master
Choose a base branch
from

Conversation

aparajita89
Copy link

Fixes #14101

Motivation

SchemaRegistryService is hardcoded to use SchemaRegistryServiceImpl in org.apache.pulsar.broker.service.schema.SchemaRegistryService#create

Modifications

broker.conf has a new config called schemaRegistryClassName which will have the name of the class to be used. the class is instantiated in SchemaRegistryService interface's static create method via reflection.

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests which create a producer or consumer successfully.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: yes
  • The default values of configurations: yes
  • The wire protocol: no
  • The rest endpoints: no
  • The admin cli options: no
  • Anything that affects deployment: no

Documentation

Check the box below or label this PR directly (if you have committer privilege).

Need to update docs?

  • doc-required
    this PR adds a new config which needs to be explained in the docs

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Feb 3, 2022
@@ -180,6 +180,8 @@ maxTopicsPerNamespace=0
# 'is_allow_auto_update_schema' of namespace policy.
isAllowAutoUpdateSchemaEnabled=true

schemaRegistryClass=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to have a description, by default pulsar will use the internal schema registry which is based on the bookkeeper.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codelipenghui i have added it now

@@ -42,19 +44,29 @@
return checkers;
}

static SchemaRegistryService create(SchemaStorage schemaStorage, Set<String> schemaRegistryCompatibilityCheckers) {
static SchemaRegistryService create(String schemaRegistryName, SchemaStorage schemaStorage,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static SchemaRegistryService create(String schemaRegistryName, SchemaStorage schemaStorage,
static SchemaRegistryService create(String schemaRegistryClassName, SchemaStorage schemaStorage,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is updated

Comment on lines 55 to 56
return SchemaRegistryServiceWithSchemaDataValidator
.of(new SchemaRegistryServiceImpl(schemaStorage, checkers));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct the code format there.

return SchemaRegistryServiceWithSchemaDataValidator
.of(new SchemaRegistryServiceImpl(schemaStorage, checkers));
} else {
return (SchemaRegistryService) Class.forName(schemaRegistryName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to have a unit test to make sure the configuration works well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default case is already tested in the existing test cases. i'll add some more test cases to explicitly test this part.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have updated this logic. we are using a default value in the config and continuing to use the schema validator wrapper class.
the reasoning was that any custom schema registry implementation would need to comply with existing usages of SchemaRegistryService interface, including pulsar-admin apis and the client connection flows. the SchemaRegistryServiceWithSchemaDataValidator wrapper class ensures that any input schemas are validated via avro/json/protobuf-native parser before the compatibility checks and schema storage interactions are applied.

@aparajita89
Copy link
Author

comments from yesterday's review by @merlimat and ankur:

  • schema storage might not be used in custom schema registry, it need not be passed in the constructor
  • custom schema registry will require additional configs which will be present in ServiceConfiguration but it's not being passed to the constructor right now
  • evaluate the existing approach once more keeping in mind the above points

@Huanli-Meng
Copy link
Contributor

@aparajita89 do we also need to update the broker.config file? Thanks
@Anonymitaet could you or Jun help follow up adding / updating docs about the PR? Thanks.

@aparajita89
Copy link
Author

@Huanli-Meng this is an optional config. if no value is provided in the broker.conf file then the existing flow would go through. imo, we can add this config to the documentation but there isn't a need to update the config files unless someone really requires it.

@aparajita89
Copy link
Author

@Anonymitaet could you let me know what changes i need to make for the documentation?

@Anonymitaet
Copy link
Member

@aparajita89 how about adding the parameter and its description here https://pulsar.apache.org/docs/en/next/reference-configuration/#standalone?

@aparajita89
Copy link
Author

@Anonymitaet thanks, i have added it in reference-configuration.md

Copy link
Contributor

@Huanli-Meng Huanli-Meng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aparajita89 only one minor comment. PTAL

@@ -235,6 +235,7 @@ Pulsar brokers are responsible for handling incoming messages from producers, di
|athenzDomainNames| Supported Athenz provider domain names(comma separated) for authentication ||
|exposePreciseBacklogInPrometheus| Enable expose the precise backlog stats, set false to use published counter and consumed counter to calculate, this would be more efficient but may be inaccurate. |false|
|schemaRegistryStorageClassName|The schema storage implementation used by this broker.|org.apache.pulsar.broker.service.schema.BookkeeperSchemaStorageFactory|
|schemaRegistryClassName|Override the schema registry used by pulsar with a custom implementation. If this config is not provided, the default schema registry will be used.|org.apache.pulsar.broker.service.schema.SchemaRegistryServiceImpl|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we change the description of the parameter as below ?
Specify the schema registry to be used in Pulsar. If it is not set, the default schema registry is used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, i have updated it

Huanli-Meng
Huanli-Meng previously approved these changes Feb 11, 2022
Copy link
Contributor

@Huanli-Meng Huanli-Meng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve from technical writer's points of view. Let's wait for technical approval.

@Anonymitaet Anonymitaet added doc Your PR contains doc changes, no matter whether the changes are in markdown or code files. and removed doc-required Your PR changes impact docs and you will update later. labels Feb 14, 2022
@codelipenghui codelipenghui added this to the 2.11.0 milestone Feb 15, 2022
codelipenghui
codelipenghui previously approved these changes Feb 15, 2022
} catch (Exception e) {
LOG.warn("Unable to create schema registry storage, defaulting to empty storage", e);
}
}
return new DefaultSchemaRegistryService();
}

void initialize(ServiceConfiguration configuration);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this interface? It doesn't seem necessary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be needed in custom implementations of the schema registry where additional configs are required which are specific to the custom implementation. for example, we require a custom schema registry which needs to make HTTP calls to another service. configs specific to the schema registry, like the service endpoint, needs to be passed via ServiceConfiguration and set in the schema registry using this initialize method. this is the same convention being followed for other configs such as authorization provider as well.

315157973
315157973 previously approved these changes Feb 15, 2022
@@ -235,6 +235,7 @@ Pulsar brokers are responsible for handling incoming messages from producers, di
|athenzDomainNames| Supported Athenz provider domain names(comma separated) for authentication ||
|exposePreciseBacklogInPrometheus| Enable expose the precise backlog stats, set false to use published counter and consumed counter to calculate, this would be more efficient but may be inaccurate. |false|
|schemaRegistryStorageClassName|The schema storage implementation used by this broker.|org.apache.pulsar.broker.service.schema.BookkeeperSchemaStorageFactory|
|schemaRegistryClassName|Specify the schema registry to be used in Pulsar. If it is not set, the default schema registry is used.|org.apache.pulsar.broker.service.schema.SchemaRegistryServiceImpl|

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc changes should be in recent pulsar versions as well right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this to the latest doc as the change will be in newer versions

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @aparajita89

Thanks for your initiative.
Can you please start a PIP and a discussion on [email protected] for this feature?

@aparajita89
Copy link
Author

Hello @aparajita89

Thanks for your initiative. Can you please start a PIP and a discussion on [email protected] for this feature?

sure, will do that

@aparajita89
Copy link
Author

@eolivelli the PIP has been created, please take a look: #14395

@aparajita89
Copy link
Author

@eolivelli i have added the integration test

@@ -706,8 +707,9 @@ public void start() throws PulsarServerException {
this.startNamespaceService();

schemaStorage = createAndStartSchemaStorage();
schemaRegistryService = SchemaRegistryService.create(
schemaStorage, config.getSchemaRegistryCompatibilityCheckers());
ensureSchemaRegistryName(schemaStorage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createAndStartSchemaStorage will create schemaStorage, if don't set the schemaRegistryStorageClassName, it will throw NPE, so this method seem not useful or need to add a default SchemaRegistryStorageClassName

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@congbobo184 the default schema registry name is being set in ServiceConfiguration so NPE would not happen here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean why we need check schemaStorage == null in ensureSchemaRegistryName

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was part of the existing logic. if schema storage is null then the SchemaRegistryServiceDisabled needs to be used. otherwise, SchemaRegistryServiceImpl needs to be used by default.
below is the expected behavior for each of the combinations for configs set in broker's conf file.

  • if schemaRegistryClassName == null and schemaRegistryStorageClassName == null then set schemaStorage to null and schemaRegistry to SchemaRegistryServiceDisabled
  • if schemaRegistryClassName != null and schemaRegistryStorageClassName == null then set schemaStorage to null and schemaRegistry to SchemaRegistryServiceDisabled
  • if schemaRegistryClassName == null and schemaRegistryStorageClassName != null then set schemaStorage to the configured object and schemaRegistry to SchemaRegistryServiceImpl
  • if schemaRegistryClassName != null and schemaRegistryStorageClassName != null then set schemaStorage to the configured object and schemaRegistry to the configured object

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the older code, this logic was implemented in SchemaRegistryService.create(...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchemaStorage schemaStorage = (SchemaStorage) createMethod.invoke(factoryInstance, this);

in this code, createAndStartSchemaStorage will not create a schemaStorage is null. so why we should check ensureSchemaRegistryName and set default?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, schemaStorage can never be null. i had only kept this null check to retain the existing logic. i have removed it now.

@codelipenghui
Copy link
Contributor

@eolivelli Please help review this PR again.

@codelipenghui
Copy link
Contributor

ping @eolivelli Please help review this PR.

result.complete(new SchemaAndMetadata(schemaId, schemaData,
storedSchemaCompletableFuture.get().get(version).get().version));
} catch (InterruptedException e) {
e.printStackTrace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use logger

result.complete(new SchemaAndMetadata(schemaId, schemaData,
storedSchemaCompletableFuture.get().get(versionInt).get().version));
} catch (InterruptedException e) {
e.printStackTrace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use logger

@@ -24,6 +24,11 @@
public class Constants {

public static final String GLOBAL_CLUSTER = "global";
public static final String SCHEMA_REGISTRY_CLASS_NAME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please declare the value here: "org.apache.pulsar.broker.service.schema.SchemaRegistryServiceDisabled"

@Slf4j
public class SchemaRegistryTest extends TopicMessagingBase {
protected String methodName;
private String SCHEMA_REGISTRY="org.apache.pulsar.broker.service.schema.MockSchemaRegistry";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: static ? also please add a white space around the '=' char

@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Jun 18, 2022
@codelipenghui codelipenghui modified the milestones: 2.11.0, 2.12.0 Jul 26, 2022
@RobertIndie RobertIndie modified the milestones: 3.0.0, 3.1.0 Apr 10, 2023
@Technoboy- Technoboy- modified the milestones: 3.1.0, 3.2.0 Jul 31, 2023
@Technoboy- Technoboy- modified the milestones: 3.2.0, 3.3.0 Dec 22, 2023
@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@lhotari lhotari modified the milestones: 4.0.0, 4.1.0 Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Your PR contains doc changes, no matter whether the changes are in markdown or code files. Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

making SchemaRegistryService configurable