Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. #7081

Merged
merged 4 commits into from
Aug 22, 2024

Conversation

SaketaChalamchala
Copy link
Contributor

@SaketaChalamchala SaketaChalamchala commented Aug 16, 2024

What changes were proposed in this pull request?

Right now we cannot use a single client to read/write encrypted files to multiple Ozone clusters across trusted realms.
This is resolved in HDFS. HDFS overrides the implementation of hadoop.fs.FileSystem.getServerDefaults() where Namenode returns it's default keyProviderUri to the client. The client then uses Namenodes keyProvider instead of it's local keyProvider to correctly read/write encrypted files.
Proposed solution does the same in Ozone. OzoneManager returns it's server default configuration which includes it's keyProviderUri to the client. The client then uses the OM's keyProviderUri if available to get encryption keys before defaulting to it's local keyProvider. This enables the client to correctly encrypt/decrypt files from multiple Ozone clusters.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11227

How was this patch tested?

Manual test.

Before the fix.
HDDS-11227_before_fix

After the fix.
HDDS-11227_after_fix

Created a separate JIRA for a docker test: https://issues.apache.org/jira/browse/HDDS-11332

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this PR self contained? I.e. it looks like there should be extra code for the client to call getServerDefaults() and determine the key provider path to be used?

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if you intend to implement Hadoop's getServerDefaults() FileSystem API, you'd need to use Hadoop's FsServerDefaults class rather than creating a new one. Or at least, extends Hadoop's FsServerDefaults base class.

@SaketaChalamchala SaketaChalamchala marked this pull request as ready for review August 16, 2024 23:19
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest change makes sense. But you'll have to move the getServerDefaults() from OzoneFileSystem/RootedOzoneFileSystem to BasicOzoneFileSystem/BasicRootedOzoneFileSystem. Unfortunately we have this mysterious many level architecture to support Hadoop 2/3 runtime. As a rule of thumb if a FileSystem API exists for a long time (such as getServerDefaults()) they most likely exist in Hadoop 2 as well so BasicOzoneFileSystem/BasicRootedOzoneFileSystem is the right place to add them.

@@ -67,6 +68,11 @@ public RootedOzoneFileSystem() {
forceRecovery = Strings.isNullOrEmpty(force) ? false : Boolean.parseBoolean(force);
}

@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to BasicRootedOzoneFileSystem

@@ -65,6 +66,11 @@ public OzoneFileSystem() {
forceRecovery = Strings.isNullOrEmpty(force) ? false : Boolean.parseBoolean(force);
}

@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to BasicOzoneFileSystem

@SaketaChalamchala
Copy link
Contributor Author

Is this PR self contained? I.e. it looks like there should be extra code for the client to call getServerDefaults() and determine the key provider path to be used?

The client call getKeyProviderUri() which internally calls the getServerDetails() method to get the URI from the OzoneManager. This is implemented similarly in HDFS https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L3180-L3183

@SaketaChalamchala
Copy link
Contributor Author

@fapifta could you please take a look as well?

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. A little more test coverage would be even better.

@@ -2591,11 +2603,22 @@ public KeyProvider call() throws Exception {
}
}

@Override
public OzoneFsServerDefaults getServerDefaults() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great to have a test to verify this where it makes sure a second call within the expiration time to this API does not invoke ozoneManagerClient.getServerDefaults()

@@ -1107,6 +1108,11 @@ public short getDefaultReplication() {
return adapter.getDefaultReplication();
}

@Override
public OzoneFsServerDefaults getServerDefaults() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need tests at file system level for ofs and o3fs.

@jojochuang jojochuang merged commit e3b590e into apache:master Aug 22, 2024
41 checks passed
@jojochuang
Copy link
Contributor

PR merged. Thank you, @SaketaChalamchala. Please proceed with the regression test in the follow-up task.

errose28 added a commit that referenced this pull request Aug 23, 2024
* master:
  HDDS-11081. Use thread-local instance of FileSystem in Freon tests (#7091)
  HDDS-11333. Avoid hard-coded current version in upgrade/xcompat tests (#7089)
  Mark TestPipelineManagerMXBean#testPipelineInfo as flaky
  Mark TestAddRemoveOzoneManager#testForceBootstrap as flaky
  HDDS-11352. HDDS-11353. Mark TestOzoneManagerHAWithStoppedNodes as flaky
  HDDS-11354. Mark TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl as flaky
  HDDS-11355. Mark TestMultiBlockWritesWithDnFailures#testMultiBlockWritesWithIntermittentDnFailures as flaky
  HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. (#7081)
  HDDS-11316. Improve Create Key and Chunk IO Dashboards (#7075)
  HDDS-11239. Fix KeyOutputStream's exception handling when calling hsync concurrently (#7047)
errose28 added a commit to errose28/ozone that referenced this pull request Aug 26, 2024
…an-on-error

* HDDS-10239-container-reconciliation: (428 commits)
  HDDS-11081. Use thread-local instance of FileSystem in Freon tests (apache#7091)
  HDDS-11333. Avoid hard-coded current version in upgrade/xcompat tests (apache#7089)
  Mark TestPipelineManagerMXBean#testPipelineInfo as flaky
  Mark TestAddRemoveOzoneManager#testForceBootstrap as flaky
  HDDS-11352. HDDS-11353. Mark TestOzoneManagerHAWithStoppedNodes as flaky
  HDDS-11354. Mark TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl as flaky
  HDDS-11355. Mark TestMultiBlockWritesWithDnFailures#testMultiBlockWritesWithIntermittentDnFailures as flaky
  HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. (apache#7081)
  HDDS-11316. Improve Create Key and Chunk IO Dashboards (apache#7075)
  HDDS-11239. Fix KeyOutputStream's exception handling when calling hsync concurrently (apache#7047)
  HDDS-11254. Reconcile commands should be handled by datanode ReplicationSupervisor (apache#7076)
  HDDS-11331. Fix Datanode unable to report for a long time (apache#7090)
  HDDS-11346. FS CLI gives incorrect recursive volume deletion prompt (apache#7102)
  HDDS-11349. Add NullPointer handling when volume/bucket tables are not initialized (apache#7103)
  HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache (apache#6974)
  HDDS-11284. refactor quota repair non-blocking while upgrade (apache#7035)
  HDDS-9790. Add tests for Overview page (apache#6983)
  HDDS-10904. [hsync] Enable PutBlock piggybacking and incremental chunk list by default (apache#7074)
  HDDS-11322. [hsync] Block ECKeyOutputStream from calling hsync and hflush (apache#7098)
  HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7099)
  ...

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
errose28 added a commit to errose28/ozone that referenced this pull request Aug 28, 2024
…rrupt-files

* HDDS-10239-container-reconciliation: (61 commits)
  HDDS-11081. Use thread-local instance of FileSystem in Freon tests (apache#7091)
  HDDS-11333. Avoid hard-coded current version in upgrade/xcompat tests (apache#7089)
  Mark TestPipelineManagerMXBean#testPipelineInfo as flaky
  Mark TestAddRemoveOzoneManager#testForceBootstrap as flaky
  HDDS-11352. HDDS-11353. Mark TestOzoneManagerHAWithStoppedNodes as flaky
  HDDS-11354. Mark TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl as flaky
  HDDS-11355. Mark TestMultiBlockWritesWithDnFailures#testMultiBlockWritesWithIntermittentDnFailures as flaky
  HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. (apache#7081)
  HDDS-11316. Improve Create Key and Chunk IO Dashboards (apache#7075)
  HDDS-11239. Fix KeyOutputStream's exception handling when calling hsync concurrently (apache#7047)
  HDDS-11254. Reconcile commands should be handled by datanode ReplicationSupervisor (apache#7076)
  HDDS-11331. Fix Datanode unable to report for a long time (apache#7090)
  HDDS-11346. FS CLI gives incorrect recursive volume deletion prompt (apache#7102)
  HDDS-11349. Add NullPointer handling when volume/bucket tables are not initialized (apache#7103)
  HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache (apache#6974)
  HDDS-11284. refactor quota repair non-blocking while upgrade (apache#7035)
  HDDS-9790. Add tests for Overview page (apache#6983)
  HDDS-10904. [hsync] Enable PutBlock piggybacking and incremental chunk list by default (apache#7074)
  HDDS-11322. [hsync] Block ECKeyOutputStream from calling hsync and hflush (apache#7098)
  HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7099)
  ...

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
<property>
<name>ozone.client.server-defaults.validity.period.ms</name>
<tag>OZONE, CLIENT, SECURITY</tag>
<value>3600000</value>
Copy link
Contributor

@hemantk-12 hemantk-12 Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we should add the default unit otherwise there will be an unnecessary log like the one below.

24/09/18 15:43:05 INFO Configuration.deprecation: No unit for ozone.client.server-defaults.validity.period.ms(3600000) assuming MILLISECONDS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And .ms should be omitted from the property name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants