Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IteratorEnvironment fixes #4816

Open
wants to merge 6 commits into
base: 2.1
Choose a base branch
from

Conversation

kevinrr888
Copy link
Member

@kevinrr888 kevinrr888 commented Aug 21, 2024

closes #4810

  • Fixed some inconsistencies and bugs in (non-test) implementations of IteratorEnvironment
    • Fixed RFileScanners unexpected UnsupportedOperationException when calling env.getServiceEnv() or env.getConfig() (both deprecated) or env.getPluginEnv()
    • Some impls returned a new ServiceEnvironment for each call to getServiceEnv()/getPluginEnv(). This did not align as well with the IteratorEnvIT which would assert that these two calls return the same object. Memoize a supplier to the ServiceEnvironment to avoid creating the object unless needed and to return the same object each time.
    • Seems like isUserCompaction() and isFullMajorCompaction() are supposed to be throwing an IllegalStateException if it's not a major compaction. Added missing javadoc to isUserCompaction() and changed impls that were not adhering to these docs.
  • Added tests for these environments to IteratorEnvIT

Questions:

  1. Need confirmation, but it was my interpretation that a TableId doesn't make sense in the context of RFileScanner. Is this correct? ANSWERED
  2. I have getConfiguration() throw an error in RFileScanners IteratorEnvironment since I'm not sure how we would return the system config in this context and I don't think it would even make sense here. Does this seem right? ANSWERED
  3. Made the IteratorEnvironment for RFileScanner return null and not throw an error on env.getTableId() so something like: env.getPluginEnv().getConfiguration(env.getTableId()) can be done. Thoughts on this? ANSWERED

closes apache#4810
- Fixed some inconsistencies and bugs in (non-test) implementations of `IteratorEnvironment`
	- Fixed `RFileScanner`s unexpected `UnsupportedOperationException` when calling `env.getServiceEnv()` or `env.getConfig()` (both deprecated) or `env.getPluginEnv()`
	- Changed the impls to return the same object for calls to `getServiceEnv()`/`getPluginEnv()` (expected by `IteratorEnvIT`)
	- Fixed inconsistencies in functionality of `isUserCompaction()` and `isFullMajorCompaction()` when it is not a major compaction. Added missing javadoc to `isUserCompaction()`
- Added tests for these environments to `IteratorEnvIT`
@kevinrr888 kevinrr888 self-assigned this Aug 21, 2024
Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking at this, posting comments I have so far. Will circle back to this later.

@ctubbsii ctubbsii added this to the 2.1.4 milestone Aug 21, 2024
- Updated RFile javadocs
	- Added description of new supported functionality for `RFile.ScannerOptions.withTableProperties()`
	- Moved `@see` tag in javadoc for `RFile.{ScannerOptions/SummaryOptions}.withTableProperties(Map<String,String>)` which was incorrect
- Changed `RFileScanner.IterEnv.getTableId()` to return a dummy table id to be used for calling `RFileScanner.IterEnv.getConfiguration(TableId)`
- Added functionality for `RFileScanner.IterEnv.instantiate()`
- Added error msg to invalid uses of `RFileScanner.IterEnv.getServiceEnv()/getPluginEnv()` methods
- Minor RFileScanner changes
- Refactored IteratorEnvIT to ensure expected assertions are executed
Comment on lines 143 to 144
* True if compaction was user initiated. Will throw IllegalStateException if
* {@link #getIteratorScope()} != {@link IteratorScope#majc}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* True if compaction was user initiated. Will throw IllegalStateException if
* {@link #getIteratorScope()} != {@link IteratorScope#majc}.
* @return True if compaction was user initiated, false otherwise.
* @throws IllegalStateException if {@link #getIteratorScope()} != {@link IteratorScope#majc}

Comment on lines 178 to 181
* <p>
* Starting with {@code 2.1.4}, {@link PluginEnvironment#getConfiguration(TableId)} (obtained by
* {@link IteratorEnvironment#getPluginEnv()}) will return the properties passed in here.
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure these comments about "starting with 2.1.4" make sense to include here. It should have always worked this way. This is just fixing a bug. I don't think we need to document the implementation details of the corrected behavior for the bug fix in the javadoc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will remove this in my next commit. Should I just remove the mention of 2.1.4 or that new part entirely?

Comment on lines 148 to 154
var ctx = context.get();
try {
return new ConfigurationCopy(
ctx.tableOperations().getConfiguration(ctx.getTableName(tableId.get())));
} catch (AccumuloException | TableNotFoundException e) {
throw new RuntimeException("Error getting table configuration", e);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for implementing getConfig, and specifically as a copy of the table configuration? The behavior of this method in the interface was never well-defined, and was using an internal API and should never be called by user code. I'm not sure why it needs an implementation here, rather than just throwing an exception, like the deprecated default method does in the interface.

If we really do need this, the default method in the interface should implement it by calling the methods on getPluginEnv() and wrapping it similarly to what you did here, instead of that default method throwing an exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented it here since I thought that because it was deprecated but not yet removed, there should still be functionality for the method. The other impls of IteratorEnvironment still have this implemented; assumed same should be the case here. Less importantly, this also made testing the different impls of IteratorEnvironment in IteratorEnvIT more straight forward as each would call getConfig(), getServiceEnv(), and getPluginEnv(). If this shouldn't be implemented here, I can remove it, or I could change the default impl to use getPluginEnv() like you said. Whichever you think would be better.

Comment on lines 144 to 145
public PluginEnvironment getPluginEnv() {
return new ClientServiceEnvironmentImpl(context.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the method that should be implemented. The getServiceEnv() method only existed for backwards compatibility. The code should not rely on the interface implementation of getPluginEnv() calling the getServiceEnv() here, because that behavior is removed when the deprecated code is dropped. Instead, it's best to keep both methods, as before your change, so that when the deprecated one is dropped in subsequent versions, the correct implementation is kept.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this was something I was unsure about when implementing this. Some IteratorEnvironments had both getServiceEnv() and getPluginEnv() (e.g., OfflineIterator) while some had just getServiceEnv() (e.g., TabletIteratorEnvironment), and I wasn't sure which to go with. I'll fix the impls to use both.

private static final String errorMsg =
"This scanner is unrelated to any table or accumulo instance;"
+ " it operates directly on files. Therefore, it can not support this operation.";
private static final TableId dummyTableId = TableId.of("RFileScannerFakeTableId");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't this be null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed here: #4816 (comment). Either null or a dummy would work though.

@@ -311,14 +319,27 @@ public void updateScanIteratorOption(String iteratorName, String key, String val
}

private class IterEnv implements IteratorEnvironment {
private final Supplier<ServiceEnvironment> serviceEnvironment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What prevents us from just using the implementation from super.getPluginEnv() and super.getServiceEnv()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IteratorEnvironment doesn't have any implementation for getServiceEnv() or getPluginEnv(). Or are you suggesting changing IteratorEnvironment?

Comment on lines +63 to +64
* @return true if the compaction is a full major compaction; false otherwise
* @throws IllegalStateException if {@link #getIteratorScope()} != {@link IteratorScope#majc}.
Copy link
Member

@ctubbsii ctubbsii Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would have preferred fixing the documentation to drop the exception case, rather than forcing an exception to be thrown in all the scan cases. false otherwise should be sufficient. Have you considered that as an alternative?

Copy link
Member Author

@kevinrr888 kevinrr888 Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't looked into why an exception was thrown, I had just kept the functionality the same and consistent across impls. I assumed there was probably a reason for wanting all impls to throw an exception and didn't look any further.

Are you suggesting to drop the documentation about the exception case and revert the changes I made to adhere to the exception case, or are you suggesting changing the functionality to no longer throw the exception, or something else?

@ctubbsii
Copy link
Member

After a conversation with @kevinrr888, I added commit 5924b02 to incorporate some of the changes we discussed, that he intends to build on and polish up. The main points included were:

  • Push a reasonable getConfig() implementation up into IteratorEnvironment's default method (depends on the ability to have getTableId() return a null value)
  • Include concrete implementations of getPluginEnv() instead of just getServiceEnv() to make it easier to just drop the deprecated one when merging to newer branches
  • Consider using a Proxy instance of TableId instead of a real instance, so it isn't possible to be confused with any similar user-constructed TableId (may not work)
  • Update javadocs in RFile to make it clearer how properties set on the scanner builder (such as iterators) will be available to client-side iterators from the plugin environment

The main thing left unresolved after our conversation was how to fix the mismatch between the implementations and the javadoc for all the methods that inconsistently return null or throw IllegalStateException or UnsupportedOperationException, when calling a method that doesn't apply to a given situation (like asking for a table ID or table name when operating directly on RFiles or asking if it's a user-initiated compaction when it's a scan). Nearly every caller seems to have a unique set a requirements/implementation, and it's a bit hard to home in on the best implementation, especially for the default methods, and make sure the javadoc matches the behavior.

- Changed ClientSideIteratorScanner.getConfig() to use the safer getPluginEnv()
- Added getPluginEnv() impl to all IteratorEnvironments (besides test envs)
- Improved RFile.ScannerOptions.withTableProperties() javadoc to better explain how iterators will have access to these properties
- Opted for RFileScanner.IterEnv.getTableId() to return null instead of a dummy id
- Removed incorrect impl of RFileScanner.IterEnv.get(Service/Plugin)Env().getTableName() to instead just throw UnsupOpExc since the method doesn't make sense anyways.

Co-authored-by: Christopher Tubbs <[email protected]>
@kevinrr888
Copy link
Member Author

kevinrr888 commented Sep 12, 2024

Some comments about decisions in a68ec11

Comment about IteratorEnvironment.getConfig():

TLDR: opted for not having a default impl for IteratorEnvironment.getConfig() (keep default as throwing UOE)

It might be strange to have a default impl for this method but not others... If we do want a default impl for this that isnt just throwing a UOE, we should probably change the others too... But that should be done in a follow on if that is desired. Keeping the default as throwing a UOE keeps it consistent with the other default impls/more inline with the original approach (throw UOE by default, concrete impls should have an actual impl). The only IteratorEnvironment which would be using the default getConfig() would be ClientSideIteratorScanner. But we can just add the impl there (done in latest commit). It is also hard to know what would be a good default impl since its not exactly clear how getConfig() was originally working.

Comment about RFileScanner.IterEnv.getTableId()

TLDR: Opted for having RFileScanner.IterEnv.getTableId() return null

Proxy can only be used to create proxies for interfaces, not concrete classes, so can't be used for TableId. Opting to just return null with added javadoc.

Background: we want to support getConfiguration(TableId) since this is the method defined as returning the props set on the table and getConfiguration() should not be returning table props, so we cant substitute that method. So, something needs to be passed, but it doesn't matter what since it wont be used. The way getConfiguration(TableId) will be called is env.getPluginEnv().getConfiguration(env.getTableId()) (from javadocs for IteratorEnvironment.getPluginEnv()) So, we should have env.getTableId() returning something.

No solutions seem great, but Proxy is a no-go, a real dummy id may be confusing (e.g., a user may think that it is a real table id), null may lead to NPE. I figure null makes the most sense here. A NPE would only occur if the user is trying to do something with the table id which they shouldnt be doing anyways. It is also now noted in the javadocs that null will be returned so if a NPE does occur, it should be easy to see why.

Comment about UOE/ISE

Did not change anything further than I had already changed in my earlier commits relating to when/how UOE or ISE are thrown, how its documented etc. Some future considerations would be

  • IteratorEnvironment.getAuthorizations() currently documented as throwing UnsupportedOperationException if scope != scan (should probably be ISE if anything).
  • IteratorEnvironment.isFullMajorCompaction() and isUserCompaction() should maybe throw ISE in the default if the scope is majc
    • Maybe these shouldn't be throwing exceptions at all and only returning false... This would probably be the cleanest if possible. Would have to look more into why these are throwing exceptions in the first place.

Comment about RFileScanners createServiceEnv()

@ctubbsii suggested having this be an inner class instead of anon class. Leaving this comment as a reminder. Will do once the other more important changes are looked at; want to avoid confusing diff between commits for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFileScanner's IteratorEnvironment throws UnsupportedOperationException for getConfig() and getServiceEnv()
3 participants