Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace AbstractFateStore.createDummyLockID() #4904

Open
kevinrr888 opened this issue Sep 19, 2024 · 0 comments · May be fixed by #5028
Open

Replace AbstractFateStore.createDummyLockID() #4904

kevinrr888 opened this issue Sep 19, 2024 · 0 comments · May be fixed by #5028
Assignees
Labels
enhancement This issue describes a new feature, improvement, or optimization.
Milestone

Comments

@kevinrr888
Copy link
Member

Is your feature request related to a problem? Please describe.
From the changes from #4524, reserving a transaction now indicates it is reserved with a FateReservation object which has a LockID. From using the stores from the context of the Manager, this is needed for identifying reservations held by a dead Manager so they can be unreserved. However, the stores are not always used in the context of a Manager. One example is testing. As it is right now, whenever a store needs to make a reservation, a LockID is required. The LockID only makes sense in the context of the Manager, so currently a dummy lock is made when a store is created outside of the context of a Manager. This dummy lock exists so reservations can still be made.

Describe the solution you'd like
A path could be created in ZK for utilities to get a ZK lock (proposed by @keith-turner in #4524 (comment)). This could replace the AbstractFateStore.createDummyLockID() method. This could also be used to change some of the admin fate commands (those that require the Manager to be down: fail and delete) to no longer require the Manager to be down. Can instead get a LockID at this utility path and reserve the transaction they operate on instead of requiring the Manager to be down. At the same time working on this, if the store is used in a readonly way, then there is no need for a LockID as it will never reserve. In the changes in #4524, a dummy lock is created and passed to the store for all use cases of the store outside of the Manager. This should be looked at more closely and only use a LockID if the store will be used to write data. If the store is not expected to write, it should fail on write operations, which it won't currently.

So, the changes/solution are:

  • Create a path in ZK for utilities to get a ZK lock
  • Change admin fate fail and admin fate delete commands to get a LockID at this path and no longer require the Manager to be down. This will also require changes to FateOpsCommandsIT
  • Only use a LockID for a store if write operations are expected. The store should fail on write operations if writes are not expected.
@kevinrr888 kevinrr888 added the enhancement This issue describes a new feature, improvement, or optimization. label Sep 19, 2024
@kevinrr888 kevinrr888 added this to the 4.0.0 milestone Sep 19, 2024
@kevinrr888 kevinrr888 changed the title Deprecate AbstractFateStore.createDummyLockID() Replace AbstractFateStore.createDummyLockID() Sep 19, 2024
@kevinrr888 kevinrr888 self-assigned this Oct 18, 2024
kevinrr888 added a commit to kevinrr888/accumulo that referenced this issue Oct 31, 2024
This commit makes several improvements/fixes: improvements to the `admin fate` util which were made possible from apache#4524, replaced incorrect use of `createDummyLockID()` in real code (now only used in tests), `<User/Meta>FateStore` now support a null lock id if they will be used as read-only stores: write ops will fail on a store with a null lock, and some other misc. changes.

Full list of changes:
- Removed the check for a dead Manager in the Admin fate util (AdminUtil) which was checked before `admin fate delete <tx>` or `admin fate fail <tx>` was able to run. This check is no longer needed with the changes from apache#4524. apache#4524 moved reservations out of Manager memory into the FATE table (for UserFateStore) and into ZK (for MetaFateStore). Prior to this, the Admin process would have no way of knowing if the Manager process had a transaction reserved, so the Manager had to be shutdown to ensure it was not. But now that reservations are visible to any process, we can try to reserve the transaction in Admin, and if it cannot be reserved and deleted/failed in a reasonable time, we let the user know that the Manager would need to be shutdown if deleting/failing the transaction is still desired.
	- This has several benefits:
	- It is one less thing to worry about when implementing multiple managers in the future since Admin assumes only one Manager for these commands. However, there is still the case where the Manager may
	keep a transaction reserved for a long period of time and the Admin can never reserve it. In this case, we inform the user that the transaction could not be deleted/failed and that if deleting/failing
	is still desired, the Manager may need to be shutdown.
	- It covers a potential issue in the previously existing code where there was nothing stopping or ensuring a Manager is not started after the check is already performed in Admin but before the delete/
	fail was executed.
	- It also should make the commands easier to use now since the Manager is not required to be shutdown before use.
- Changes and adds some tests for `admin fate fail` and `admin fate delete`: ensures the Manager is not required to be down to fail/delete a transaction, and ensures that if the Manager does have a transaction reserved, admin will fail to reserve and fail/delete the transaction.
- Another change which was needed as a prerequisite for the above changes was creating a ZK lock for Admin so transactions can be properly reserved by the command. Added new constant `Constants.ZADMIN_LOCK = "/admin/lock"`, changed `ServiceLockPaths`, and added `Admin.createAdminLock()` to support this
- New class `TestLock` (in test package) which is used by tests to create a real ZK lock, or a dummy one. Removed `createDummyLockID()` from `AbstractFateStore` (moved to TestLock), and `createDummyLock()` is now only used in test code. Added new constant `ZTEST_LOCK = "/test/lock"`, changed `ServiceLockPaths`, and added `createTestLock()` which is used to create a real lock id (one held in ZK) which is needed for some tests.
	- This fixes an unexpected failure that could have occurred for `ExternalCompaction_1_IT`. Was using a dummy lock for the store before and the fate data was being stored in the same locations that the
	Manager uses for it's fate data. The DeadReservationCleaner running in Manager would have cleaned up reservations created using this store if it ran when reservations were present. Now the test creates
	a real ZK lock so the DeadReservationCleaner won't clean these up unexpectedly.
- Stores now support a null lock id for the situation where they will be used as read-only stores. A store with a null lock id will fail on write ops. Changed all existing uses of stores to only have a lock id if writes will occur (previously, all instances of the stores had a lock id).
- Removed unused or unneccesary constructors for AbstractFateStore, MetaFateStore, UserFateStore
- Ensured all tests changed, all FATE tests, and sunny day tests still pass

closes apache#4904
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement This issue describes a new feature, improvement, or optimization.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant