close writer before soft deletion of searchindex #25

Status: Open (1 commit, base: master)

jiangphcn
Contributor

Overview

This change refactors the soft-deletion logic for search indexes. Previously, write.lock could be left behind in the search index directory, which then could not be moved. The cause was that the index was not closed gracefully before the search index was moved or renamed. With this fix, the search index writer is closed first, and the search index can then be moved to the .deleted directory cleanly.
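
A minimal sketch of the close-then-rename order described above, not the code in this PR. `java.io.Closeable` stands in for the Lucene writer, the object and method names are hypothetical, and the destination name layout (UTC timestamp plus `.deleted` inserted before the final suffix) is only inferred from the rename logged in the test output of this description.

```scala
import java.io.Closeable
import java.nio.file.{Files, Path}
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object SoftDeleteSketch {
  // Assumed layout, inferred from the logged rename: yyyyMMdd.HHmmss in UTC.
  private val stamp =
    DateTimeFormatter.ofPattern("yyyyMMdd.HHmmss").withZone(ZoneOffset.UTC)

  // Inserts "<timestamp>.deleted" before the last dot-separated component.
  def deletedName(name: String, now: Instant): String = {
    val dot = name.lastIndexOf('.')
    require(dot > 0, "expected a name of the form <db>.<suffix>")
    name.substring(0, dot) + "." + stamp.format(now) + ".deleted" + name.substring(dot)
  }

  // Closes the writer first so that no lock file is held, then renames the directory.
  def softDelete(writer: Closeable, indexDir: Path, now: Instant): Path = {
    writer.close()
    val dest = indexDir.resolveSibling(deletedName(indexDir.getFileName.toString, now))
    Files.move(indexDir, dest)
  }
}
```

Closing before the move matters because a writer that is still open keeps its lock in the directory being renamed.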

Testing recommendations

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.cloudant.clouseau.AnalyzerServiceSpec
Running com.cloudant.clouseau.AnalyzerServiceSpec
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.024 sec
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.238 sec
Running com.cloudant.clouseau.ClouseauTypeFactorySpec
Running com.cloudant.clouseau.ClouseauTypeFactorySpec
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.022 sec
Running com.cloudant.clouseau.IndexManagerServiceSpec
Running com.cloudant.clouseau.IndexManagerServiceSpec
2019-11-21 15:26:30 clouseau [INFO] foo.5432109876 Renaming '/Users/jiangph/couchdb/clouseau.dreyfus/clouseau/target/indexes/foo.5432109876' to '/Users/jiangph/couchdb/clouseau.dreyfus/clouseau/target/indexes/foo.20191121.072630.deleted.5432109876'
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.445 sec
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.465 sec
Running com.cloudant.clouseau.ClouseauQueryParserSpec
Running com.cloudant.clouseau.ClouseauQueryParserSpec
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.018 sec
Running com.cloudant.clouseau.IndexServiceSpec
Running com.cloudant.clouseau.IndexServiceSpec
Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.937 sec
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.954 sec
Running com.cloudant.clouseau.SupportedAnalyzersSpec
Running com.cloudant.clouseau.SupportedAnalyzersSpec
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
Tests run: 42, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.109 sec

Results :

Tests run: 95, Failures: 0, Errors: 0, Skipped: 0

Related Issues or Pull Requests

apache/couchdb#2130

Contributor

@theburge theburge left a comment

Seems reasonable to me (with some very rusty Scala knowledge). I have a few comments on the approach, but I'll defer to Bob for a final opinion.

case e: IOException =>
warn("Error while closing writer", e)
ctx.args.writer.close()
case e: IOException => warn("Error while closing writer", e)
Contributor

Even though we have the correct close-then-rename order, do you think it's worth including the suggested approach from the Lucene 4.6.1 docs, which was something like:

 try {
   writer.close();
 } finally {
   if (IndexWriter.isLocked(directory)) {
     IndexWriter.unlock(directory);
   }
 }

Contributor Author

Good suggestion. I can add this.

val manager = self
node.spawn((_) => {
openTimer.time {
IndexService.start(node, ctx.args.config, path, "standard") match {
Contributor

Is this not a bit wasteful? (Imagine a bulk deletion case: it dramatically increases the intensity of the work done.)

It seems to me that here we could:

  • Verify it's not in LRU
  • Directly execute the rename operation on the appropriate directory in this actor

(It looks to me like IndexService.softDelete() is already passed almost all of the context it requires, and it wouldn't be too difficult to call it from here either.)

A downside here would be additional error handling (e.g., if the directory doesn't exist, etc.). Another would be that we might increase the time taken to execute the 'soft_delete operations in this actor, and delay legitimate open operations, etc.

Contributor Author

Most of the logic in IndexService.softDelete() can be moved to IndexManagerService.scala. Only one line, ctx.args.writer.close() at https://github.com/cloudant-labs/clouseau/pull/25/files#diff-2ed27527cbacd7a17e06d19270f99182R185, still needs IndexService.

Contributor

"soft" delete should be handled the same as "hard" delete. A message to be sent from dreyfus to the IndexService actor which, on receipt, closes the index and renames its files then calls exit on its own pid. This ensures the correct order of processing.

Contributor

We should not duplicate part of the OpenIndexMsg logic here. I note also that this doesn't put the Index in the LRU, so could potentially lead us to attempt to open the index multiple times.
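
The ordering proposed earlier in this thread (the index's own actor closes the index, renames its files, then exits) could be sketched as follows. This is a simplified illustration under stated assumptions, not clouseau code: the message names, the `IndexActor` class, and the callback parameters are all hypothetical stand-ins.

```scala
object SoftDeleteOrderSketch {
  sealed trait Msg
  case object HardDelete extends Msg // hypothetical name for the "hard" delete message
  case object SoftDelete extends Msg // hypothetical name for the "soft" delete message

  // Hypothetical stand-in for the per-index actor. The callbacks represent
  // closing the writer, deleting or renaming the files, and exiting the process.
  class IndexActor(close: () => Unit, deleteFiles: () => Unit,
                   renameFiles: () => Unit, exit: () => Unit) {
    // Both branches run inside the same actor, so the close always
    // happens before the files are touched and before the exit.
    def handle(msg: Msg): Unit = msg match {
      case HardDelete =>
        close(); deleteFiles(); exit()
      case SoftDelete =>
        close(); renameFiles(); exit()
    }
  }
}
```

Because a single actor handles one message at a time, the order of the three steps does not depend on timing between separate processes.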

@@ -34,6 +34,7 @@ case class RenamePathMsg(dbName: String)
case class CleanupDbMsg(dbName: String, activeSigs: List[String])
case class DiskSizeMsg(path: String)
case class CloseLRUByPathMsg(path: String)
case class SoftDeleteMsg(path: String)
Contributor

should delete RenamePathMsg above.

val manager = self
node.spawn((_) => {
openTimer.time {
IndexService.start(node, ctx.args.config, path, "standard") match {
Contributor

"soft" delete should be handled the same as "hard" delete. A message to be sent from dreyfus to the IndexService actor which, on receipt, closes the index and renames its files then calls exit on its own pid. This ensures the correct order of processing.

@jiangphcn jiangphcn changed the title from "refactor soft deletion of searchindex" to "close writer before soft deletion of searchindex" Nov 21, 2019
@jiangphcn jiangphcn force-pushed the refactor-soft-deletion-searchindex branch 4 times, most recently from cd53287 to 105b805 Compare November 22, 2019 09:36
rename(srcDir, destDir)
case SoftDeletePathMsg(path: String) =>
logger.info("Soft-deleting " + path)
val pattern = Pattern.compile(path + "/([0-9a-f]+)$")
Contributor

why is this a different pattern to the one we use in CleanupDbMsg?

Contributor Author

For CleanupDbMsg, the argument passed is the plain database name, whereas SoftDeletePathMsg is passed the database shard name, e.g. shards/00000000-ffffffff/foo.1234567890. So the regexp differs between the two messages.
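
As an illustration of how a shard path combines with the `/([0-9a-f]+)$` suffix from the diff, here is a small sketch. The object and method names are hypothetical, and unlike the line under review it wraps the path in `Pattern.quote` so that the `.` in the shard name is matched literally; that quoting is an assumption of this sketch, not something the PR does.

```scala
import java.util.regex.Pattern

object SoftDeletePatternSketch {
  // Returns the trailing hex component (the index signature) when `candidate`
  // ends with "<path>/<hex>", otherwise None.
  def sigOf(path: String, candidate: String): Option[String] = {
    val pattern = Pattern.compile(Pattern.quote(path) + "/([0-9a-f]+)$")
    val m = pattern.matcher(candidate)
    if (m.find()) Some(m.group(1)) else None
  }
}
```

For example, with the shard name from the comment above, a directory ending in `shards/00000000-ffffffff/foo.1234567890/abc123` yields `Some("abc123")`, while a directory under a different shard name yields `None`.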

softDelete(file, path, includePattern)
}

logger.info(fileOrDir.getAbsolutePath)
Contributor

remove this line.

@@ -220,7 +227,16 @@ class IndexService(ctx: ServiceContext[IndexServiceArgs]) extends Service(ctx) w
ctx.args.writer.rollback()
} catch {
case e: AlreadyClosedException => 'ignored
case e: IOException => warn("Error while closing writer", e)
case e: IOException =>
Contributor

I don't think this is the right pattern. rollback() is already a close() (but without the commit), so the fallback to a failed rollback() is just the isLocked/unlock portion (the finally clause).

Contributor Author

okay, let me remove close() line.
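
A sketch of the shape the reviewer describes: `rollback()` alone does the closing, and the only fallback is the lock check in a `finally` clause. To stay self-contained it does not use Lucene; `Writer` and `DirLock` are hypothetical stand-ins for `IndexWriter` and for the `IndexWriter.isLocked(directory)` / `IndexWriter.unlock(directory)` pair quoted earlier in this review, and `rollbackThenUnlock` is an illustrative name.

```scala
import java.io.IOException

object RollbackSketch {
  // Hypothetical stand-ins for the Lucene writer and its directory lock helpers.
  trait Writer { def rollback(): Unit }
  trait DirLock { def isLocked: Boolean; def unlock(): Unit }

  // Returns true when the lock was still held and had to be released by force.
  def rollbackThenUnlock(writer: Writer, lock: DirLock, warn: String => Unit): Boolean = {
    var forced = false
    try {
      writer.rollback() // closes the writer without committing pending changes
    } catch {
      case e: IOException => warn("Error while closing writer: " + e.getMessage)
    } finally {
      // No second close() here: the fallback is only the lock release.
      if (lock.isLocked) { lock.unlock(); forced = true }
    }
    forced
  }
}
```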

@@ -29,4 +35,33 @@ object Utils {
new BytesRef(string)
}

def rename(rootDir: File, dbName: String, sig: String) {
val logger = Logger.getLogger("clouseau.utils")
Contributor

the rootDir here is always the same, so don't pass it in, simply fetch it within this method. This will make it clearer which directory we are renaming.

Contributor Author

It looks like there is no ctx in Utils, so I'll keep this for now.

@jiangphcn
Contributor Author

Hi Bob @rnewson, thanks for your comments. I have addressed them.
In addition, I noticed that the message from 105b805#diff-7a2bf8eb3cb4bef2dcc50a64d66703b1R127 was not received in IndexService. I investigated and found that the pid may no longer exist under concurrency. In other words, when a database is deleted, I can see that https://github.com/cloudant-labs/clouseau/blob/master/src/main/scala/com/cloudant/clouseau/IndexManagerService.scala#L159 is called. Whether the message can be sent successfully from IndexManagerService to IndexService depends on when the process exits with reason normal.

For now, I have taken the approach of only closing the process by pid and moving the rename back to IndexCleanupService. Would you please take a look at 3e3f686? Thank you.

@jiangphcn jiangphcn force-pushed the refactor-soft-deletion-searchindex branch 5 times, most recently from d59a2ff to 61b636c Compare December 9, 2019 13:00
@jiangphcn jiangphcn force-pushed the refactor-soft-deletion-searchindex branch from 61b636c to bbf7bbe Compare December 9, 2019 13:02