-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace most shell script logic with Java #85758
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, the tests we can write now! I left some questions/comments. We should probably add a test-windows label to the PR.
} | ||
} catch (IOException e) { | ||
ioFailure = e; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to put the flush and the latch countdown into a finally block here? I'm thinking about getting some unexpected error here and maybe it's better if we actually terminated and reported the error in userExceptionMsg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I agree. We should do both in a finally block in case we encounter some error reading from the input stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
user exceptions are for things the user can control, like configuration. If it is a general coding error (eg an NPE in our code here), then it would get thrown to the default uncaught exception handler. In the Elasticsearch process, we have an uncaught exception handler that would catch it and log it. I wonder if we should add something similar to CliToolLauncher? Normally CLIs don't create threads, but in the case they do, we can be more consistent about (1) exiting and (2) how we log the error (eg a nice message like "there was an unexpected internal error, see below"). WDYT? I could do that as a followup?
exit $? | ||
CLI_NAME=server | ||
CLI_LIBS=lib/tools/server-cli | ||
source "`dirname "$0"`"/elasticsearch-cli |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anticipating that the java process launch will be heavier on resources than the bash script, can we add an additional JVM command line option here to stop the JDK from using the optimizing compiler, i.e. -XX:TieredStopAtLevel=1. This will reduce both startup time, CPU and memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be for all CLIs or just server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we were still launching Java processes before (JVM options parser, ergonomics, etc) so I don't think we've added any overhead here in terms of JVM startup, we've just consolidated some of that logic. I'd be surprised if there was any measurable change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about it. If most of our cli tools are generally short running we should use it by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll take it upon myself to get some data on CPU/memory while running without the optimizing compiler and maybe if it's beneficial to a great extent, I'll follow up with a PR.
} | ||
|
||
sendShutdownMarker(); | ||
errorPump.drain(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need this, it seems that waitFor drains stderr anyways.
assertMutuallyExclusiveOptions("--version", "-p", "/tmp/pid"); | ||
assertMutuallyExclusiveOptions("--version", "--pidfile", "/tmp/pid"); | ||
assertMutuallyExclusiveOptions("--version", "-q"); | ||
assertMutuallyExclusiveOptions("--version", "--quiet"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also test with --enrollment-token. Right now -V is incompatible with --enrollment-token.
try { | ||
msg = stdin.read(); | ||
} catch (IOException e) {} | ||
if (msg == BootstrapInfo.SERVER_SHUTDOWN_MARKER) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should also put this in finally?
exit $? | ||
CLI_NAME=server | ||
CLI_LIBS=lib/tools/server-cli | ||
source "`dirname "$0"`"/elasticsearch-cli |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we were still launching Java processes before (JVM options parser, ergonomics, etc) so I don't think we've added any overhead here in terms of JVM startup, we've just consolidated some of that logic. I'd be surprised if there was any measurable change here.
* | ||
* http://commons.apache.org/proper/commons-daemon/procrun.html | ||
* | ||
* NOTE: If this method is renamed and/or moved, make sure to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't seem right. The configuration of --StopMethod
used to be in the batch file but now it lives in WindowsServiceInstallCommand
. I think we should point folks there.
} | ||
} catch (IOException e) { | ||
ioFailure = e; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I agree. We should do both in a finally block in case we encounter some error reading from the input stream.
// the thread pumping stderr watching for state change messages | ||
private final ErrorPumpThread errorPump; | ||
|
||
// a flag marking whether the java process has been detached from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment reads like it's been truncated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just poor grammar on my part. I've reworded.
@elasticmachine run elasticsearch-ci/packaging-tests-windows |
@elasticmachine run elasticsearch-ci/part-1 |
command.add("-cp"); | ||
// The '*' isn't allowed by the windows filesystem, so we need to force it into the classpath after converting to a string. | ||
// Thankfully this will all go away when switching to modules, which take the directory instead of a glob. | ||
command.add(esHome.resolve("lib") + (isWindows ? "\\" : "/") + "*"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trivially, this could simplified to use java.io.File.separator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can! We specifically use the local isWindows here so that we can check the unix and linux behavior in tests, regardless of which platform we are actually on. See ServerProcessTests.testCommandLine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LVGTM
@grcevski I think I addressed all your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Password must be at least 114 bits in FIPS mode. This PR fixes the password length in the new ServerCliTests so it passes in FIPS mode. Relates: #85758 PS: The test [failed](https://gradle-enterprise.elastic.co/s/mrlw6o27onxee/tests/:distribution:tools:server-cli:test/org.elasticsearch.server.cli.ServerCliTests/testKeystorePassword) on my PR CI.
A debugging statement was left being printed when ES exits with a non-zero status. This commit removes the debug statement. relates elastic#85758
A debugging statement was left being printed when ES exits with a non-zero status. This commit removes the debug statement. relates #85758
A debugging statement was left being printed when ES exits with a non-zero status. This commit removes the debug statement. relates elastic#85758
A debugging statement was left being printed when ES exits with a non-zero status. This commit removes the debug statement. relates elastic#85758
Elasticsearch provides several command line tools, as well as the main script to start elasticsearch. While most of the logic is abstracted away for cli tools, the main elasticsearch script has hundreds of lines of platform specific shell code. That code is difficult to maintain because it uses many special shell features which then must also exist in other platforms (ie windows batch files). Additionally, the logic in these scripts are not easy to test, we must be on the actual platform and test with a full installation of Elasticsearch, which is relatively slow (compared to most in process tests).
This commit replaces most of the shell specific logic with Java code. It introduces a singular entrypoint, the Launcher, to start any Elasticsearch CLI. Each shell script must then only describe the tool to call and the lib directories it needs to load.
There is a small amount of shell logic that remains. Specifically, that is to identify the location of ES_HOME from the shell script path, and then to find which java installation should be used. After that, the cli can be launched, using a small heap (as we do already for CLIs). For the main Elasticsearch server, the cli figures out all the jvm options and such necessary, then launches the real server process. If run in the foreground, the launcher will stay alive for the lifetime of Elasticsearch; the streams are effectively inherited so all output from Elasticsearch still goes to the console. If daemonizing, the launcher waits around until Elasticsearch is "ready" (this means the Node startup completed), then detaches and exits.