Replace most shell script logic with Java #85758

rjernst · 2022-04-07T23:31:58Z

Elasticsearch provides several command-line tools, as well as the main script that starts Elasticsearch. While most of the logic is abstracted away for the CLI tools, the main elasticsearch script has hundreds of lines of platform-specific shell code. That code is difficult to maintain because it relies on many special shell features, each of which must also have an equivalent on other platforms (i.e. Windows batch files). Additionally, the logic in these scripts is not easy to test: we must be on the actual platform and test with a full installation of Elasticsearch, which is relatively slow compared to most in-process tests.

This commit replaces most of the shell-specific logic with Java code. It introduces a single entrypoint, the Launcher, to start any Elasticsearch CLI. Each shell script then only needs to describe the tool to call and the lib directories it needs to load.

There is a small amount of shell logic that remains. Specifically, it identifies the location of ES_HOME from the shell script path and then finds which Java installation should be used. After that, the CLI can be launched using a small heap (as we already do for CLIs). For the main Elasticsearch server, the CLI works out the JVM options and other launch settings, then launches the real server process. If run in the foreground, the launcher stays alive for the lifetime of Elasticsearch; the streams are effectively inherited, so all output from Elasticsearch still goes to the console. If daemonizing, the launcher waits until Elasticsearch is "ready" (meaning the Node startup has completed), then detaches and exits.
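The foreground and daemonize paths described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the class name, the `READY` marker line, and the line-based protocol on stderr are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class LauncherLifecycleSketch {
    // Hypothetical marker line; the real launcher/server protocol may differ.
    static final String READY_MARKER = "READY";

    /** Forwards stderr lines to the console until the marker is seen; returns false if the stream ends first. */
    static boolean forwardUntilReady(BufferedReader childStderr, PrintStream console) throws IOException {
        String line;
        while ((line = childStderr.readLine()) != null) {
            if (line.equals(READY_MARKER)) {
                return true;
            }
            console.println(line);
        }
        return false;
    }

    /** Starts the server process and either detaches once it is ready or waits for it to exit. */
    static int launch(List<String> command, boolean daemonize) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // stdout goes straight to the console; stderr is piped so the launcher can watch it
        builder.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        Process server = builder.start();
        try (BufferedReader stderr = new BufferedReader(
                new InputStreamReader(server.getErrorStream(), StandardCharsets.UTF_8))) {
            boolean ready = forwardUntilReady(stderr, System.err);
            if (ready && daemonize) {
                // Detach: stop watching and let the launcher exit. A real implementation
                // must also ensure the child stops writing to the now-closed pipe.
                return 0;
            }
            // Foreground, or the server died before becoming ready: keep forwarding output.
            String line;
            while ((line = stderr.readLine()) != null) {
                System.err.println(line);
            }
            return server.waitFor();
        }
    }
}
```

Keeping the marker detection in a method that takes a plain `BufferedReader` means it can be exercised in a unit test without starting a process, which is the kind of in-process testing this change is meant to enable.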

grcevski

This looks great, the tests we can write now! I left some questions/comments. We should probably add a test-windows label to the PR.

grcevski · 2022-05-18T19:21:47Z

distribution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/ErrorPumpThread.java

+            }
+        } catch (IOException e) {
+            ioFailure = e;
+        }


Does it make sense to put the flush and the latch countdown into a finally block here? I'm thinking about getting some unexpected error here and maybe it's better if we actually terminated and reported the error in userExceptionMsg.

👍 I agree. We should do both in a finally block in case we encounter some error reading from the input stream.

user exceptions are for things the user can control, like configuration. If it is a general coding error (eg an NPE in our code here), then it would get thrown to the default uncaught exception handler. In the Elasticsearch process, we have an uncaught exception handler that would catch it and log it. I wonder if we should add something similar to CliToolLauncher? Normally CLIs don't create threads, but in the case they do, we can be more consistent about (1) exiting and (2) how we log the error (eg a nice message like "there was an unexpected internal error, see below"). WDYT? I could do that as a followup?
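As a rough illustration of the suggestion in this thread, the sketch below shows a pump thread that flushes its output and releases the waiting launcher in a `finally` block, so the launcher is never left blocked if the read fails or an unexpected exception is thrown. The class name, the `READY` marker, and the line-based protocol are assumptions for the example; this is not the actual `ErrorPumpThread`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

final class ErrorPumpSketch extends Thread {
    // Hypothetical marker line; the real marker value is not shown in this thread.
    static final String READY_MARKER = "READY";

    private final BufferedReader reader;
    private final PrintWriter writer;
    private final CountDownLatch readyOrDead = new CountDownLatch(1);
    private volatile boolean ready;
    private volatile IOException ioFailure;

    ErrorPumpSketch(InputStream errInput, PrintWriter writer) {
        super("error-pump-sketch");
        this.reader = new BufferedReader(new InputStreamReader(errInput, StandardCharsets.UTF_8));
        this.writer = writer;
    }

    /** Blocks until the marker was seen or the pump stopped; rethrows a read failure if one occurred. */
    boolean waitUntilReady() throws IOException, InterruptedException {
        readyOrDead.await();
        if (ioFailure != null) {
            throw ioFailure;
        }
        return ready;
    }

    @Override
    public void run() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.equals(READY_MARKER)) {
                    ready = true;
                    readyOrDead.countDown();
                } else {
                    writer.println(line);
                }
            }
        } catch (IOException e) {
            ioFailure = e;
        } finally {
            // Runs on normal end of stream, on IOException, and on unexpected runtime errors alike.
            writer.flush();
            readyOrDead.countDown();
        }
    }
}
```

Counting down a latch that has already reached zero is a no-op, so the second `countDown()` in `finally` is safe after the marker has been seen.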

grcevski · 2022-05-18T19:30:19Z

distribution/src/bin/elasticsearch

-exit $?
+CLI_NAME=server
+CLI_LIBS=lib/tools/server-cli
+source "`dirname "$0"`"/elasticsearch-cli


Anticipating that the Java process launch will be heavier on resources than the bash script, can we add an additional JVM command line option here to stop the JDK from using the optimizing compiler, i.e. -XX:TieredStopAtLevel=1? This will reduce startup time as well as CPU and memory usage.

Should this be for all CLIs or just server?

Well, we were still launching Java processes before (JVM options parser, ergonomics, etc) so I don't think we've added any overhead here in terms of JVM startup, we've just consolidated some of that logic. I'd be surprised if there was any measurable change here.

Yeah, I was thinking about it. If most of our cli tools are generally short running we should use it by default.

Good point, I'll take it upon myself to get some data on CPU/memory while running without the optimizing compiler and maybe if it's beneficial to a great extent, I'll follow up with a PR.
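For anyone who wants to experiment with the idea discussed here, a minimal sketch of a command builder with the flag as an opt-in follows. `-XX:TieredStopAtLevel=1` is a real HotSpot option; the class name, the method signature, and the heap sizes are illustrative assumptions and are not taken from this PR.

```java
import java.util.ArrayList;
import java.util.List;

final class CliJvmCommandSketch {
    /** Builds a java command line for a short-lived CLI tool (hypothetical helper). */
    static List<String> buildCommand(String javaBinary, String classpath, String mainClass, boolean stopAtC1) {
        List<String> command = new ArrayList<>();
        command.add(javaBinary);
        // Illustrative small-heap settings; the values actually used by the scripts may differ.
        command.add("-Xms4m");
        command.add("-Xmx64m");
        if (stopAtC1) {
            // Restrict tiered compilation to the C1 compiler, skipping the optimizing compiler.
            command.add("-XX:TieredStopAtLevel=1");
        }
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);
        return command;
    }
}
```

Keeping the flag behind a boolean makes it straightforward to measure CPU and memory with and without it before deciding on a default.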

grcevski · 2022-05-18T19:34:56Z

distribution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/ServerProcess.java

+        }
+
+        sendShutdownMarker();
+        errorPump.drain();


We probably don't need this, it seems that waitFor drains stderr anyways.

grcevski · 2022-05-18T19:55:35Z

distribution/tools/server-cli/src/test/java/org/elasticsearch/server/cli/ServerCliTests.java

+        assertMutuallyExclusiveOptions("--version", "-p", "/tmp/pid");
+        assertMutuallyExclusiveOptions("--version", "--pidfile", "/tmp/pid");
+        assertMutuallyExclusiveOptions("--version", "-q");
+        assertMutuallyExclusiveOptions("--version", "--quiet");


I think we can also test with --enrollment-token. Right now -V is incompatible with --enrollment-token.

grcevski · 2022-05-18T20:22:05Z

server/src/main/java/org/elasticsearch/bootstrap/Elasticsearch.java

+            try {
+                msg = stdin.read();
+            } catch (IOException e) {}
+            if (msg == BootstrapInfo.SERVER_SHUTDOWN_MARKER) {


Maybe we should also put this in finally?
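One way to make the marker check robust against read errors, sketched under assumptions: initialise the message to the end-of-stream value so that a failed read is treated the same as a closed stream. The class name and the marker value below are hypothetical stand-ins for `BootstrapInfo.SERVER_SHUTDOWN_MARKER`, and the sketch does not claim to match the fix that was committed.

```java
import java.io.IOException;
import java.io.InputStream;

final class ShutdownWatcherSketch {
    // Hypothetical value standing in for BootstrapInfo.SERVER_SHUTDOWN_MARKER.
    static final int SHUTDOWN_MARKER = 0x1B;

    /** Returns true only if the next byte on stdin is the shutdown marker. */
    static boolean shutdownRequested(InputStream stdin) {
        int msg = -1; // same value read() returns at end of stream
        try {
            msg = stdin.read();
        } catch (IOException e) {
            // Deliberately ignored: a failed read is handled like a closed stream.
        }
        return msg == SHUTDOWN_MARKER;
    }
}
```

Because `msg` has a defined value on every path, the caller can always decide what to do after the `try`, whether the read succeeded, hit end of stream, or threw.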

mark-vieira · 2022-05-18T20:49:42Z

distribution/src/bin/elasticsearch

-exit $?
+CLI_NAME=server
+CLI_LIBS=lib/tools/server-cli
+source "`dirname "$0"`"/elasticsearch-cli


Well, we were still launching Java processes before (JVM options parser, ergonomics, etc) so I don't think we've added any overhead here in terms of JVM startup, we've just consolidated some of that logic. I'd be surprised if there was any measurable change here.

mark-vieira · 2022-05-18T20:56:47Z

distribution/tools/cli-launcher/src/main/java/org/elasticsearch/launcher/CliToolLauncher.java

+      *
+      * http://commons.apache.org/proper/commons-daemon/procrun.html
+      *
+      * NOTE: If this method is renamed and/or moved, make sure to


This comment doesn't seem right. The configuration of --StopMethod used to be in the batch file but now it lives in WindowsServiceInstallCommand. I think we should point folks there.

mark-vieira · 2022-05-18T21:03:01Z

distribution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/ErrorPumpThread.java

+            }
+        } catch (IOException e) {
+            ioFailure = e;
+        }


👍 I agree. We should do both in a finally block in case we encounter some error reading from the input stream.

mark-vieira · 2022-05-18T21:05:49Z

distribution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/ServerProcess.java

+    // the thread pumping stderr watching for state change messages
+    private final ErrorPumpThread errorPump;
+
+    // a flag marking whether the java process has been detached from


This comment reads like it's been truncated.

Just poor grammar on my part. I've reworded.

rjernst · 2022-05-18T22:47:30Z

@elasticmachine run elasticsearch-ci/packaging-tests-windows

rjernst · 2022-05-19T05:04:15Z

@elasticmachine run elasticsearch-ci/part-1

ChrisHegarty · 2022-05-19T12:55:45Z

distribution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/ServerProcess.java

+        command.add("-cp");
+        // The '*' isn't allowed by the windows filesystem, so we need to force it into the classpath after converting to a string.
+        // Thankfully this will all go away when switching to modules, which take the directory instead of a glob.
+        command.add(esHome.resolve("lib") + (isWindows ? "\\" : "/") + "*");


Trivially, this could be simplified to use java.io.File.separator

I don't think we can! We specifically use the local isWindows here so that we can check both the unix and windows behavior in tests, regardless of which platform we are actually on. See ServerProcessTests.testCommandLine
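To illustrate the point about testability, here is a minimal sketch in which the platform is passed in as a flag rather than read from `java.io.File.separator`, so both separator styles can be asserted on any machine. The class and method names are hypothetical, and the sketch works on plain strings rather than on `Path` as the PR does.

```java
final class ClasspathSketch {
    /** Builds the lib classpath glob using the separator for the requested platform. */
    static String libClasspath(String esHome, boolean isWindows) {
        String separator = isWindows ? "\\" : "/";
        // The '*' is appended as a plain string; it is not a valid path element on Windows.
        return esHome + separator + "lib" + separator + "*";
    }
}
```

With `File.separator` the result would depend on the machine running the test, so the non-native branch could never be checked.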

ChrisHegarty

LVGTM

rjernst · 2022-05-19T13:56:49Z

@grcevski I think I addressed all your comments.

grcevski

LGTM!

Password must be at least 112 bits in FIPS mode. This PR fixes the password length in the new ServerCliTests so it passes in FIPS mode. Relates: #85758 PS: The test [failed](https://gradle-enterprise.elastic.co/s/mrlw6o27onxee/tests/:distribution:tools:server-cli:test/org.elasticsearch.server.cli.ServerCliTests/testKeystorePassword) on my PR CI.

A debugging statement was left being printed when ES exits with a non-zero status. This commit removes the debug statement. relates elastic#85758


rjernst and others added 30 commits March 30, 2022 21:00

stash

917ee1e

Merge branch 'master' into launcher

3a0a7e2

cleanup

5ea8dbc

plugin cli sort of works, invoked through cli tool manually

2bf16e1

plugin cli done (on nix)

3500ef7

certutil converted

42f1bde

use exec

483577f

use exec in certutil, remove <java17 checks

b49edb4

move java version checking, use source instead of exec

abefb36

Fix version checker

e11e6a3

convert geoip cli

2ba7d99

convert keystore cli

b3dc890

introspect toolname

eeff485

convert node and shard clis

160bbcf

convert most security tools

eab9724

convert croneval

271096f

use es-env script again, so sql-cli can still use it

e0fb885

fix reconfigure tool

fdc695f

convert postinst

64c65c6

Update windows scripts to use launcher tool

9129265

Convert croneval.bat

92e0247

Set config directory in elasticsearch-cli.bat

9292c78

create server cli provider, move launcher stuff to it

3571c35

Use elasticsearch-env.bat in elasticsearch-cli.bat and in sql cli

fa15e1e

Don't hardcode tool names for windows batch

4c6f9e8

Just get the filename

c76c8b7

Merge branch 'launcher' into launcher-wrb

ab73844

more server cleanup, simplified logging init for clis

8ba04d0

remove old main methods

4af0543

remove beforeMain hook

e0b0b57

rjernst requested a review from grcevski May 18, 2022 00:49

rjernst added 4 commits May 17, 2022 20:10

fix command tests

fa29cd9

fix naming of windows service daemon

1c1c7c9

fix detach state

48363b5

spotless

358447a

grcevski reviewed May 18, 2022


rjernst added 2 commits May 18, 2022 13:50

add enrollment token test

48b55a7

guard against exceptional read errors

3234bd4

mark-vieira approved these changes May 18, 2022


rjernst added 2 commits May 18, 2022 14:24

protect from unexpected exception in error pump

ce2a433

improve javadocs

83083e2

rjernst added the test-windows Trigger CI checks on Windows label May 18, 2022

rjernst added 2 commits May 18, 2022 15:48

Merge branch 'master' into launcher

79ba71e

tweak for running test on windows

a7f9442

ChrisHegarty reviewed May 19, 2022


ChrisHegarty approved these changes May 19, 2022


grcevski approved these changes May 19, 2022


rjernst merged commit b9c504b into elastic:master May 19, 2022

rjernst deleted the launcher branch May 19, 2022 15:29

ywangd mentioned this pull request May 20, 2022

[Test] Increase length of test password for FIPS #86948

Merged

rjernst added a commit to rjernst/elasticsearch that referenced this pull request May 25, 2022

Remove leftover debugging statement on error

83df8f6

A debugging statement was left being printed when ES exits with a non-zero status. This commit removes the debug statement. relates elastic#85758

rjernst mentioned this pull request May 25, 2022

Remove leftover debugging statement on error #87128

Merged

rjernst added a commit that referenced this pull request May 25, 2022

Remove leftover debugging statement on error (#87128)

fd442d3

A debugging statement was left being printed when ES exits with a non-zero status. This commit removes the debug statement. relates #85758

elasticsearchmachine pushed a commit that referenced this pull request May 25, 2022

Remove leftover debugging statement on error (#87128) (#87136)

182143d

A debugging statement was left being printed when ES exits with a non-zero status. This commit removes the debug statement. relates #85758
