Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement getCompactionJob async RPC using Jetty and REST #5018

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

cshannon
Copy link
Contributor

@cshannon cshannon commented Oct 28, 2024

This change switches getCompactionJob() rpc call to be an async REST call using Jetty and Jersey so that we can use long polling and not block while waiting for jobs on the queue. This is meant to demonstrate another alternative for async RPC as described in #5000

Jetty is configured to use Jackson and a custom serializer/deserializer to handle Thrift objects being serialized to json.

Authentication has been set up to work with username/password, SSL, and also Kerberos. Right now for the prototype the authentication is working but a couple shortcuts were taken to get it working with the IT set up so some more work needs to be done before it could work outside ITs if we wanted to use it.

Some notes about the prototype:

  1. This is meant as a proof of concept or prototype so it should not be merged as is.
  2. The port for the RPC jetty server is hardcoded for now using 8999 so this would need to be eventually set up with configuration like the other ports.
  3. There are ITs that demonstrate SSL and Kerberos working besides username and password. The Kerberos setup was based on this unit test that is part of Jetty.
  4. For the HTTP client I decided to use the Jetty client because the Kerberos test has an examples for using that. We could always switch to something like the Apache Http 5 client if we wanted.
  5. A few shortcuts were taken just to get things working for the proof of concept. Things like Http client connection pooling were not implemented and some of the authentication peices would need more work to work outside ITS.
  6. I added a spot bugs supression annotation because I couldn't figure out the warning initially but it looks like it is related to the anonymous UserPrincipal class being created in EmbeddedRpcWebServer that probably needs to be marked serializable. So that will need to be fixed eventually if we keep this. That principal will just become a real type and be fixed up.

This change switches getCompactionJob() rpc call to be an async REST
call using Jetty and Jersey so that we can use long polling and not
block while waiting for jobs on the queue.

Jetty is configured to use Jackson and a custom serializer/deserializer
to handle Thrift objects being serialized to json.

Authentication has been set up to work with username/password, SSL, and
also Kerberos. Right now for the prototype the authentication is not
quite complete and a couple shortcuts were taken to get it working with
the IT set up so some more work needs to be done before it is finished.
@cshannon cshannon self-assigned this Oct 28, 2024
@cshannon cshannon marked this pull request as draft October 28, 2024 17:19
// to prevent creating a new deserializer for every object?
private static <T extends TBase<?,?>> void deserialize(T obj, String json) throws IOException {
try {
final TDeserializer deserializer = new TDeserializer(new TJSONProtocol.Factory());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Factory is thread-safe, right? I can imagine that the Deserializer is not. You could put the TDeserializer and TSerializers in a ThreadLocal. It looks like they reset their internals for reuse.

Comment on lines +1169 to +1177
String advertiseHost = getHostname();
if (advertiseHost.equals("0.0.0.0")) {
try {
advertiseHost = InetAddress.getLocalHost().getHostName();
} catch (UnknownHostException e) {
log.error("Unable to get hostname", e);
}
}
HostAndPort restHostAndPort = HostAndPort.fromParts(advertiseHost, restPort);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use the value of sa.address from the code above this. In the code above the Thrift server was started on an interface, I assume we want to use the same one, just a different port.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants