Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIP-25: Token based authentication #2888

Merged
merged 23 commits into from
Nov 28, 2018
Merged

PIP-25: Token based authentication #2888

merged 23 commits into from
Nov 28, 2018

Conversation

merlimat
Copy link
Contributor

@merlimat merlimat commented Oct 30, 2018

Motivation

This is the first PR for the implementation of token based authentication

Modifications

  • Added client & broker auth providers
  • Added CLI tool to manage tokens
  • Unit tests

TODOs (in separate PRs):

  • Integration tests
  • Documentation
  • C++ (and related) client plugin implementation

@merlimat merlimat added the type/feature The PR added a new feature or issue requested a new feature label Oct 30, 2018
@merlimat merlimat added this to the 2.3.0 milestone Oct 30, 2018
@merlimat merlimat self-assigned this Oct 30, 2018
Copy link
Contributor

@ivankelly ivankelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments about naming so it's clear that private/public can be used.

Also, it would be really good to have an integration test in this PR a) to validate that both the PSK and private/public paths work and b) to illustrate the workflow as a basis for documentation.

@merlimat
Copy link
Contributor Author

@ivankelly @maskit Please take another look

@merlimat
Copy link
Contributor Author

Also, it would be really good to have an integration test in this PR a) to validate that both the PSK and private/public paths work

The 2 paths are tested in the auth provider unit test. It's not the same as integration test but it covers all the plugin code, from getting variables from ServerConfiguration to validating the tokens in both cases.

conf/broker.conf Outdated Show resolved Hide resolved
@merlimat
Copy link
Contributor Author

@maskit @ivankelly Updates to support env:, data: and file: as sources for keys and tokens


if (validationKeyConfig.startsWith("data:")) {
log.info("Reading token validation key from config file");
validationKey = validationKeyConfig.substring("data:".length());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be valid for a super simple case, but it wouldn’t work for cases that has mime-type or base64 encoding.

Why didn’t you use the URL class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... because I've never used URL.openStream() before :)

Though, one thing is that the key is expected to be already base64 encoded.

$ bin/pulsar tokens create-secret-key
secret-key CBqtuReZe+8nKqtJBbmuPTIg6b4W1KjOS21pcJkTJ/8=

Then one would just have to store the key string somewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m ok with parsing the schemes here and having a limitation in its format for now, but I still think this data scheme parsing is incorrect. It always need a comma after the colon.
https://tools.ietf.org/html/rfc2397#section-3

Also, base64 encoding is not mandatory. You can pass URL encoded string instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maskit I have updated to conform with data: standard url, though I had to add manual parsing since java URL class doesn't support it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you look into #1212 I commented on another line? The URL class I suggested is org.apache.pulsar.client.api.url.URL. The parser already exists in it.

@merlimat
Copy link
Contributor Author

@maskit @ivankelly please take another look. I have moved to use standard data: uri and to not force base64 encodings.

@maskit
Copy link
Member

maskit commented Nov 26, 2018

The "data:" parser you wrote would be a duplicate code because it's already implemented in the in-house URL class I wrote. I'm not going to make a change request because I don't want to delay this feature, but either of the parsers should be removed eventually to eliminate duplicate code.

Also, because environment variables are general enough as data sources, I think "env:" parser should be added as a URLStreamHandler so that other places can use it too via the in-house URL class.

As for "token:", I'm ok with having it in this plugin as a special case. But if you renamed it to "raw:", it could be used on other places.

@merlimat
Copy link
Contributor Author

run integration tests

Copy link
Contributor

@ivankelly ivankelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no change here that shows the whole thing working. It doesn't even have to be an integration test, but there should be a test where a client connects to a server using token based auth.

conf/broker.conf Outdated Show resolved Hide resolved
String outputFile;

@Parameter(names = {
"-b", "--base-64" }, description = "Encode the key in base64")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--base64

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be enabled by default in the case that -o is not specified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but don't know which option is more confusing

token = r.readLine();
} else if (tokenFile != null) {
token = new String(Files.readAllBytes(Paths.get(tokenFile)), Charsets.UTF_8);
} else if (System.getenv("TOKEN") != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here? Seems unnecessary. Anyone who wanted to use this would just pass $TOKEN as the parameter on the shell

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is that a "token" if effectively like a password (or a private key): whoever has it, can access the system. Passing that as a CLI argument will expose to other users on the same machine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and how would they get it into the environment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I'm fine with it staying, just not sure how much real security it offers and trying to reduce options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and how would they get it into the environment?

I guess you can do:

TOKEN=xyz pulsar tokens show

The env variable will not be visible on the process list

public void configure(String encodedAuthParamString) {
// Interpret the whole param string as the token. If the string contains the notation `token:xxxxx` then strip
// the prefix
if (encodedAuthParamString.startsWith("token:")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different to data: on the server side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data is meant to carry arbitrary binary data. A token is guaranteed to be a valid ASCII string (being composed of base64 encoded parts).

The encodedAuthParamString are common to all the auth providers and are a way to pass implementation specific configs. In case of token auth, we're taking the "token" string itself.

I didn't want to use data: here to simplify the user input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be all or nothing, no? I don't see reasons to support token, file and env but not data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, token instead of data makes sense. token: is used for passing the token to the client. data: is used to allow the server to load the signing key. They're different usecases. The signing key could be quite large in case of public-private key, whereas the token is generally quite small.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we shouldn't support data: for small data? Even if it's not a suitable scheme, I don't see reasons to drop it. If we allowed using any schemes for any cases and used the URL class to support them, we wouldn't need to maintain this if-else block.

@maskit
Copy link
Member

maskit commented Nov 26, 2018

@merlimat

There is just one problem with that class in that it assumes the data will be decoded into a String,

getContent() returns a InputStream so you should be able to read raw data.

Will continue this in a different PR since this will need to change on Athenz plugin too.

Sure, I think I can help you out on this since Pulsar seems like compilable with Java 11 :)

@merlimat
Copy link
Contributor Author

@ivankelly @maskit Addressed most of comments and added simple integration tests, Please one more look

Copy link
Contributor

@ivankelly ivankelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, but nothing blocking


@Cleanup
PulsarClient client = PulsarClient.builder()
.serviceUrl(pulsarCluster.getPlainTextServiceUrl())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for this PR, but do we have a requireTLS option? We should have an option to block authorization unless TLS is in use to avoid people sending tokens over cleartext.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was another PR that was meant to allow to disable the clear-text service endpoints.

return ByteStreams.toByteArray((InputStream) new URL(keyConfUrl).getContent());
} catch (Exception e) {
throw new IOException(e);
}
} else if (keyConfUrl.startsWith("env:")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you have a URL handler for this also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there's one tricky part.

The key, passed as an env variable, needs to be base64 encoded. That might not be needed in general case for env: (eg: a token would be fine as it is).

It's kind of confusing and we'd have to define something similar to env:base64,MY_SECRET_KEY_VAR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shoudn’t define encoding field on env scheme to keep it simple. The requirement comes from application use case (which is this plugin here), so the application should take care of it.
If you get URLConnection from the URL object, you can check the scheme name as protocol name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t read key generation part, but if the key is in PEM format we can use the same format for all scheme and decode the raw data read from the InputStream in a consistent way, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the env: schema as it was not working properly. Java seems to be mangling long var values. We can add later when we can make it work.

@ivankelly
Copy link
Contributor

When you do the integration tests, we need a test for proxy (it seems to be missing in this and c++ change)

if (charset == null) {
charset = "US-ASCII";
if (contentType == null) {
this.contentType = "application/data";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC2397 says:

If <mediatype> is omitted, it defaults to text/plain;charset=US-ASCII.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.contentType = "application/data";
this.contentType = "text/plain;charset=US-ASCII";

if (matcher.group("base64") == null) {
// Support Urlencode but not decode here because already decoded by URI class.
this.data = new String(matcher.group("data").getBytes(), charset);
this.data = matcher.group("data").getBytes();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is probably equivalent to the original code, so this change is fine.

However, I realized that the both code have the same issue. Because we drop charset information here, users cannot read the data as a string with the specified charset. I think we should stop parsing parameters in mimetype and return the whole mimetype part as content-type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with original code is that it was getting a list of bytes and converting them into a String (which in case of keys is not possible) and then exposing back to a ByteArrayInputStream. I just cut through the String conversion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can see that, and we shoud fix that part. So this would be out of this PR’s scope but we need to return charset as a part of contentType so that URL class users can convert the data into String after reading the raw data from InputStream. I think we can do this by removing charset part from the regex.

Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 because there's no blockers. I can improve the small stuffs around scheme and URL class later. The behavior in normal use cases wouldn't be changed so much.

@merlimat
Copy link
Contributor Author

When you do the integration tests, we need a test for proxy (it seems to be missing in this and c++ change)

@ivankelly the test is actually going trough the proxy already since pulsarCluster.getPlainTextServiceUrl() returns the proxy container address

@merlimat
Copy link
Contributor Author

@maskit @ivankelly Few more changes:

  • Fixed integration tests (test is also going against proxy)
  • Added CLI tools to validate a token against a key
  • Removed env: as it's not working properly.

String cmdString = Arrays.stream(cmd).collect(Collectors.joining(" "));
ByteBuf stdout = Unpooled.buffer();
ByteBuf stderr = Unpooled.buffer();
docker.execStartCmd(execid).withDetach(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of code in common with runCommand here. Doesn't need to be in this patch, but it should be factors out to something that takes a Consumer<byte[]>

@merlimat
Copy link
Contributor Author

run cpp tests

@merlimat merlimat merged commit a99f733 into apache:master Nov 28, 2018
@merlimat merlimat deleted the jwt branch November 28, 2018 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature The PR added a new feature or issue requested a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants