Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blocking by hostname patterns (--block-hostname) #1532

Merged
merged 8 commits into from
Oct 16, 2020

Conversation

krashanoff
Copy link
Contributor

@krashanoff krashanoff commented Jul 3, 2020

This is an implementation of the --block-hostname option as described in #972.

It provides:

  • The ability to block tests from accessing particular hostnames that match provided patterns
  • A limited wildcard syntax for hostname blocking: wildcards are permitted only at the very start of the pattern string, which should consist only of valid letters in any language, numbers, or .
  • Parameters through the environment variable K6_BLOCK_HOSTNAMES, the JSON field blockHostnames, and the CLI option --block-hostname
  • A new error (code 1111) for request failure due to the hostname being blocked.
  • An implementation of a trie to store and search blocked hostname patterns quickly
  • Tests for the feature

Implementation details

The trie is implemented as typename HostnameTrie in lib/options.go. It is designed to handle internationalized hostnames and the aforementioned limited wildcard syntax.

type HostnameTrie struct {
	children []*HostnameTrie
	r        rune
	terminal bool // end of a valid match
}

The only thing notable is that a rune is held at each node instead of a byte to address internationalized hostnames.

The provided interface for the type is:

  • (*HostnameTrie) Insert(s string) error
    • Insert a new pattern into the HostnameTrie. Returns a non-nil error if the pattern is illegal (e.g. breaks the aforementioned syntax).
  • (*HostnameTrie) Contains(s string) (bool, string)
    • Check if a hostname or string matches a pattern in the HostnameTrie. Returns whether a match was found, and the matched path if one was found.
  • (*HostnameTrie) UnmarshalJSON([]byte) error
  • (*HostnameTrie) UnmarshalText([]byte) error

A pointer to this type is passed to the Dialer as BlockedHostnames and to the Options as the same.

golintci warnings

There are two golintci warnings in this PR, being a line that's too long in cmd/options.go and use of a global variable in lib/options.go.

The long line is just the description of the option. I figured it wasn't too big a deal since all option descriptions were about as long.

The global variable I would like to remove, but eliminating it would mean either:

  • A regex would need to be compiled on function call somewhere in HostnameTrie's implementation.
  • One would have to manually implement hostname pattern validation.

So it might be a necessary evil.

Other notes

Since the interface for inserting and checking strings against the hostname trie lends itself to using the string type, but the trie uses runes, type conversions are carried out on function call. This makes the code less elegant, but the interface cleaner.

This PR closes #972.

@CLAassistant
Copy link

CLAassistant commented Jul 3, 2020

CLA assistant check
All committers have signed the CLA.

@na-- na-- requested review from imiric and mstoykov and removed request for imiric July 3, 2020 12:21
@na--
Copy link
Member

na-- commented Jul 6, 2020

Hey, thanks for making this PR! I've only skimmed it, so this is not a proper PR review, but it looks very good to me from this initial glance! Regarding the linter errors, you can safely add //nolint:gochecknoglobals to the global variable warning - it's difficult to avoid such warnings at this point in time, before a lot of things in #883 are resolved. And you can split the long line like sumTrendStatsHelp is done, a few lines below.

I didn't spot any architectural-level issues, but we'll probably have to delay a full, implementation-details-level, code review at least a week or two, sorry... 😞 We're currently very focused on polishing #1007, merging it in master in the next day or two and releasing k6 v0.27.0 soon after that, hopefully next week. Since we're in the final phases of that very, very long effort, it's unlikely that we'll fully review and release your PR in k6 v0.27.0 - it's either going to be v0.27.1 or v0.28.0. And, given the multitude of changes in #1007, you'd probably need to rebase the PR after we merge, sorry again... 😞

Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for that!!!
As @na-- we don't really have the capacity to code review this in full, but I also took 10 minutes to look at it.

I don't see any blockers so far. I left some small comments, and would really like it if we have a bigger
test with:

  1. more matching hostnames
  2. a few not matching, preferably ones that are close to what is blocklisted so if *.k6.io is blocked .. that k6.io isn't and like if we match test.k6.io something.test.k6.io isn't matched and so on :D

Thanks again for this PR, and hopefully we will be able to code review this in full in two weeks :D

js/runner_test.go Outdated Show resolved Hide resolved
js/runner_test.go Outdated Show resolved Hide resolved
lib/options_test.go Outdated Show resolved Hide resolved
@krashanoff
Copy link
Contributor Author

Thanks @na--, @mstoykov for taking time out of what seems to be a hectic handful of weeks to check up on this PR. It's much appreciated!

we'll probably have to delay a full, implementation-details-level, code review at least a week or two, sorry... 😞

No worries at all, thank you for taking time to provide even ten minutes' review. Besides, I don't think there's a huge rush on this seeing as it is a >1 year old issue now 👍

And you can split the long line like sumTrendStatsHelp is done, a few lines below.

I'll do so then. Thanks ^_^

I don't see any blockers so far. I left some small comments, and would really like it if we have a bigger
test with:

  • more matching hostnames
  • a few not matching, preferably ones that are close to what is blocklisted so if *.k6.io is blocked .. that k6.io isn't and like if we match test.k6.io something.test.k6.io isn't matched and so on :D

Roger that, I'll add some more tests in with the next commit then. Also, I'll implement all the specific changes in your review up top.

Take it easy!

@na-- na-- added this to the v0.28.0 milestone Jul 22, 2020
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the super late review, and thank you again for making this PR!

I just realized that domains are supposed to be case-insensitive... 😞 So, I think you should just use something like strings.ToLower() or unicode.ToLower() everywhere, both when you're adding domains to the blocklist, and when checking domains against it.

lib/options.go Outdated Show resolved Hide resolved
lib/options.go Outdated Show resolved Hide resolved
lib/options_test.go Outdated Show resolved Hide resolved
@krashanoff
Copy link
Contributor Author

Hey, thank you for reviewing the changes and leaving feedback! Great suggestion on the map. I'll modify the implementation of the trie and add those tests over this weekend 👍

Comment on lines +213 to +215
// UnmarshalJSON forms a HostnameTrie from the provided hostname pattern
// list.
func (t *HostnameTrie) UnmarshalJSON(data []byte) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, you should also implement the MarshalJSON() method 😞 k6 needs to be able to marshal all of the options in a JSON format when it's making an archive bundle.

Try running k6 archive --archive-out test.tar --block-hostname="*" github.com/loadimpact/k6/samples/http_get.js and then k6 run test.tar. You'd get an error like this:

ERRO[0000] json: cannot unmarshal object into Go struct field Options.options.blockHostnames of type []string 

this is because the metadata.json in the archive will have "blockHostnames": {}, in it, which is not the array this code expects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it! Will go ahead and implement this. Is there anything else I've missed in my changes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, having MarshalJSON() that produces an array should take care of the issue, since your code can already deal with arrays (after the envconfig issue is worked around). I think you don't need to use k6 archive every time though, k6 inspect --block-hostname="*" github.com/loadimpact/k6/samples/http_get.js should mostly do the same things but give you faster feedback.

Copy link
Member

@na-- na-- Aug 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a unit test that makes an archive on the fly and runs it, for example take a look at https://github.com/loadimpact/k6/blob/4c39bcccf606aa3b4324ec5d4f3adfdfcf7d2fa5/js/runner_test.go#L179-L227

Or some of the other tests that make use of Runner.MakeArchive() and run the result: https://github.com/loadimpact/k6/blob/4c39bcccf606aa3b4324ec5d4f3adfdfcf7d2fa5/js/runner.go#L110-L112

Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the latest fixes! This is very close to merging, I just noticed a few remaining issues when I checked out the code and played with the binary compiled from it. Also, can you rebase your branch on the latest k6 master, there have been some substantial changes there since this was published.

@@ -242,6 +341,9 @@ type Options struct {
// Blacklist IP ranges that tests may not contact. Mainly useful in hosted setups.
BlacklistIPs []*IPNet `json:"blacklistIPs" envconfig:"K6_BLACKLIST_IPS"`

// Block hostname patterns that tests may not contact.
BlockedHostnames *HostnameTrie `json:"blockHostnames" envconfig:"K6_BLOCK_HOSTNAMES"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is somewhat problematic. Because of a bug in the envconfig library, it initializes BlockedHostnames, so the if opts.BlockedHostnames != nil check in Apply() will always be true. So, you're never able to set the blocked hostnames in the JS script, because Apply will always overwrite them, even if it's with an empty list.

Try running this script with k6:

import http from "k6/http";

export let options = {
    throw: true,
    blockHostnames: ["*.io"],
};

export default function () {
    http.get("https://test.k6.io/");
}

k6 will not block the domain and the request will happen! 😞 The proper fix for this would be to replace the envconfig library with something saner, but that's a huge refactoring task we've been postponing for a long time... 😭 For now, maybe we shouldn't use a pointer here, and instead have the pointer HostnameTrie struct? Or, maybe remove envconfig:"K6_BLOCK_HOSTNAMES" and add support for environment variables whenever we fix envconfig?

See kelseyhightower/envconfig#113 and #1560 and other envconfig issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

envconfig strikes again. I will experiment with best options here since having configuration through environment variables is desirable for this sort of thing. Worst comes to worst, I will remove the envconfig support.

// HostnameTrie is a tree-structured list of hostname matches with support
// for wildcards exclusively at the start of the pattern. Items may only
// be inserted and searched. Internationalized hostnames are valid.
type HostnameTrie struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that you don't have other properties in this struct, you might be able to have something like type HostnameTrie map[rune]HostnameTrie, which might resolve the envconfig issue I posted below. Though I'm not sure if this wouldn't cause other issues 🤷‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing something similar when I was working on 372f140 and ran into some issues. Will try again in the next commit 👍

@na-- na-- mentioned this pull request Aug 17, 2020
@codecov-commenter
Copy link

codecov-commenter commented Aug 18, 2020

Codecov Report

Merging #1532 into master will decrease coverage by 0.36%.
The diff coverage is 67.74%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1532      +/-   ##
==========================================
- Coverage   75.23%   74.86%   -0.37%     
==========================================
  Files         150      164      +14     
  Lines       10924    14027    +3103     
==========================================
+ Hits         8219    10502    +2283     
- Misses       2238     2993     +755     
- Partials      467      532      +65     
Impacted Files Coverage Δ
api/v1/client/client.go 0.00% <0.00%> (ø)
api/v1/client/metrics.go 0.00% <0.00%> (ø)
api/v1/client/status.go 0.00% <0.00%> (ø)
api/v1/group.go 90.90% <ø> (+19.48%) ⬆️
api/v1/metric.go 100.00% <ø> (ø)
cmd/archive.go 26.19% <0.00%> (ø)
cmd/cloud.go 8.10% <0.00%> (-1.42%) ⬇️
cmd/collectors.go 0.00% <0.00%> (ø)
cmd/common.go 50.00% <ø> (+10.00%) ⬆️
cmd/convert.go 46.15% <ø> (ø)
... and 190 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 479707f...372f140. Read the comment docs.

}

if _, wild := t.children['*']; wild {
return "*", true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow "*" as a valid pattern? Right now this blocks any requests from being made, which doesn't seem useful or necessary. It also blocks requests to IPs, which seems confusing:

WARN[0000] Request Failed                                error="Get \"http://127.0.0.1:8000/\": hostname (127.0.0.1) is in a blocked pattern (*)"

I guess it would be useful to block IP ranges, but that seems out of scope for this PR, and it steps on the current K6_BLACKLIST_IPS option, so that would need addressing.

So we should probably check whether host is an IP in dialer.go, and skip hostname-based blocking if it is. This will have a minor conflict with #1489, but it should be easily resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow "*" as a valid pattern? Right now this blocks any requests from being made, which doesn't seem useful or necessary.

I suppose it would be useless - I sought to keep the implementation as literally as possible to what was discussed in #972. I can change this, if deemed necessary.

It also blocks requests to IPs, which seems confusing ... I guess it would be useful to block IP ranges, but that seems out of scope for this PR, and it steps on the current K6_BLACKLIST_IPS option, so that would need addressing.

I agree. I see a few potential solutions to this problem:

  1. Distinguish the dialing process for hostnames and IPs
  2. Allow this option to subsume IP blocking
  3. Resolve all hostnames to their IPs when reading options, rather than blocking hostnames directly on-dial.

The latter seems more appealing in my opinion, but that would fail to address the naive blocking method currently employed. Another issue I see is that any of these solutions, the two options will end up stepping on top of one another, like you said. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 3 is not possible really, as hostnames can have multiple IPs and they can change over the course of the test. While currently, k6 caches resolved IPs indefinitely, this is going to change (actually pretty soon if everything works all right ;) ) .

I do think that 1 is the correct way, and I also think that maybe we can leave it for ... after you merge it and we merge #1489 . As far as I can see this will have a slight performance impact, but nothing big.

The other problem I propose is fixed before the actual trie implementation, but in the config checking - we just don't let the pattern to be .. just *. Maybe we even should require it to be at least 4 5 characters as I am pretty sure there are no longer any 1 letter domains ... actually they are apparently reserved

@na-- na-- modified the milestones: v0.28.0, v0.29.0 Sep 8, 2020
@mstoykov mstoykov mentioned this pull request Oct 9, 2020
@mstoykov mstoykov merged commit 887f583 into grafana:master Oct 16, 2020
@eugercek eugercek mentioned this pull request Dec 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add blacklisting based on hostname
6 participants