-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve log scrubbing: #2358
Improve log scrubbing: #2358
Conversation
Jenkins Build SummaryBuilt from this commit Built at 20180130 - 02:13:15 Test Results
|
src/ripple/basics/impl/Log.cpp
Outdated
|
||
s.replace (first, last - first, last - first, '*'); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the find('\"',...)
there is no need to limit the starting search position to the size of the string. The find
implementation does this check already.
I recommend searching for the second "
instead of the ,
after the first "
because the sensitive value may be the last key/value pair in that level of the jason.
Here is a minor tweak which takes these suggestions into account:
auto scrubber = [&s](char const* token)
{
auto first = s.find(token);
// If we have found the specified token, then attempt to isolate the
// sensitive data (it's enclosed by double quotes)
// and mask it:
if (first != std::string::npos)
{
first = s.find ('\"', first + std::strlen(token));
if (first != std::string::npos)
{
auto last = s.find('\"', first+1);
if (last == std::string::npos)
last = s.size();
else
++last;
s.replace (first, last - first, last - first, '*');
}
}
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlikely to happen but worth mentioning that subsequent occurrences of the token do not get masked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is called for every write to the log, and as written it does seven passes of the string, how about using a regex to do one pass. Something like:
std::string
scrub (std::string s)
{
static std::regex const r{
R"regex("(seed|seed_hex|secret|master_key|master_seed|master_seed_hex|passphrase)")regex",
std::regex_constants::optimize};
std::smatch m;
std::smatch::difference_type p = 0;
while (1)
{
if (!std::regex_search(s.cbegin() + p, s.cend(), m, r))
return s;
p = m.position() + p;
p = s.find('\"', p + m.length());
if (p == std::string::npos)
return s;
auto last = s.find(',', p);
if (last == std::string::npos)
last = s.size();
s.replace (p, last - p, last - p, '*');
p = last;
}
return s;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data point:
I took my implementation and tested it against the input string in the bug report that is prefixed by "2018-Jan-25 15:02:38 Server:DBG Reply:". Then I took Scott's implementation, changed it to search for the second "
instead of ,
, and tested it against the same string. Compiled with clang/libc++ at -O3
. Mine took about 50us and Scott's about 230us. Other regex implementations might be much faster. I wrote the libc++ regex, and I know it sucks. :-\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HowardHinnant Does that include the time to compile the regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HowardHinnant Ah I see. I incorrectly assumed we only wanted to mask within the quotes to keep the json valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I think "***"
would look better, but I'm not going to loose any sleep either way. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It would be, theoretically, marginally more secure (you don't leak the length). However, that would necessitate a lot more copying. I'm happy to tweak that if others think it's important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think masking only within the quotes is a good idea. The tweak I previously mentioned to @HowardHinnant may be all we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote another version...
auto scrubber = [&s](char const* prefix,
std::vector<char const*> const& endings)
{
auto first = s.find(prefix);
if (first == std::string::npos)
return;
size_t const offset = first + std::strlen(prefix);
for (auto const& e : endings)
{
auto const len = std::strlen(e);
if ((s.compare(offset, len, e)) == 0)
{
first = s.find('\"', offset + len);
if (first != std::string::npos)
{
auto last = s.find('\"', first + 1);
if (last == std::string::npos)
last = s.size();
else
++last;
s.replace(first, last - first, last - first, '*');
}
return;
}
}
};
called as such...
static std::vector<std::pair<char const*, std::vector<char const*>>> const tokens
{
{{"\"seed"}, {"\"", "ed_hex\""}},
{{"\"secret"}, {"\""}},
{{"\"master_"}, {"key\"", "seed\"", "seed_hex\""}},
{{"\"passphrase"}, {"\""}}
};
for (auto const& t : tokens)
scrubber(t.first, t.second);
This version is about 80% faster than @HowardHinnant's version but probably not worth having the extra complexity.
src/ripple/basics/impl/Log.cpp
Outdated
scrubber ("\"master_seed\""); | ||
scrubber ("\"master_seed_hex\""); | ||
scrubber ("\"passphrase\""); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a minor exercise to increase our confidence this is a complete list, how about commenting the list of JSS
in src/ripple/protocol/JsonFields.h with an attribute for each field that contains sensitive information. For example:
JSS ( passphrase ); // in [sensitive]: WalletPropose
As an example, the act of writing this comment caused me to wonder if the next JSS
down from passphrase
is also "sensitive" (password
).
Haha #bikeshedding
|
We have this thing about performance... |
One issue is we create an unnecessary copy of the message string when there are no secrets to be scrubbed (and the vast majority will have no secrets). How about an interface that "appendsScrubbed" instead of "return scrubbed". This makes it easier to avoid the copy. Something like (note untested code, just look at the interface and how the copy is avoided). Another alternative would be remove the |
I've addressed all review comments and restructured the code to reduce the amount of copying by inlining the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me and tests out fine here.
The code is passable but I will mention a few weaknesses.
|
Per issue #2354, when the log level of a server was configured at "trace", sensitive keying meterial generated by the `wallet_propose` command could be written to the server's log file, if one was configured. This commit improves the log scrubbing code to account for the sensitive information generated by a `wallet_propose`. ** Important security consideration ** We still caution everyone *against* executing this command on a server that they do not control: a malicious server operator could intercept the generated keypair, or operate a modified server that returns keypairs that are not securely generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with the current state of this change. That said, I'm slightly apprehensive about the string matching for every log line, when in fact, we only care about certain messages that actually have json in them. To that end, I threw together a possibly alternative approach by modifying StyledStreamWriter: mellery451@592749a. This is incomplete and untested, but you get the idea. The main downside here is that you have to know to mod teh stream if you are streaming json to the log, but that should be a relatively small number of locations (?). In any event, I'm fine with the current approach and only threw this out to see if it's worth considering.
@miguelportilla: Thanks. I think that this isn't really meant to be a comprehensive solution. We can't possibly redact everything and if we could today, we couldn't guarantee that wouldn't change tomorrow. To address your points more specifically:
|
Codecov Report
@@ Coverage Diff @@
## develop #2358 +/- ##
===========================================
+ Coverage 70.04% 70.05% +0.01%
===========================================
Files 704 703 -1
Lines 53342 53036 -306
===========================================
- Hits 37361 37155 -206
+ Misses 15981 15881 -100
Continue to review full report at Codecov.
|
In 0.90.0-b5 |
We may want to use the |
Per issue #2354, when the log level of a server was configured at "trace", sensitive keying meterial generated by the
wallet_propose
command could be written to the server's log file, if one was configured.This commit improves the log scrubbing code to account for the sensitive information generated by a
wallet_propose
.Important security consideration
We still caution everyone against executing this command on a server that they do not control: a malicious server operator could intercept the generated keypair, or operate a modified server that returns keypairs that are not securely generated.