Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shard download and import RPC #2561

Closed
wants to merge 6 commits into from

Conversation

miguelportilla
Copy link
Contributor

@miguelportilla miguelportilla commented May 29, 2018

Adds an RPC that allows admins to download and import shard archives from HTTPS web servers. A shard archive is a tar of a complete shard directory compressed with lz4. The libarchive library was added to rippled for its multi-format archive and compression functionality. The pull request consists of four commits. The first adds a free function that uses libarchive to extract the archives (tar.lz4). The second commit adds an HTTPS client designed to download files using the boost beast library and asynchronous coroutines. The third commit adds hot shard import capability to the shard store. The fourth adds the actual RPC command allowing rippled to download and import archives.

Admin privilege is necessary for the RPC.
The RPC takes the following form

    {
      shards: [{index: <integer>, url: <string>}]
      validate: <bool> // optional, default is true
    }

For testing purposes I have created the following archives in NuDB and RocksDB formats.

NuDB format

{
  "command": "download_shard",
  "shards": [
    {"index": 1, "url": "https://dl.dropboxusercontent.com/s/bwfu5q3l1301hmk/1.tar.lz4"},
    {"index": 2, "url": "https://dl.dropboxusercontent.com/s/l4rgpqhod25twld/2.tar.lz4"},
    {"index": 5, "url": "https://dl.dropboxusercontent.com/s/nvi91jvpjh5ox32/5.tar.lz4"},
    {"index": 8, "url": "https://dl.dropboxusercontent.com/s/iogb048ggbzhzyn/8.tar.lz4"},
    {"index": 9, "url": "https://dl.dropboxusercontent.com/s/2vuznabw1pp9w8l/9.tar.lz4"}
  ]
}

RocksDB format

{
  "command": "download_shard",
  "shards": [
  	{"index": 1, "url": "https://dl.dropboxusercontent.com/s/ru8fp7qghgh6pd8/1.tar.lz4"},
  	{"index": 2, "url": "https://dl.dropboxusercontent.com/s/gjxcxzpwqtl8lz4/2.tar.lz4"},
  	{"index": 5, "url": "https://dl.dropboxusercontent.com/s/cm6grhsh8xlv27m/5.tar.lz4"},
  	{"index": 8, "url": "https://dl.dropboxusercontent.com/s/ah8h14zm8h1zqip/8.tar.lz4"},
  	{"index": 9, "url": "https://dl.dropboxusercontent.com/s/1uad5aiq81yfqom/9.tar.lz4"}
  ]
}

For testing purposes, the RPC can be issued using a browser and this site http://www.websocket.org/echo.html or from the command line using curl eg.

curl -X POST -d '{ "method" : "download_shard", "params" : [ { "shards" : [{"index": 1, "url": "https://dl.dropboxusercontent.com/s/bwfu5q3l1301hmk/1.tar.lz4"}] } ] }' http://localhost:5005

@scottschurr
Copy link
Collaborator

@miguelportilla, I'm honored by the review request. You should be aware that I'll be leaving for a C++ Standards Committee meeting on Friday and I won't be done with the meeting work until June 11th. So I'm unlikely to get started on the review work until June 11th. I'm fine with that, but if you want some review attention sooner than that you may want to select a different reviewer. Sorry for the inconvenience.

@ripplelabs-jenkins
Copy link
Collaborator

ripplelabs-jenkins commented May 29, 2018

Jenkins Build Summary

Built from this commit

Built at 20180807 - 23:52:53

Test Results

Build Type Log Result Status
rpm logfile 1034 cases, 0 failed, t: n/a PASS ✅
gcc.Debug
-Dcoverage=ON
logfile 1034 cases, 0 failed, t: 857s PASS ✅
docs logfile 1 cases, 0 failed, t: 2s PASS ✅
gcc.Release
-Dassert=ON,
MANUAL_TESTS=true
logfile 1101 cases, 0 failed, t: 931s PASS ✅
clang.Debug
-Dunity=OFF
logfile no test results, t: 171s FAIL 🔴
clang.Debug logfile 1034 cases, 0 failed, t: 281s PASS ✅
gcc.Debug
-Dunity=OFF
logfile no test results, t: 50s FAIL 🔴
gcc.Debug logfile 1034 cases, 0 failed, t: 292s PASS ✅
clang.Release
-Dassert=ON
logfile 1034 cases, 0 failed, t: 404s PASS ✅
gcc.Release
-Dassert=ON
logfile 1034 cases, 0 failed, t: 427s PASS ✅
gcc.Debug
-Dstatic=OFF
logfile 1034 cases, 0 failed, t: 294s PASS ✅
gcc.Debug
-Dstatic=OFF -DBUILD_SHARED_LIBS=ON
logfile 1034 cases, 0 failed, t: 295s PASS ✅
gcc.Debug,
NINJA_BUILD=true
logfile 1034 cases, 0 failed, t: 275s PASS ✅
msvc.Debug logfile 1031 cases, 0 failed, t: 541s PASS ✅
clang.Debug
-Dunity=OFF -Dsan=undefined,
PARALLEL_TESTS=false
logfile no test results, t: 146s FAIL 🔴
clang.Debug
-Dunity=OFF -Dsan=address,
PARALLEL_TESTS=false,
DEBUGGER=false
logfile no test results, t: 240s FAIL 🔴
msvc.Debug,
NINJA_BUILD=true
logfile 1031 cases, 0 failed, t: 627s PASS ✅
msvc.Debug
-Dunity=OFF
console no test results, t: n/a FAIL 🔴
msvc.Release logfile 1031 cases, 0 failed, t: 391s PASS ✅

@miguelportilla
Copy link
Contributor Author

@scottschurr Thanks for that, I'll see if I can get someone else to start before that.

@miguelportilla miguelportilla requested review from ximinez and removed request for scottschurr May 29, 2018 18:38
@MarkusTeufelberger
Copy link
Collaborator

Are nodes that are being imported checked/validated for being actually part of the current chain?

@miguelportilla
Copy link
Contributor Author

@MarkusTeufelberger Yes. By default, each imported shard is checked and validated as being part of the current chain. Because validating all nodes of the SHAmaps is expensive, the validation can be skipped (validate: <bool> // optional, default is true) if the source is trusted.

@MarkusTeufelberger
Copy link
Collaborator

Is there some deterministic way to generate a shard database? Both NuDB and RocksDB don't seem very deterministic to me regarding the storage format... it would be nice to have a way to generate/verify static historic files that contain shard contents. Probably out of scope for this PR though.

Maybe a bit more in scope - would HTTP transport be possible too? I could imagine hosting some extensive shard databases on IPFS (https://ipfs.io) and while it probably would be possible to also host a local mirror with https, it would likely be easier and secure enough to do so with http.

@miguelportilla
Copy link
Contributor Author

@MarkusTeufelberger Thanks for your feedback. HTTP transport can be supported. I have been investigating IPFS. Unfortunately, there isn't a c++ protocol implementation. There is, however, a c implementation that may be useful.

@MarkusTeufelberger
Copy link
Collaborator

It would be enough for now to have HTTP, since it is relatively easy to run a local instance that makes files available via HTTP (similar to maybe a FUSE filesystem)

boost::filesystem::path const& dst)
{
auto ar {archive_read_new()};
archive_read_support_format_tar(ar);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typical of c-APIs, all of the methods return status codes. Although I'm not sure what failure mode could cause these unchecked methods to fail, I wonder if we should at least wrap them in asserts to catch the totally unexpected? I just hate for any rc to be ignored if it's being returned...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will address.

archive_write_close(aw);
archive_write_free(aw);

if (!err.empty())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's an option to use unique_ptr to close/free these resources and it cleans-up the code here a little bit : mellery451@07415db if you want
to consider that approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good approach, will use.

complete();
}

}// ripple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this download class, but I'm wondering how it differs or overlaps with the functionality in HTTPClient.h/cpp - would there be an opportunity to merge the two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main difference is that this class specifically downloads files where HTTPClient can't. It also uses the latest boost beast whereas HTTPClient does not use beast. In a future PR, HTTPClient could be replaced with this class.

auto const shardIndex {seqToShardIndex(seq)};
std::unique_lock<std::mutex> l(m_);
assert(init_);
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think this scope block is especially helpful now that the lock is moved out, but it's your call...

Copy link
Contributor Author

@miguelportilla miguelportilla Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its to avoid renaming the pre shard iterator. I am not sure if renaming the pre iterator and breaking the name convention is an improvement. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that makes sense.

auto fail = [&](std::string msg)
{
if (!msg.empty())
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you have single-line conditionals with and without braces here...I'd prefer it consistent either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single line conditionals using the JLOG macro make gcc unhappy. I'd prefer single line but the price is silencing the compiler on the warning.

return RPC::missing_field_error(jss::url);
parsedURL url;
if (!parseUrl(url, it[jss::url].asString()) ||
url.scheme != "https" ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for scheme in particular, it might be nice to return "only https supported" because I don't know if that would be obvious to me as a caller otherwise.

return RPC::make_param_error("invalid shard ids");

// Remove unsuitable shards
for (auto it = dCtx->shards.begin(); it != dCtx->shards.end();)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What might cause a shard to be "unsuitable" - would it be if it's already in the shardstore and/or is in progress via peernet fetch? Just wondering if there is any reason to report this info back to the user...like "we already have shards x,y" or "we're already working on z".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those cases make shard ids unsuitable. Keep in mind this is an admin command and the ids that will be downloaded are reported. I can add the omitted ids to the report but not sure if there is sufficient reason to. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at a minimum, I guess we should make a note to document the behavior - that is, if you ask for x,y,z only only get z back, then x and y were skipped for and see the logs for more info (?). I think the documentation approach is reasonable for this fairly specialized command.

dCtx->validate = context.params[jss::validate].asBool();
}
if (dCtx->validate &&
context.app.getOPs().getOperatingMode() != NetworkOPs::omFULL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since validation happens sometime later (after download, unpack, and move), is the omFULL condition implicitly/explicitly checked again when the shard validation logic is called? To put it another way, what if the server falls out of sync between this point and when the shard is actually validated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state is checked again when the validation occurs. If we've fallen out of sync at the time of validation, the import is skipped for the id. You bring up a good point. It's probably a good idea to defer each import/validation attempt if not synced. That would also remove the need to be synced when the RPC is received.

CMakeLists.txt Outdated
@@ -156,6 +156,115 @@ use_protobuf()

setup_build_boilerplate()

include(ExternalProject)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this EP doesn't work if you happen to set DESTDIR when to run make, so I've made a few fixes in mellery451@f861f7c. That commit also stores the EP in a dir separate from the binary dir, so it's slightly more likely to be reused and not rebuilt each time. Please let me know if you have any issues with it.

Copy link
Contributor

@mellery451 mellery451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few questions mostly - 👍 for me

@miguelportilla
Copy link
Contributor Author

@mellery451 I pushed several commits to address your feedback. One primary change is that the free handler function that was in the downloadshard RPC was moved into its own class. It is now not necessary to be synced when the RPC is received and shard validation will be deferred if not synced. The RPC error reporting was improved and the archive function checks return codes. There is one exception to the return code checking and that is in the free call. I suppose if we passed in a log ref we would be able to log it but not sure if its worth it.

Copy link
Contributor

@mellery451 mellery451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes look good.

auto it {archives_.find(shardIndex)};
if (it != archives_.end())
{
return url.scheme == it->second.scheme &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might make sense to add operator == for parsedURL, but not a big deal since this is the only comparison in our code at this point.

ShardArchiveHandler::add(std::uint32_t shardIndex, parsedURL&& url)
{
assert(downloader_);
auto it {archives_.find(shardIndex)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super micro nit, but this can be const.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JoeLoser.

CMakeLists.txt Outdated
include(ExternalProject)
set(RIPPLED_NIH_LIBRARIES)
######################################################################
### NIH dep: lz4 ###
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does NIH stand for? "Not Invented Here"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - it's shorter than "third-party" or "external-dependency".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works! I can safely swap out "National Institutes of Health" which made much less sense. :)

@miguelportilla miguelportilla force-pushed the shard_transfer branch 3 times, most recently from 2f3202c to edb5c61 Compare July 3, 2018 16:15
@codecov-io
Copy link

codecov-io commented Jul 3, 2018

Codecov Report

Merging #2561 into develop will decrease coverage by 0.67%.
The diff coverage is 2.31%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2561      +/-   ##
===========================================
- Coverage    70.83%   70.15%   -0.68%     
===========================================
  Files          698      703       +5     
  Lines        54367    54883     +516     
===========================================
- Hits         38511    38505       -6     
- Misses       15856    16378     +522
Impacted Files Coverage Δ
src/ripple/nodestore/Backend.h 100% <ø> (ø) ⬆️
src/ripple/nodestore/DatabaseShard.h 0% <ø> (ø) ⬆️
src/ripple/net/impl/RPCCall.cpp 59.9% <0%> (-1.7%) ⬇️
src/ripple/basics/StringUtilities.h 79.31% <0%> (-16.53%) ⬇️
src/ripple/rpc/handlers/DownloadShard.cpp 0% <0%> (ø)
src/ripple/nodestore/backend/NullFactory.cpp 26.82% <0%> (ø) ⬆️
src/ripple/app/main/Main.cpp 45.32% <0%> (ø) ⬆️
src/ripple/app/main/Application.cpp 58.55% <0%> (ø) ⬆️
src/ripple/nodestore/impl/DatabaseShardImp.h 0% <0%> (ø) ⬆️
src/ripple/nodestore/impl/Shard.cpp 0.36% <0%> (-0.03%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 381a1b9...8ce82db. Read the comment docs.

jvParams[0u].size() == 0)
{
return rpcError(rpcINVALID_PARAMS);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried invoking download_shard on the command line, and I think this check is wrong.

First of all, the commands[] array on 1099 is already checking that there is at least one argument, so if I provide 0, it doesn't even get this far. Second, if there are arguments, I don't think jvParams[jss::shards] will ever be set.

Examples:

eah@badger:~/dev/rippled-merge (miguelportilla--shard_transfer=)$ rippled download_shard
Loading: "/home/eah/.config/ripple/rippled.cfg"
2018-Jul-09 23:14:28.490922982 HTTPClient:NFO Connecting to 127.0.0.1:5015

{
   "result" : {
      "error" : "invalidParams",
      "error_code" : 31,
      "error_message" : "Missing field 'shards'.",
      "request" : {
         "command" : "download_shard"
      },
      "status" : "error"
   }
}
eah@badger:~/dev/rippled-merge (miguelportilla--shard_transfer=)$ rippled download_shard 1
Loading: "/home/eah/.config/ripple/rippled.cfg"
2018-Jul-09 23:14:37.181363866 HTTPClient:NFO Connecting to 127.0.0.1:5015

{
   "result" : {
      "error" : "invalidParams",
      "error_code" : 31,
      "error_message" : "Missing field 'shards'.",
      "request" : {
         "command" : "download_shard",
         "params" : [ "1" ]
      },
      "status" : "error"
   }
}
eah@badger:~/dev/rippled-merge (miguelportilla--shard_transfer=)$ rippled download_shard 1 https://dl.dropboxusercontent.com/s/bwfu5q3l1301hmk/1.tar.lz4
Loading: "/home/eah/.config/ripple/rippled.cfg"
2018-Jul-09 23:15:19.958912395 HTTPClient:NFO Connecting to 127.0.0.1:5015

{
   "result" : {
      "error" : "invalidParams",
      "error_code" : 31,
      "error_message" : "Missing field 'shards'.",
      "request" : {
         "command" : "download_shard",
         "params" : [ "1", "https://dl.dropboxusercontent.com/s/bwfu5q3l1301hmk/1.tar.lz4" ]
      },
      "status" : "error"
   }
}
eah@badger:~/dev/rippled-merge (miguelportilla--shard_transfer=)$ rippled download_shard 1 https://dl.dropboxusercontent.com/s/bwfu5q3l1301hmk/1.tar.lz4 validate
Loading: "/home/eah/.config/ripple/rippled.cfg"
2018-Jul-09 23:15:24.651213680 HTTPClient:NFO Connecting to 127.0.0.1:5015

{
   "result" : {
      "error" : "invalidParams",
      "error_code" : 31,
      "error_message" : "Missing field 'shards'.",
      "request" : {
         "command" : "download_shard",
         "params" : [
            "1",
            "https://dl.dropboxusercontent.com/s/bwfu5q3l1301hmk/1.tar.lz4",
            "validate"
         ]
      },
      "status" : "error"
   }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ximinez thanks for testing and finding that, fixed in the latest commit.

Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done yet, but here are some more comments so you don't have to sit around waiting while I go through the whole thing.

Also, I noticed something weird when looking at my rippled shards directory. Somewhere in the process, file creation is not respecting my umask. I haven't spotted where it's happening yet, but I thought I'd alert you to it. 1, 2, and 5 were downloaded using download_shards. The others were through the existing process.

$ ls -l ~/rippled/db/shards/
total 52
drwxrwxrwx 2 eah eah 4096 Jul 11 20:01 1
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 146
drwxrwxrwx 2 eah eah 4096 Jul 11 20:01 2
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 2002
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 232
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 402
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 487
drwxrwxrwx 2 eah eah 4096 Jul 11 20:01 5
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 538
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 587
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 661
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 791
drwxr-xr-x 2 eah eah 4096 Jul 11 20:01 834

$ umask
0022


// If odd number of params then 'validate' may have been specified
if (sz & 1)
jvResult[jss::validate] = jvParams[--sz].asBool();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to do any input validation. It's just checking that a value was passed. I would suggest that you explicitly check that the value is validate because if it's not, the user could have made an error (eg. provided another URL without an index), or passed in garbage. Either way, this may not be an optimal outcome. If it's not validate, that might justify an rpcINVALID_PARAMS.

}
// Check if greater or equal to the current shard
auto seqCheck = [&](std::uint32_t seq){
if (seq > earliestSeq() && shardIndex >= seqToShardIndex(seq))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something about this conditional smells to me. However unlikely it is, if the sequence is < earliestSeq, then this lambda is going to indicated things are fine, and it's only checked here. I think that would allow the shard to be prepped even if rippled is not synced. Is that what you want? (And if it is, could you add a comment explaining the reasoning?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we try to validate the shard index using the current ledger but only if available (synced). I'll document.

{
assert(!backend_);
using namespace boost::filesystem;
dir_ = dir / std::to_string(index_);
auto const newShard {!is_directory(dir_) || is_empty(dir_)};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if dir_ exists as a plain file? I believe that newShard would be true, so backend_->open() will try to create folders, which will fail, so open will throw, that will be caught and fail will be called. Since newShard is true, fail will delete the file. Now chances are the file doesn't belong there, and there's an argument it should be deleted, but my concern is that rippled will delete an arbitrary file that something else put there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The directories are created exclusively for the internal use of shards by rippled and should not be manually modified. In the scenario you described, a file or sym link created by an external event in the shard root directory bearing an index name would be deleted. Although in practice that should not happen, I agree with your concern. Thanks, I will address the issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that nobody should be in there, but people do weird things...

Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting there... One more suggestion from today.

std::string(jss::validate), "a bool");
}
validate = context.params[jss::validate].asBool();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since validate defaults to true, there's no point in including it as an option for the command line, unless you change parseDownloadShard to default jvResult[jss::validate] to false if it's not specified. I do not endorse this.

In fact, what I'd suggest, is to change the command-line optional parameter to novalidate, and reverse the logic in parseDownloadShard. The justification is that if we want to validate by default, then that should be the default on the command line, too, and it should require extra action to disable validation. See also, the comment I left in parseDownloadShard about validating the provided value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to respond to your earlier comment and mention validate defaults to true. I like the novalidate suggestion, will do.

@miguelportilla miguelportilla force-pushed the shard_transfer branch 3 times, most recently from e415893 to c5a5ae3 Compare July 17, 2018 00:12
@miguelportilla
Copy link
Contributor Author

@ximinez feedback addressed. Thanks for the deep and thorough look. The directory permissions issue is tricky. Apparently, boost has a known issue creating directories recursively with create_directories and doesn't set permissions at creation. I changed the code to use create_directory hoping that it may help, please let me know if it does. Otherwise, I may have to add code to set permissions after creation, which is commonly frowned upon.

Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty impressed by these new features. I left a couple more comments, but I'm approving the PR as a whole.

std::to_string(url.port.get_value_or(443)),
url.path,
11,
dstDir / url.path.substr(url.path.find_last_of("/\\") + 1),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably pretty unlikely, but this computed directory name could cause some shenanigans if the url is poorly formed and not validated before being added. I'm thinking specifically of a case where new code calls add without the checks that are currently in place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, fixed! Thanks.

++i;
else if (!iequals(jvParams[--sz].asString(), "novalidate"))
return rpcError(rpcINVALID_PARAMS);
jvResult[jss::validate] = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's pretty clever. Allow just a little bit of room for the user not to do it perfectly. I like it.

@miguelportilla miguelportilla force-pushed the shard_transfer branch 2 times, most recently from df3ca8f to 35ff3af Compare July 18, 2018 00:39
@miguelportilla miguelportilla added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jul 18, 2018
Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good. Still 👍

@miguelportilla miguelportilla force-pushed the shard_transfer branch 2 times, most recently from 4b2c28d to 384c6b8 Compare August 7, 2018 21:11
@seelabs
Copy link
Collaborator

seelabs commented Aug 9, 2018

In 1.1.0-rc1

@seelabs seelabs closed this Aug 9, 2018
@miguelportilla miguelportilla deleted the shard_transfer branch September 6, 2018 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants