-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RPC command shard crawl (RIPD-1663) #2697
Conversation
Jenkins Build SummaryBuilt from this commit Built at 20181010 - 16:50:50 Test Results
|
if ((seq != 0) && (seq >= minLedger_) && (seq <= maxLedger_) && | ||
(sanity_.load() == Sanity::sane)) | ||
return true; | ||
if (std::find(recentLedgers_.begin(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider
return std::find(recentLedgers_begin.(), recentLedgers_.end(), hash) != recentLedgers.end());
instead of branching and then returning true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not found we want to continue and check the shard store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. I missed that. Sorry!
src/ripple/overlay/impl/PeerImp.cpp
Outdated
} | ||
return {}; | ||
std::lock_guard<std::mutex> l {shardInfoMutex_}; | ||
auto it {shardInfo_.find(publicKey_)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think this can be const
.
src/ripple/overlay/impl/PeerImp.h
Outdated
#include <cstdint> | ||
#include <deque> | ||
#include <queue> | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove the extra newline here.
f58054e
to
deb0384
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK. I left some comments, most of which are at your discretion; a couple will require changes.
message.set_nodepubkey(publicKey.data(), publicKey.size()); | ||
message.set_shardindexes(std::to_string(shardIndex)); | ||
app_.overlay().foreach(send_always( | ||
std::make_shared<Message>(message, protocol::mtSHARD_INFO))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't arbitrarily send this message out to other peers; if they don't support mtSHARD_INFO
it will result in the closing their connection to this server.
We need to know if a peer supports sharding before we send them shard-related messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nbougalis I can't find the code that disconnects upon receiving an unknown message and I've been unable to reproduce that behavior using the tip of develop. Peerimp::onMessageUnknown
is called by invokeProtocolMessage
when an unknown message is received but the function is just a stub with a TODO
. It seems someone intended on adding the behavior or I did I miss something else entirely? Thanks!
// Relay request to active peers | ||
protocol::TMGetShardInfo tmGS; | ||
tmGS.set_hops(hops); | ||
foreach(send_always(std::make_shared<Message>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern as above: we can't just send this to every peer we have unless we are prepared to have that connection close if they aren't running a version capable of understanding TMGetShardInfo
.
if (! version.empty ()) | ||
pv[jss::version] = version; | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the extra scope? If it's lifetime management, why not just change to:
if(auto version = sp->getVersion())
pv[jss::version] = std::move(version);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nbougalis a std::string
won't convert to a bool. I'm in favor of the extra scope because of the move. As a side note, this would be a good place for c++-17's init-statements for if when we move to 17.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, d'oh.
// Request shard info from peer | ||
protocol::TMGetShardInfo tmGS; | ||
tmGS.set_hops(0); | ||
send(std::make_shared<Message>(tmGS, protocol::mtGET_SHARD_INFO)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern here.
@@ -64,6 +64,8 @@ TrafficCount::category TrafficCount::categorize ( | |||
|
|||
if ((type == protocol::mtMANIFESTS) || | |||
(type == protocol::mtENDPOINTS) || | |||
(type == protocol::mtGET_SHARD_INFO) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider adding a special category for shard-related data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shard info gathered lives in the overlay and peers. At this time, I am not sure there is a real benefit to adding a new category, although I can see that changing as we add more shard centric functionality. I am on the fence. If you feel strongly about it, the swing vote would defintely persuade me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Fine as is right now.
src/ripple/proto/ripple.proto
Outdated
@@ -8,6 +8,8 @@ enum MessageType | |||
mtPING = 3; | |||
mtPROOFOFWORK = 4; | |||
mtCLUSTER = 5; | |||
mtGET_SHARD_INFO = 10; | |||
mtSHARD_INFO = 11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These messages were previously used for peerfinder messages; while those messages are obsolete (and I doubt that any server out there is still sending or receiving them) I think we should avoid reusing message identifiers.
std::uint32_t hops {0}; | ||
if (auto const& jv = context.params[jss::limit]) | ||
{ | ||
if (!(jv.isUInt() || (jv.isInt() && jv.asInt() >= 0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just hard to read as is. My first read-through, I thought this was a bug, and you meant to have < 0
instead of >= 0
. It's unfortunate that our JSON library forces us to write such ugly code, or that we don't have a good way to enforce a schema.
No change needed, but if you can do something better, that'd be awesome.
hops = std::min(jv.asUInt(), hopLimit); | ||
} | ||
|
||
bool const pubKey {context.params.isMember(jss::public_key) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a random thought; no change needed: I get that uniform initialization is all the rage these days, but honestly, it's such an awkward, ugly syntax. I guess it highlights the difference between an assignment and an instantiation, but ugh...
{ | ||
// Prevent crawl spamming | ||
clock_type::time_point const last(csLast_.load()); | ||
if (duration_cast<seconds>(clock_type::now() - last) > 60s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the duration_cast
needed here? I think that @HowardHinnant's standardese magic ensures that chrono
will do the right thing even in the absence of the cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compilers are happy without it and I am all for less.
src/ripple/rpc/impl/Handler.cpp
Outdated
@@ -104,6 +104,7 @@ Handler const handlerArray[] { | |||
{ "submit_multisigned", byRef (&doSubmitMultiSigned), Role::USER, NEEDS_CURRENT_LEDGER }, | |||
{ "server_info", byRef (&doServerInfo), Role::USER, NO_CONDITION }, | |||
{ "server_state", byRef (&doServerState), Role::USER, NO_CONDITION }, | |||
{ "crawl_shards", byRef (&doCrawlShards), Role::USER, NO_CONDITION }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this should be Role::USER
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should only admins have access?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, it seems requiring an admin role is a good idea. We can always reduce the requirement in a future release.
The nonunity build has a link error: |
|
||
// Update peers with new shard index | ||
protocol::TMShardInfo message; | ||
auto const& publicKey {app_.nodeIdentity().first}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like auto
here. Rational: both public key and secret key have a similar interface (replacing first
with second
here does not trigger a compiler error). Using auto
here loses the type check of mistaking a public key with a secret key. It's especially important when the keypair is stored in a tuple, and the key is retrieved with first
and second
where a misuse is harder to spot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
|
||
// Combine the shard info from peers and their sub peers | ||
hash_map<PublicKey, PeerImp::ShardInfo> peerShardInfo; | ||
for_each([&](std::shared_ptr<PeerImp>&& peer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make peer
an rvalue ref here? Since it's not moved from I'd just const&
(if it were moved from I'd still change it to a value type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, no need for the rvalue ref.
if (! version.empty ()) | ||
pv[jss::version] = version; | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nbougalis a std::string
won't convert to a bool. I'm in favor of the extra scope because of the move. As a side note, this would be a good place for c++-17's init-statements for if when we move to 17.
if (! shards.empty()) | ||
pv[jss::complete_shards] = shards; | ||
if (auto shardIndexes = sp->getShardIndexes()) | ||
pv[jss::complete_shards] = to_string(shardIndexes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't shardIndexes
a boost::optional
? How does that compile? I expected to see to_string(*shardIndexes)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be dereferenced, will change.
Its works now because to_string
dereferences the argument (operator *()
) when it streams it to an std::ostringstream.
@@ -123,6 +123,13 @@ class OverlayImpl : public Overlay | |||
std::atomic <uint64_t> peerDisconnects_ {0}; | |||
std::atomic <uint64_t> peerDisconnectsCharges_ {0}; | |||
|
|||
// Last time we crawled peers for shard info | |||
std::atomic<std::chrono::seconds> csLast_{std::chrono::seconds{0}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does cs
stand for? "Crawled shards"? I'm OK with the member name, but I'd like a quick comment to define what cs
is meant to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this is a duration and not a time_point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::atomic
doesn't play well time_point. I can't remember the reason why. It may have been that atomic requires trivially copyable types, and time_point isn't trivially copyable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paging @HowardHinnant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember this happening, and I don't recall the reason either. One can always load the duration into a time_point, do the computations, and then extract the duration from the time_point for storing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HowardHinnant Yeah, that is the solution I went with.
src/ripple/overlay/impl/PeerImp.cpp
Outdated
if (!shards.empty()) | ||
{ | ||
protocol::TMShardInfo reply; | ||
auto const& publicKey {app_.nodeIdentity().first}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as the other file for using auto
for keys. I'd rather explictly call out the type here.
src/ripple/overlay/impl/PeerImp.cpp
Outdated
ShardInfo shardInfo; | ||
shardInfo.endpoint = address; | ||
shardIndexes = &(shardInfo_.emplace(std::move(publicKey), | ||
std::move(shardInfo)).first->second.shardIndexes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
publicKey
is const, so this move won't do anything. In addition, the public key type doesn't benefit from moves anyway (it always needs to copy the buffer).
I'd also break out the result of emplace
into its own variable. There's a lot going on in this line - esp with the first->second
that it's easy to lose track of where I am.
@@ -180,7 +197,6 @@ message TMStatusChange | |||
optional uint64 networkTime = 6; | |||
optional uint32 firstSeq = 7; | |||
optional uint32 lastSeq = 8; | |||
optional string shardSeqs = 9; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know protobufs that well. Is it OK to remove a field like this? What happens if a peer sends a message with that field set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is OK to remove this field. Non-required fields can be removed, as long as the field number is not used again in the updated message type. (https://developers.google.com/protocol-buffers/docs/proto) Being able to update old message types is one of the neat features about protocol buffers.
|
||
namespace ripple { | ||
|
||
static std::uint32_t constexpr hopLimit = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the rippled code I see uses an "west" constexpr
. I don't think we have a style rule for it though.
jvResult[jss::public_key] = toBase58( | ||
TokenType::NodePublic, context.app.nodeIdentity().first); | ||
jvResult[jss::complete_shards] = std::move( | ||
shardStore->getCompleteShards()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a std::move
here. The return value is already an rvalue ref
Since I didn't quickly see it when skimming through - this does make sure not to leak pubkeys or peer IDs of private peers, right? What does the output of this command look like and why is it necessary (especially for users) to run it? |
src/ripple/net/impl/RPCCall.cpp
Outdated
@@ -1146,6 +1146,7 @@ class RPCParser | |||
{ "submit_multisigned", &RPCParser::parseSubmitMultiSigned, 1, 1 }, | |||
{ "server_info", &RPCParser::parseServerInfo, 0, 1 }, | |||
{ "server_state", &RPCParser::parseServerInfo, 0, 1 }, | |||
{ "shards", &RPCParser::parseAsIs, 0, 0 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this CLI command and it said command not supported...shouldn't it be crawl_shards
to match the RPC handler name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be crawl_shards
thanks!
@@ -127,6 +127,23 @@ message TMCluster | |||
repeated TMLoadSource loadSources = 2; | |||
} | |||
|
|||
// Request info on shards held | |||
message TMGetShardInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
high level question - why not implement this using strictly http methods? If each server provided its shard info (either in a new API or in server_info, for example) -- that combined with peers
should be enough to gather this same data, right?
Adding peer messages for something that is (1) strictly informational and (2) initiated by user request - seems less than ideal. If nodes needed to know the state of shards on its peers for operational/internal reasons, then it makes sense as part of the peer exchange, but seems like already had some of that info in the status message. Was that peer status message shard info not needed or not working out for some other reason?
If we did provide the basic (per server) info via one or more RPC methods, then the crawler could be written in any language with a decent http/json client, and we could of course add a crawler method to our own RPC impl. if it's broadly useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This data could also be added to the /crawl
endpoint...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old HTTPS crawl command provides shard info. Using that, a crawler can be written in any language and provide similar data.
Nodes need to know about the shards their peers hold. That information is used in the ledger acquire process for things like history backfilling. Knowing what the surrounding network has will also be beneficial when we enable IPFS, direct peer shard transfers or improving the index selection algorithm. We currently share some info via the status message, but it is not ideal. That design is limited and unable to discover shards beyond immediate peers. So the status field was migrated to a new message that overcomes the deficiency. But the actual shard index information is still packaged and used in the same manner. It is worth noting that peers automatically exchange shard info with each other upon connecting. They also update their peers with changes when necessary. Those exchanges are not initiated by user requests. The RPC may cause shard info caches to refresh but the info retrieved is utilized by the node. In actuality, this PR may be more about improving the method in which peers share their shard info on the network. The command allows sharing that info with users but at this time I can't say how useful that will be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To enable sharing shards via IPFS in a meaningful way, you'll need a stable representation of shard data first (see #2688), or you'll need to invest quite a bit of work into writing an IPFS native abstraction of a SHAmap (the relevant technique is called IPLD, there's examples for git or Ethereum available). If you go down that route, then this might actually obsolete the work on shards anyways, since all of history could be stored, accessed and shared that way.
I'd personally rather see a way to get rippled to sync a chain of ledger headers down to 32570 first (to be able to verify shard contents) before enabling them to be shared with peers or even peers of peers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarkusTeufelberger The /crawl
endpoint has the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying.
@MarkusTeufelberger the existing
|
deb0384
to
05275ee
Compare
In 1.2.0-b3 |
Adds an RPC that crawls the network to find node stored shards. The command accepts two optional fields,
pubkey
andlimit
.pubkey
defaults to false and if set, will include the public keys of each node crawled.limit
defaults to 0 and determines the maximum number of hops the crawler will attempt. Since the number of hops can grow the result exponentially, internallylimit
is capped at 3. If the average node has 10 peers, alimit
of three will incur visiting up to 10k nodes.The RPC takes the following form
For testing purposes, the RPC can be issued using a browser and this site http://www.websocket.org/echo.html
{"command": "crawl_shards" }
or from the command line using curl eg.
curl -X POST -d '{ "method" : "crawl_shards", "params" : [ { "limit" : 2 } ] }' http://localhost:5005
A request message originating from rippled will broadcast to its peers and propagate. Each time the message is relayed from one peer to another (a hop), the relaying peer adds the ID of the requesting peer ID to a chain. This chain is used to route the replies of each peer back to the original requestor. The final link in the chain is referred to as the last link. As an example, a two-hop crawl in the diagram could result in the following chain [id 11]-[id 5]
Sample output: