[GIE/Runtime] Redesign `PartitionerInfo`, `ClusterInfo`, and `Router` trait to better support parallel processing in Runtime #2744
WholePartitions(Vec<u64>),
// PartialPartitions indicates **partial partitions** to query, specified as `(i, n, partition_id)`,
// means that to query the first `i`-th part out of `n` parts, of the partition with the given `partition_id`.
PartialPartition(u32, u32, u64),
I suggest defining these three integer items in a struct. Otherwise it is too unclear.
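A minimal sketch of what that suggestion could look like. The struct, its field names, the enum name `QueryPartitions`, and the helper `describe` are all hypothetical; the PR only shows the tuple form `PartialPartition(u32, u32, u64)`.

```rust
/// Hypothetical named form of the `(i, n, partition_id)` tuple.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartialPartition {
    /// Which part of the partition to query (`i` in the original comment).
    pub part_index: u32,
    /// How many parts the partition is split into (`n`).
    pub part_count: u32,
    /// Id of the partition being split.
    pub partition_id: u64,
}

/// Hypothetical enum name; the enclosing enum is not shown in the diff.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum QueryPartitions {
    WholePartitions(Vec<u64>),
    PartialPartition(PartialPartition),
}

/// Illustrative helper: with named fields, call sites read without
/// having to remember the tuple order.
pub fn describe(q: &QueryPartitions) -> String {
    match q {
        QueryPartitions::WholePartitions(ids) => format!("whole partitions {:?}", ids),
        QueryPartitions::PartialPartition(p) => format!(
            "part {} of {} in partition {}",
            p.part_index, p.part_count, p.partition_id
        ),
    }
}
```

With named fields, a mix-up between the part index and the part count is visible at the construction site, which a bare `(u32, u32, u64)` tuple does not offer.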
Codecov Report
@@           Coverage Diff           @@
##             main    #2744   +/-   ##
=======================================
  Coverage   42.44%   42.44%
=======================================
  Files          99       99
  Lines       10654    10654
=======================================
  Hits         4522     4522
  Misses       6132     6132
pub trait Router: Send + Sync + 'static {
    /// Given the element id and job_workers (number of workers per server),
    /// return the worker id that is going to do the query.
    fn route(&self, id: &ID, job_workers: usize) -> GraphProxyResult<u64>;
I have the following suggestions for this API:
- Design a structure, such as `ClusterInfo`, that describes how the whole cluster is managed, including how many servers are in the cluster and how many workers run on each server. We actually need this: currently we simply call a Pegasus static function to obtain this information, which couples the implementation to this API. The `ClusterInfo` can be initialized when the server starts and then passed into this function by reference.
- Make the `id` argument an abstraction, `RouteData`, where `RouteData` implements the function `get_route_id()`. Then we can probably implement `RouteData` directly for `Vertex` and `Edge` (or even `ID`). `ID` is a little ambiguous, given that we actually want to route any data, not just an `ID`.
- Introduce a `WorkerId` type in place of the bare `u64`.
- I think the `Router` should also be aware of the graph partition information, no?
- Comment this structure as:
  A `Router` is used to route the data to the destination worker so that it can be properly processed, especially when the underlying data has been partitioned across the cluster. Given the partition information, as well as how our cluster is managed (by `ClusterInfo`) and co-located with the graph data, we can implement the corresponding `route` function to guide the system to transfer the data to a proper destination worker.
  For example, suppose our cluster contains 10 servers, each further forking 10 workers for processing queries. In addition, the graph is partitioned across these 10 servers by the following strategy: a vertex with a given ID is placed on the server with id i (0 to 9) if ID % 10 == i, and the vertex's adjacent edges are placed with the vertex. Then the router can decide which worker should process the vertex with ID 25534 as follows:
  - It first computes `25534 % 10 == 4`, which means the vertex must be routed to the 4-th server.
  - Any worker on the 4-th server can process the vertex. Thus it randomly picks a worker, say the 5-th worker, which has ID 4 * 10 + 5 = 45.
  - Then the 45-th worker is returned for routing this vertex.
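The arithmetic in that example can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `ClusterInfo` is modeled here as a plain struct with hypothetical fields, and the `local_worker` argument stands in for the random pick so that the result is reproducible.

```rust
pub type WorkerId = u64;

/// Hypothetical cluster description: every server forks the same number of workers.
pub struct ClusterInfo {
    pub servers: u64,
    pub workers_per_server: u64,
}

/// Route a vertex id to a worker, assuming the `id % servers` placement
/// strategy from the example. `local_worker` replaces the random choice
/// of a worker inside the target server.
pub fn route(id: u64, cluster: &ClusterInfo, local_worker: u64) -> Option<WorkerId> {
    if cluster.servers == 0 || local_worker >= cluster.workers_per_server {
        return None;
    }
    // Step 1: find the server that holds the vertex.
    let server = id % cluster.servers;
    // Step 2: turn (server, local worker) into a global worker id.
    Some(server * cluster.workers_per_server + local_worker)
}
```

For the cluster in the example, `route(25534, &ClusterInfo { servers: 10, workers_per_server: 10 }, 5)` yields `Some(45)`, matching the 4 * 10 + 5 computation above.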
…erInfo` trait for cluster, to get necessary info in Runtime.
/// A `PartitionInfo` is used to query the partition information when the data has been partitioned.
pub trait PartitionInfo: Send + Sync + 'static {
    /// Given the data, return the id of the partition that holds the data.
    fn get_partition_id(&self, data: &ID) -> GraphProxyResult<PartitionId>;
Using `ID` for the data is not very nice. We may want to define a trait `PartitionedData`, even though we may not need any functions on it.
pub trait PartitionedData { }
impl PartitionedData for Vertex { }
impl PartitionedData for Edge { }
pub trait PartitionInfo: Send + Sync + 'static {
    fn get_partition_id<D: PartitionedData>(&self, data: &D) -> GraphProxyResult<PartitionId>;
}
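A self-contained sketch of how such a generic `get_partition_id` might be implemented. Several things here are assumptions made for illustration: the `partition_key` accessor (the comment above proposes a marker trait without methods), the simplified `Vertex` and `Edge` structs, the `ModuloPartition` type, and `Result<_, String>` standing in for `GraphProxyResult`.

```rust
pub type PartitionId = u32;

/// Hypothetical accessor added so an implementation has something to partition on.
pub trait PartitionedData {
    fn partition_key(&self) -> u64;
}

/// Simplified stand-ins for the real graph types.
pub struct Vertex {
    pub id: u64,
}
pub struct Edge {
    pub src_id: u64,
    pub dst_id: u64,
}

impl PartitionedData for Vertex {
    fn partition_key(&self) -> u64 {
        self.id
    }
}

impl PartitionedData for Edge {
    // Assumption: an edge is stored together with its source vertex.
    fn partition_key(&self) -> u64 {
        self.src_id
    }
}

pub trait PartitionInfo: Send + Sync + 'static {
    fn get_partition_id<D: PartitionedData>(&self, data: &D) -> Result<PartitionId, String>;
}

/// Hypothetical partitioner that places data by `key % partitions`.
pub struct ModuloPartition {
    pub partitions: u32,
}

impl PartitionInfo for ModuloPartition {
    fn get_partition_id<D: PartitionedData>(&self, data: &D) -> Result<PartitionId, String> {
        if self.partitions == 0 {
            return Err("no partitions configured".to_string());
        }
        Ok((data.partition_key() % self.partitions as u64) as PartitionId)
    }
}
```

One design point worth weighing: a generic method like this generally prevents the trait from being used as `dyn PartitionInfo`, so callers would need to be generic over the partitioner type as well.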
/// - Then 45-th worker will be returned for routing this vertex.
pub trait Router: Send + Sync + 'static {
    /// a route function that given the data, return the worker id that is going to do the query.
    fn route(&self, data: &ID, job_workers: usize) -> GraphProxyResult<WorkerId>;
Similarly, using `ID` for the data is not very well designed. Consider:
pub trait RouteData {
/// Comment this
fn get_route_key(&self) -> ID;
}
impl RouteData for Vertex { ... }
impl RouteData for Edge { ... }
/// - It first do `25534 % 10 == 4`, which means it must be routed to the 4-th server.
/// - Any worker in the 4-th server can process the vertex. Thus it randomly picks a worker, saying 5-th worker, which has ID 4 * 10 + 5 = 45.
/// - Then 45-th worker will be returned for routing this vertex.
pub trait Router: Send + Sync + 'static {
pub trait Router {
    // Make `PartitionInfo` and `ClusterInfo` associated types of `Router`, to tell implementations that `Router` needs both pieces of information.
    type P: PartitionInfo;
    type C: ClusterInfo;
}
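A rough sketch of a router built on those two associated types. The accessor names `get_num_servers` and `get_workers_per_server`, the `u64` route key, the helper types `ModuloPartition` and `StaticCluster`, and `Result<_, String>` in place of `GraphProxyResult` are assumptions made for this example. It also assumes that partition `p` lives on server `p % servers` and always picks the first worker of that server instead of a random one.

```rust
pub type PartitionId = u64;
pub type WorkerId = u64;

pub trait PartitionInfo {
    fn get_partition_id(&self, key: u64) -> Result<PartitionId, String>;
}

pub trait ClusterInfo {
    fn get_num_servers(&self) -> u64;
    fn get_workers_per_server(&self) -> u64;
}

pub trait Router {
    // The associated types document which information a router depends on.
    type P: PartitionInfo;
    type C: ClusterInfo;
    fn route(&self, key: u64) -> Result<WorkerId, String>;
}

pub struct DefaultRouter<PI, CI> {
    pub partition_info: PI,
    pub cluster_info: CI,
}

impl<PI: PartitionInfo, CI: ClusterInfo> Router for DefaultRouter<PI, CI> {
    type P = PI;
    type C = CI;

    fn route(&self, key: u64) -> Result<WorkerId, String> {
        let servers = self.cluster_info.get_num_servers();
        if servers == 0 {
            return Err("cluster has no servers".to_string());
        }
        let partition = self.partition_info.get_partition_id(key)?;
        // Assumption: partition `p` is hosted by server `p % servers`.
        let server = partition % servers;
        // Deterministic choice: the first worker of the target server.
        Ok(server * self.cluster_info.get_workers_per_server())
    }
}

/// Hypothetical partitioner: `key % partitions`.
pub struct ModuloPartition {
    pub partitions: u64,
}

impl PartitionInfo for ModuloPartition {
    fn get_partition_id(&self, key: u64) -> Result<PartitionId, String> {
        if self.partitions == 0 {
            return Err("no partitions configured".to_string());
        }
        Ok(key % self.partitions)
    }
}

/// Hypothetical fixed-size cluster.
pub struct StaticCluster {
    pub servers: u64,
    pub workers_per_server: u64,
}

impl ClusterInfo for StaticCluster {
    fn get_num_servers(&self) -> u64 {
        self.servers
    }
    fn get_workers_per_server(&self) -> u64 {
        self.workers_per_server
    }
}
```

With 10 partitions on 10 servers of 10 workers each, routing key 25534 lands on partition 4, server 4, and therefore worker 40 under the first-worker assumption.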
    fn route(&self, data: &ID, job_workers: usize) -> GraphProxyResult<WorkerId>;
}

pub struct DistributedDataRouter<P: PartitionInfo, C: ClusterInfo> {
I would call this `DefaultRouter`.
}

impl<P: PartitionInfo, C: ClusterInfo> Router for DistributedDataRouter<P, C> {
    fn route(&self, data: &i64, job_workers: usize) -> GraphProxyResult<u64> {
Move the above comment:
/// For example, suppose our cluster contains 10 servers, each further forking 10 workers for processing queries.
/// In addition, the graph is partitioned across these 10 servers by the following strategy:
/// a vertex with a given ID is placed on the server with id i (0 to 9) if ID % 10 == i, and the vertex's adjacent edges are placed with the vertex.
/// Then the router can decide which worker should process the vertex with ID 25534 as follows:
/// - It first computes `25534 % 10 == 4`, which means the vertex must be routed to the 4-th server.
/// - Any worker on the 4-th server can process the vertex. Thus it randomly picks a worker, say the 5-th worker, which has ID 4 * 10 + 5 = 45.
/// - Then the 45-th worker is returned for routing this vertex.
to here.
… `Router`, and some refinement
LGTM
What do these changes do?
Redesign the `PartitionInfo`, `ClusterInfo`, and `Router` traits to better support parallel processing in Runtime, where:
- `PartitionInfo` is used to query the partition information when the data has been partitioned.
- `ClusterInfo` is used to query the cluster information when the system is running on a cluster.
- `Router` is used to route the data to the destination worker so that it can be properly processed, with `PartitionInfo` and `ClusterInfo` as input.
Related issue number
Fixes #2753