Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rough draft of SectorSet #333

Closed
wants to merge 1 commit into from
Closed

Rough draft of SectorSet #333

wants to merge 1 commit into from

Conversation

ZenGround0
Copy link
Contributor

@ZenGround0 ZenGround0 commented Jun 7, 2019

Plowing ahead with these assumptions I've drafted a rough draft of the SectorSet spec. Please take a look.

The prose is not great or final-spec material but I'm hoping it gets across the basics of what I'm planning to implement based on #116 so that we can sync state and catch anything I'm missing or address inefficiencies.

@@ -221,7 +221,107 @@ Bitfields are a set of bits encoded using a custom run length encoding: rle+. r

#### `SectorSet`

TODO
The `SectorSet` is an integer set implemented with a simple array mapped tree.
Integer indexes range from 0 to infinity (TODO practical bounds / bounds implied by encoding?).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 64 bit int limitation OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth setting some maximum based on the maximum number of sectors we expect to impose on miners

the `Pointer`. There is also a `RootNode`. The `Node` is as follows:
```go
type Node struct {
Pointers [S]*Pointer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a new node always allocates S empty Pointers

```
The `Pointer` is serialized as a cbor object (major type 5). The `Value`
is serialized { in some way } with `v` as its object key. The `SectorMeta`
type carries all needed sector metadata (TODO, define a type for commD commR
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this (commR, commD)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @laser

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SectorMeta will need:

  • CommR so that we can generate PoSts (the generate_post routines takes as input all CommRs in the miner's proving set).
  • CommD for piece inclusion proof verification.

of nodes until reaching the leaf node and set the node's pointer Value at
`index % childRange` to the insert value.

##### Delete Value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lack of bitfields and constant sized arrays might make this prohibitively expensive as we have search the whole array for emptyness (we could include a simple count to make this better). Also note that as specced out here users can arrive at two identical sets with different representations. Id appeciate comments on the importance of addressing these issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterating over some small constant number of items shouldnt be prohibitively expensive. Since we have to parse bytes into a datastructure anyways, we can keep track of the item count in memory (but not for the serialized version)

`index % childRange` to the insert value.

##### Delete Value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not covering range deletes here. Let me know if that's a requirement for v1.

the root that the `SectorSet` has capacity for its existing indices and `b`.
Pointers are then updated in these new nodes so that there is a path from
the new node with the highest height to the existing node pointed to by root.
Finally the root node updates to point to the node with highest height.
Copy link
Contributor Author

@ZenGround0 ZenGround0 Jun 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that as the SectorSet grows the first insert into the new region of the tree will start to require more and more overhead. Making this scale better probably requires a redesign of the datastructure (we could do something closer to the HAMT with more complexity to improve this). Is the current design acceptable or should we try to address this now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logarithmic scaling is about optimal here.

NodePointer Cid
}
```
The `RootNode` is serialized as a cbor object (major type 5) where `NodePointer`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont' think the rootnode needs to be serialized ever.

is serialized { in some way } with `v` as its object key. The `SectorMeta`
type carries all needed sector metadata (TODO, define a type for commD commR
pair and serialization). A `Pointer` is conceptually a union type, it can
carry a `Link` or a `Value` but never both in a well formed `SectorSet`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the object key used for the Link?

Pseudocode:
```go
func (rn *RootNode) Lookup(index int) (v Anything, error) {
return recLookup(rn.MaxDepth, index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why doesnt this mention the expansion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, this is just for lookup, i see

@@ -221,7 +221,107 @@ Bitfields are a set of bits encoded using a custom run length encoding: rle+. r

#### `SectorSet`

TODO
The `SectorSet` is an integer set implemented with a simple array mapped tree.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets call the data structure AMT and say that a

type SectorSet {UInt:SectorMeta}<AMT>

@dignifiedquire
Copy link
Contributor

Moved content to ipld/specs#137 and the definition of the type into #367

@pooja pooja mentioned this pull request Jul 30, 2019
18 tasks
@pooja pooja mentioned this pull request Aug 27, 2019
22 tasks
@mishmosh
Copy link
Contributor

mishmosh commented Jan 6, 2020

The big overhaul in Sept/Oct means none of these changed files exist anymore. But, [at least some of] this prose is still valuable in these new locations:

Rendering: https://filecoin-project.github.io/specs/#systems__filecoin_mining__sector__sectorset
Code: https://github.com/filecoin-project/specs/tree/master/src/systems/filecoin_mining/sector

Proposal: let's create a new PR with the prose that's still true & useful from this, and close this one. @ZenGround0 do you prefer to do it yourself? @sternhenri/I can also create a first pass for you to review.

Also, #367 contains followup work.

@ZenGround0
Copy link
Contributor Author

I think this became AMT regardless this is very out of date

@ZenGround0 ZenGround0 closed this Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants