-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add support for multiple blockstores #3257
Conversation
67da189
to
3cf5695
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning for doing this at the blockstore level? I thought you were doing this with a multi-datastore type thing?
} | ||
|
||
func NewBlockstoreWPrefix(d ds.Batching, prefix string) *blockstore { | ||
if prefix == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont like this, lets force the user to give valid input, in the function above this we can just pass the default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay.
if err == nil && exists { | ||
return nil // already stored. | ||
} | ||
// Note: The Has Check is now done by the MultiBlockstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree its probably okay to remove this and do it intentionally in a certain place (instead of at every layer). But this gets a bit weird... For example, if someone just wants to use the blockstore on its own, they don't get this nice optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about that? MultiBlockstore just calls Has method of Blockstore but it doesn't mean that Blockstore (on Datastore) will check the Has of datastore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UPDATE: ahh, you call an explicit Has
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So can I take it that this is resolved?
@@ -104,11 +104,11 @@ func TestHasIsBloomCached(t *testing.T) { | |||
block := blocks.NewBlock([]byte("newBlock")) | |||
|
|||
cachedbs.PutMany([]blocks.Block{block}) | |||
if cacheFails != 2 { | |||
t.Fatalf("expected two datastore hits: %d", cacheFails) | |||
if cacheFails != 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified this was correct, the change was due to an implementation detail. Unfortually, I can't remember why. If it is important enough I will spend an hour or two looking into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It always weirds me out when i see tests changing. Maybe @Kubuxu is more familiar with this and can check it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to because PutMany
on default blockstore was accessing datastore two times, once checking Has
for a key and then calling Put
.
This means that now we don't check the datastore with Has
we just Put
the value.
https://github.com/ipfs/go-ipfs/pull/3257/files#r80438802
@@ -557,6 +565,27 @@ func (r *FSRepo) Datastore() repo.Datastore { | |||
return d | |||
} | |||
|
|||
func (r *FSRepo) DirectMount(prefix string) ds.Datastore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this DatastoreAt
or GetMount
. DirectMount
just leaves me confused about what this is
To better interact with caching. In particular the bloom filter. |
func (bs *multiblockstore) Locate(key key.Key) []LocateInfo { | ||
res := make([]LocateInfo, 0, len(bs.mounts)) | ||
for _, m := range bs.mounts { | ||
_, err := m.Blocks.Get(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to use Has here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary, the problem is with the filestore simply calling Has() will not guarantee the block is available as the backing file might have changed or no longer be available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be on filestore to figure out, IMO.
Why should we load X MiB into memory from slow harddrive if other blockstores can figure it out without loading whole content or even from Has caches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allright. I agree will fix this and the Filestore's Has() method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other thing is that Get() will return more information than Has() for the filestore. In particular it will return if the key doesn't exist, or if the key exists but there was a problem reconstructing the block.
I am holding off on this change for a bit while I think about it.
} | ||
|
||
func (bs *multiblockstore) Put(blk blocks.Block) error { | ||
// Has is cheaper than Put, so see if we already have it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is misleading. You don't check Has before Put because it is cheaper..
If you don't do it, you can end up with data being duplicated in two blockstores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your right, I will fix it to be clearer.
out <- key | ||
} | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If @whyrusleeping agrees with me I would open a goroutine per BS and make them pipe from one AllKeysChan to one external. This way if first BS is slow and second is fast, the first won't slow down whole process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
if err == nil { | ||
return blk, nil | ||
} | ||
if firstErr == nil || firstErr == ErrNotFound { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would replace it with firstErr == nil && err != ErrNotFound
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I do that than firstErr could end up being nil even if the block was not found. The idea is to return ErrNotFound only if that is the case for all the blockstore's. If not than return the more serious error.
1be4ba9
to
145180a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just feels very complicated to me. not sure how best to make it less so. It might help if i look at exactly what is needed by the filestore here
func (bs *multiblockstore) Locate(c *cid.Cid) []LocateInfo { | ||
res := make([]LocateInfo, 0, len(bs.mounts)) | ||
for _, m := range bs.mounts { | ||
_, err := m.Blocks.Get(c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should use Has
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with that. Not 100% happy with it as filestore Has doesn't really guarantee the block is available (and changing it so it does will make the Has() call really expensive), but its not a major problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I partly take the Has() comment back. The filestore Has() needs fixing or the Put below needs fixing.
return res | ||
} | ||
|
||
func (bs *multiblockstore) Put(blk blocks.Block) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if i add a file with the filestore, then i add another file normally that overlaps a block with it. one block from my 'normal' add will be referencing a file on disk?
That feels odd to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but rather difficult to avoid.
out <- key | ||
} | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
func RmBlocks(mbs bs.MultiBlockstore, pins pin.Pinner, out chan<- interface{}, cids []*cid.Cid, opts RmBlocksOpts) error { | ||
prefix := opts.Prefix | ||
if prefix == "" { | ||
prefix = mbs.Mounts()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this only remove from one blockstore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
if err != nil { | ||
return err | ||
} | ||
|
||
mounts := []bstore.Mount{{fsrepo.CacheMount, cbs}} | ||
|
||
if n.Repo.DirectMount(fsrepo.FilestoreMount) != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filestore mount? does this do anything yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this P.R. Looks like I missed that one when factoring out the code.
@@ -78,6 +80,7 @@ You can now refer to the added file in a gateway, like so: | |||
cmds.BoolOption(hiddenOptionName, "H", "Include files that are hidden. Only takes effect on recursive add.").Default(false), | |||
cmds.StringOption(chunkerOptionName, "s", "Chunking algorithm to use."), | |||
cmds.BoolOption(pinOptionName, "Pin this object when adding.").Default(true), | |||
cmds.BoolOption(allowDupName, "Add even if blocks are in non-cache blockstore.").Default(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really complicates things... thoughts @Kubuxu ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i brought this up in IRC.
DAG merkledag.DAGService // the merkle dag service, get/add objects. | ||
Resolver *path.Resolver // the path resolution system | ||
Peerstore pstore.Peerstore // storage for other Peer instances | ||
Blockstore bstore.MultiBlockstore // the block store (lower level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be an interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what should be an interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the blockstore should just be of type Blockstore
. That way its easy to swap out with any other blockstore implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whyrusleeping I did not design the MultiBlockstore to be able to be swapped out. (As I see it it, it was designed to allow use of other Blockstores so why would you want to swap it out, but I can understand your viewpoint) If that is something you want it is likely possible. I image I can have something done in a few days.
func openDefaultDatastore(r *FSRepo) (repo.Datastore, error) { | ||
const ( | ||
RootMount = "/" | ||
CacheMount = "/blocks" // needs to be the same as blockstore.DefaultPrefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this called the cache mount?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see #3119.
Factored out of ipfs#3257 (Add support for multiple blockstores). License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
Each datastore is mounted under a different mount point and a multi-blockstore is used to check each mount point for the block. The first mount checked of the multi-blockstore is considered the "cache", all others are considered read-only. This implies that the garbage collector only removes block from the first mount. This change also factors out the pinlock from the blockstore into its own structure. Only the multi-datastore now implements the GCBlockstore interface. In the future this could be separated out from the blockstore completely. For now caching is only done on the first mount, in the future this could be reworked. The bloom filter is the most problematic as the read-only mounts are not necessary immutable and can be changed by methods outside of the blockstore. Right now there is only one mount, but that will soon change once support for the filestore is added. License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
This option adds a files the the primary blockstore even if the block is in another blockstore such as the filestore. License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
145180a
to
008de00
Compare
This is no longer needed |
@whyrusleeping I beg to differ. Something like this might still be needed. For example if you want multiple ipfs-packs in a single node. |
Alright, I'll keep the branch around (its also archived on the branch archive repo). But keeping the PR closed until we have an actionable usecase need for it. |
Closes #3119.
Required for #2634.