Skip to content

Commit

Permalink
Merge pull request #279 from more-storage-implementations-and-benchmarks
Browse files Browse the repository at this point in the history
  • Loading branch information
warpfork committed Nov 11, 2021
2 parents 54be030 + e99bcd1 commit db4fbe6
Show file tree
Hide file tree
Showing 27 changed files with 2,333 additions and 34 deletions.
6 changes: 6 additions & 0 deletions linking/functions.go
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,9 @@ func (lsys *LinkSystem) LoadRaw(lnkCtx LinkContext, lnk datamodel.Link) ([]byte,
if err != nil {
return nil, err
}
if closer, ok := reader.(io.Closer); ok {
defer closer.Close()
}
var buf bytes.Buffer
if _, err := io.Copy(&buf, reader); err != nil {
return nil, err
Expand Down Expand Up @@ -174,6 +177,9 @@ func (lsys *LinkSystem) Fill(lnkCtx LinkContext, lnk datamodel.Link, na datamode
if err != nil {
return err
}
if closer, ok := reader.(io.Closer); ok {
defer closer.Close()
}
// TrustedStorage indicates the data coming out of this reader has already been hashed and verified earlier.
// As a result, we can skip rehashing it
if lsys.TrustedStorage {
Expand Down
11 changes: 7 additions & 4 deletions linking/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -84,8 +84,11 @@ type (
// and hash in the Link parameter before returning.
// (This is something that the LinkSystem composition will handle if you're using it.)
//
// Some implementations of BlockWriteOpener and BlockReadOpener may be
// found in the storage package. Applications are also free to write their own.
// BlockReadOpener can also be created out of storage.ReadableStorage and attached to a LinkSystem
// via the LinkSystem.SetReadStorage method.
//
// Users of a BlockReadOpener function should also check the io.Reader
// for matching the io.Closer interface, and use the Close function as appropriate if present.
BlockReadOpener func(LinkContext, datamodel.Link) (io.Reader, error)

// BlockWriteOpener defines the shape of a function used to open a writer
Expand Down Expand Up @@ -118,8 +121,8 @@ type (
// and when the BlockWriteCommiter is called, it will flush the writes
// and then use a rename operation to place the tempfile in a permanent path based the Link.)
//
// Some implementations of BlockWriteOpener and BlockReadOpener may be
// found in the storage package. Applications are also free to write their own.
// BlockWriteOpener can also be created out of storage.WritableStorage and attached to a LinkSystem
// via the LinkSystem.SetWriteStorage method.
BlockWriteOpener func(LinkContext) (io.Writer, BlockWriteCommitter, error)

// BlockWriteCommitter defines the shape of a function which, together
Expand Down
125 changes: 125 additions & 0 deletions storage/README_adapters.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
Storage Adapters
================

The go-ipld-prime storage APIs were introduced in the v0.14.x ranges of go-ipld-prime,
which happened in fall 2021.

There are many other pieces of code in the IPLD (and even more so, the IPFS) ecosystem
which predate this, and have interfaces that are very _similar_, but not quite exactly the same.

In order to keep using that code, we've built a series of adapters.

You can see these in packages beneath this one:

- `go-ipld-prime/storage/bsadapter` is an adapter to `github.com/ipfs/go-ipfs-blockstore`.
- `go-ipld-prime/storage/dsadapter` is an adapter to `github.com/ipfs/go-datastore`.
- `go-ipld-prime/storage/bsrvadapter` is an adapter to `github.com/ipfs/go-blockservice`.

Note that there are also other packages which implement the go-ipld-prime storage APIs,
but are not considered "adapters" -- these just implement the storage APIs directly:

- `go-ipld-prime/storage/memstore` is a simple in-memory storage system.
- `go-ipld-prime/storage/fsstore` is a simple filesystem-backed storage system
(comparable to, and compatible with [flatfs](https://pkg.go.dev/github.com/ipfs/go-ds-flatfs),
if you're familiar with that -- but higher efficiency).

Finally, note that there are some shared benchmarks across all this:

- check out `go-ipld-prime/storage/benchmarks`!


Why structured like this?
-------------------------

### Why is there adapter code at all?

The `go-ipld-prime/storage` interfaces are a newer generation.

A new generation of APIs was desirable because it unifies the old APIs,
and also because we were able to improves and update several things in the process.
(You can see some of the list of improvements in https://github.com/ipld/go-ipld-prime/pull/265,
where these APIs were first introduced.)
The new generation of APIs avoids several types present in the old APIs which forced otherwise-avoidable allocations.
(See notes later in this document about "which adapter should I use" for more on that.)
Finally, the new generation of APIs is carefully designed to support minimal implementations,
by carefully avoiding use of non-standard-library types in key API definitions,
and by keeping most advanced features behind a standardized convention of feature detection.

Because the newer generation of APIs are not exactly the same as the multiple older APIs we're unifying and updating,
some amount of adapter code is necessary.

(Fortunately, it's not much! But it's not "none", either.)

### Why have this code in a shared place?

The glue code to connect `go-datastore` and the other older APIs
to the new `go-ipld-prime/storage` APIs is fairly minimal...
but there's also no reason for anyone to write it twice,
so we want to put it somewhere easy to share.

### Why do the adapters have their own go modules?

A separate module is used because it's important that go-ipld-prime can be used
without forming a dependency on `go-datastore` (or the other relevant modules, per adapter).

We want this so that there's a reasonable deprecation pathway -- it must be
possible to write new code that doesn't take on transitive dependencies to old code.

(As a bonus, looking at the module dependency graphs makes an interestingly
clear statement about why minimal APIs that don't force transitive dependencies are a good idea!)

### Why is this code all together in this repo?

We put these separate modules in the same git repo as `go-ipld-prime`... because we can.

Technically, neither the storage adapter modules nor the `go-ipld-prime` module depend on each other --
they just have interfaces that are aligned with each other -- so it's very easy to
hold them as separate go modules in the same repo, even though that can otherwise sometimes be tricky.

You may want to make a point of pulling updated versions of the storage adapters that you use
when pulling updates to go-ipld-prime, though.

### Could we put these adapters upstream into the other relevant repos?

Certainly!

We started with them here because it seemed developmentally lower-friction.
That may change; these APIs could move.
This code is just interface satisfaction, so even having multiple copies of it is utterly harmless.


Which of `dsadapter` vs `bsadapter` vs `bsrvadapter` should I use?
------------------------------------------------------------------

None of them, ideally.
A direct implementation of the storage APIs will almost certainly be able to perform better than any of these adapters.
(Check out the `fsstore` package, for example.)

Failing that: use the adapter matching whatever you've got on hand in your code.

There is no correct choice.

`dsadapter` suffers avoidable excessive allocs in processing its key type,
due to choices in the interior of `github.com/ipfs/go-datastore`.
It is also unable to support streaming operation, should you desire it.

`bsadapter` and `bsrvadapter` both also suffer overhead due to their key type,
because they require a transformation back from the plain binary strings used in the storage API to the concrete go-cid type,
which spends some avoidable CPU time (and also, at present, causes avoidable allocs because of some interesting absenses in `go-cid`).
Additionally, they suffer avoidable allocs because they wrap the raw binary data in a "block" type,
which is an interface, and thus heap-escapes; and we need none of that in the storage APIs, and just return the raw data.
They are also unable to support streaming operation, should you desire it.

It's best to choose the shortest path and use the adapter to whatever layer you need to get to --
for example, if you really want to use a `go-datastore` implementation,
*don't* use `bsadapter` and have it wrap a `go-blockstore` that wraps a `go-datastore` if you can help it:
instead, use `dsadapter` and wrap the `go-datastore` without any extra layers of indirection.
You should prefer this because most of the notes above about avoidable allocs are true when
the legacy interfaces are communicating with each other, as well...
so the less you use the internal layering of the legacy interfaces, the better off you'll be.

Using a direct implementation of the storage APIs will suffer none of these overheads,
and so will always be your best bet if possible.

If you have to use one of these adapters, hopefully the performance overheads fall within an acceptable margin.
If not: we'll be overjoyed to accept help porting things.
3 changes: 1 addition & 2 deletions storage/api.go
Original file line number Diff line number Diff line change
Expand Up @@ -89,8 +89,7 @@ type WritableStorage interface {
// --- streaming --->

type StreamingReadableStorage interface {
// Note that the returned io.Reader may also be an io.ReadCloser -- check for this.
GetStream(ctx context.Context, key string) (io.Reader, error)
GetStream(ctx context.Context, key string) (io.ReadCloser, error)
}

// StreamingWritableStorage is a feature-detection interface that advertises support for streaming writes.
Expand Down
40 changes: 40 additions & 0 deletions storage/benchmarks/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
benchmarks
==========

This is a small module that pulls in a bunch of storage implementations,
as well as legacy implementations via the adapter modules,
and benchmarks all of them on the same benchmarks.

There's no reason to import this code,
so the go.mod file uses relative paths shamelessly.
(You can create your own benchmarks using the code in `../tests`,
which contains most of the engine; this package is just tables of setup.)


What variations do the benchmarks exercise?
------------------------------------------

- the various storage implementations!
- in some cases: variations of parameters to individual storage implementations. (TODO)
- puts and gets. (TODO: currently only puts.)
- various distributions of data size. (TODO)
- block mode vs streaming mode. (TODO)
- end-to-end use via linksystem with small cbor objects. (TODO)
- (this measures a lot of things that aren't to do with the storage itself -- but is useful to contextualize things.)

Running the benchmarks on variations in hardware and filesystem may also be important!
Many of these storage systems use the disk in some way.


Why is the module structured like this?
----------------------------------------

Because many of the storage implementations are also their own modules,
and we don't want to have the go-ipld-prime module pull in a huge universe of transitive dependencies.

See similar discussion in `../README_adapters.md`.

It may be worth pulling this out into a new git repo in the future,
especially if we want to add more and more implementations to what we benchmark,
or develop additional tools for deploying the benchmark on varying hardware, etc.
For now, it incubates here.
13 changes: 13 additions & 0 deletions storage/benchmarks/go.mod
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
module github.com/ipld/go-ipld-prime/storage/benchmarks

go 1.16

replace github.com/ipld/go-ipld-prime => ../..

replace github.com/ipld/go-ipld-prime/storage/dsadapter => ../dsadapter

require (
github.com/ipfs/go-ds-flatfs v0.4.5
github.com/ipld/go-ipld-prime v0.12.3
github.com/ipld/go-ipld-prime/storage/dsadapter v0.0.0-20211022093231-ebf675a9bd6d
)
Loading

0 comments on commit db4fbe6

Please sign in to comment.