Proposal: introducing sqlite as a datastore #605

Closed
raulk opened this issue Apr 19, 2019 · 9 comments

Comments

raulk (Member) commented Apr 19, 2019

Some of our use cases become excruciating to implement efficiently with KV stores:

  1. address book: attaching TTLs to each address (now) or confidence/quality scores (future), and efficiently querying by those criteria. GC.
  2. peer metadata / peer exchange: find peers supporting protocol X.
  3. pruning expired DHT entries or provider records.
  4. rendezvous.

The lack of indices hurts us badly. We are penalised by protobuf serde on every query and iteration. Ad-hoc GC logic has sprouted across various modules. I'm not confident KV stores afford us the flexibility to support future use cases.


Should we start an experiment to explore adopting sqlite as our main datastore? We can define together what our evaluation, scoring and success criteria should be.


I doubt we'll need an abstraction layer analogous to go-datastore, because the other two alternatives (Apache Derby and H2) are JVM-based, so there's no chance of using those as a backend anyway.

The Go stdlib already supports SQL (database/sql), and if we want to get fancy, we can evaluate squirrel for SQL generation. I discourage ORMs.
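
As a taste of what that buys us, here's a minimal sketch over database/sql, assuming the mattn/go-sqlite3 Cgo driver and a hypothetical addrs table with an indexed expiry column (the driver and schema are illustrative, not a decision):

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/mattn/go-sqlite3" // Cgo-based sqlite driver (assumed here)
)

func main() {
	db, err := sql.Open("sqlite3", "peerstore.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Hypothetical schema: an indexed expiry column turns TTL queries
	// and GC into single statements instead of full iterations.
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS addrs (
		peer_id TEXT NOT NULL,
		addr    TEXT NOT NULL,
		expiry  INTEGER NOT NULL,
		PRIMARY KEY (peer_id, addr)
	)`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`CREATE INDEX IF NOT EXISTS addrs_expiry ON addrs (expiry)`); err != nil {
		log.Fatal(err)
	}

	// GC of expired addresses becomes one indexed DELETE.
	res, err := db.Exec(`DELETE FROM addrs WHERE expiry < ?`, time.Now().Unix())
	if err != nil {
		log.Fatal(err)
	}
	n, _ := res.RowsAffected()
	fmt.Printf("collected %d expired addresses\n", n)
}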

raulk (Member Author) commented Apr 19, 2019

cc @yusefnapora, as discussed today during the call with @Warchant and @kamilsa (cpp-libp2p).

raulk (Member Author) commented Apr 19, 2019

@Stebalien pretty sure you have thoughts about this?

vyzo (Contributor) commented Apr 19, 2019

+1 I am very fond of sqlite.

raulk (Member Author) commented Apr 19, 2019

A drawback is the logistics around Cgo compilation :-(

Stebalien (Member) commented:

My concerns with sqlite are:

  • Cgo build
  • dynamic libraries
  • binary size (if we go with static bindings)
  • the Go<->C latency.

Ideally we'd have some kind of pure-go solution available.


My concern with anything that isn't "datastore" is flexibility:

  • We have a postgres datastore and even an s3 datastore. The simple interface makes this possible.
  • We can switch between backends at any time.

In terms of performance:

  • Decoding protobufs is actually pretty fast; we probably just need to reuse intermediate buffers better.
  • The current Get(key) -> []byte interface is problematic from an allocations standpoint and should probably be Get(key, func([]byte)). However, this would be a massively breaking change.
  • We can probably mandate TTL support and/or write an auto-TTL wrapper that writes some TTL metadata to the database.
    • Specifically, we'd need to write (see the sketch after this list):
      • /ttl/expirations/DATE -> RECORD_KEY
      • /ttl/records/RECORD_KEY -> /ttl/expirations/DATE
      • RECORD_KEY -> RECORD_DATA
    • However, updating a TTL would require deleting one key and writing two. Deleting any record would require reading one key and deleting 1 to 3 keys.
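
To make the bookkeeping concrete, here's a rough sketch of the wrapper's write path over a minimal KV interface (the KV interface and PutWithTTL helper are hypothetical; the key layout follows the scheme above, with the record key appended to the expiration key so records expiring at the same instant don't collide):

package ttlds

import "time"

// KV is a minimal stand-in for a datastore.
type KV interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
	Delete(key string) error
}

// PutWithTTL writes the record plus the two TTL bookkeeping keys.
func PutWithTTL(kv KV, recordKey string, data []byte, expiry time.Time) error {
	expKey := "/ttl/expirations/" + expiry.UTC().Format(time.RFC3339) + recordKey
	if err := kv.Put(expKey, []byte(recordKey)); err != nil {
		return err
	}
	// Reverse pointer: lets a TTL update or a Delete find and clear the
	// matching expiration entry without scanning the prefix.
	if err := kv.Put("/ttl/records/"+recordKey, []byte(expKey)); err != nil {
		return err
	}
	return kv.Put(recordKey, data)
}

A GC pass would then iterate the /ttl/expirations/ prefix up to the current time and delete all three keys for each expired record.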

All that aside, we feel the pain in go-ipfs, and an actual database would make quite a few things quite a bit simpler. It also looks like there's a pretty reasonable SQL abstraction in the Go standard library (https://golang.org/pkg/database/sql/).

TL;DR: Let's do it as long as we abstract over any SQL database. In go-ipfs, my primary motivation is that it makes pinning, GC, graph traversal, reverse graph traversal, etc. much easier. Really, we need a graph database but SQL is probably the best general-purpose solution.
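
To make the graph point concrete: a reverse traversal (which blocks reference this one?) collapses into a single recursive CTE in SQLite. A sketch, where the links(parent, child) edge table and the ancestors helper are hypothetical:

package graph

import "database/sql"

// ancestors returns every node that transitively references cid,
// assuming a hypothetical links(parent, child) edge table. The sqlite
// driver is assumed to be registered wherever the *sql.DB was opened.
func ancestors(db *sql.DB, cid string) ([]string, error) {
	rows, err := db.Query(`
		WITH RECURSIVE ancestors(id) AS (
			SELECT parent FROM links WHERE child = ?
			UNION
			SELECT l.parent FROM links l JOIN ancestors a ON l.child = a.id
		)
		SELECT id FROM ancestors`, cid)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var out []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		out = append(out, id)
	}
	return out, rows.Err()
}

Doing the same over a raw KV store means maintaining a reverse index by hand and iterating it level by level.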

yusefnapora (Contributor) commented:

I'm definitely in favor, for all the reasons described above.

@vyzo - just like old times! sqlite for metadata indices + rocksdb for blob storage worked really well for mediachain.

Warchant commented Apr 24, 2019

To store clearly defined entities, I would recommend the repository pattern (below is a sketch, in Go):

import "time"

type User struct {
	ID        int
	Name      string
	Birthdate time.Time
}

// You don't need a single parent "Repository" interface; at the least,
// create one per entity.
type UserRepository interface {
	FindAll() ([]User, error)
	FindByID(id int) (User, error)
	FindByName(name string) (User, error)
	// ...
}

Implementation(s) of this interface may use SQLite, postgres, or even multiple KV stores.

  • the implementation can leverage indexing, so there is no performance penalty
  • easy to extend
  • easy to mock
  • easy to swap implementations

In my experience, it is very hard to define a clear, generic interface over SQL/KV storage, whereas a repository works well every time.
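
For illustration, a minimal sqlite-backed implementation of one method of the interface above (the users table and its layout are assumptions; birthdate is stored as a unix timestamp):

package repo

import (
	"database/sql"
	"time"
)

// SQLUserRepository implements UserRepository on top of database/sql.
// The same interface could just as well be backed by postgres or a KV store.
type SQLUserRepository struct {
	db *sql.DB
}

// FindByName stays cheap as long as there's an index on users(name).
// FindAll and FindByID follow the same pattern.
func (r *SQLUserRepository) FindByName(name string) (User, error) {
	var u User
	var birth int64
	err := r.db.QueryRow(
		`SELECT id, name, birthdate FROM users WHERE name = ?`, name,
	).Scan(&u.ID, &u.Name, &birth)
	if err != nil {
		return User{}, err
	}
	u.Birthdate = time.Unix(birth, 0)
	return u, nil
}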

raulk (Member Author) commented Apr 24, 2019

@Warchant definitely something to consider! There's a difference between abstracting away how we store entities and abstracting how we generically access a database. The former allows us to use database-specific optimisations and indexing, something the latter struggles to enable.

marten-seemann (Contributor) commented:

We're planning a big redesign of the peerstore (see #2355), and we're also planning to deprecate the database-backed peerstore (see #2329).

marten-seemann closed this as not planned on Jun 28, 2023