Proposal: introducing sqlite as a datastore #605

Closed
raulk opened this issue Apr 19, 2019 · 9 comments

Comments

raulk (Member) commented Apr 19, 2019

Some of our use cases become excruciating to implement efficiently with KV stores:

  1. address book: attaching TTLs to each address (now) or confidence/quality scores (future), and efficiently querying by those criteria. GC.
  2. peer metadata / peer exchange: find peers supporting protocol X.
  3. pruning expired DHT entries or provider records.
  4. rendezvous.

The lack of indices hurts us badly. We are penalised by protobuf serde on every query and iteration. Ad-hoc GC logic has sprouted across various modules. I'm not confident KV stores afford us the flexibility to support future use cases.


Should we start an experiment to explore adopting sqlite as our main datastore? We can define together what our evaluation, scoring and success criteria should be.


I doubt we'll need an abstraction layer analogous to go-datastore, because the other two alternatives (Apache Derby and H2) are JVM-based, so there's no chance of using those as a backend anyway.

The Go stdlib already supports SQL (database/sql), and if we want to get fancy, we can evaluate squirrel for SQL generation. I discourage ORMs.
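
As a taste of what that buys us, here's a minimal sketch over database/sql, assuming the mattn/go-sqlite3 Cgo driver and a hypothetical addrs table with an indexed expiry column (the driver and schema are illustrative, not a decision):

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/mattn/go-sqlite3" // Cgo-based sqlite driver (assumed here)
)

func main() {
	db, err := sql.Open("sqlite3", "peerstore.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Hypothetical schema: an indexed expiry column turns TTL queries
	// and GC into single statements instead of full iterations.
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS addrs (
		peer_id TEXT NOT NULL,
		addr    TEXT NOT NULL,
		expiry  INTEGER NOT NULL,
		PRIMARY KEY (peer_id, addr)
	)`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`CREATE INDEX IF NOT EXISTS addrs_expiry ON addrs (expiry)`); err != nil {
		log.Fatal(err)
	}

	// GC of expired addresses becomes one indexed DELETE.
	res, err := db.Exec(`DELETE FROM addrs WHERE expiry < ?`, time.Now().Unix())
	if err != nil {
		log.Fatal(err)
	}
	n, _ := res.RowsAffected()
	fmt.Printf("collected %d expired addresses\n", n)
}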

raulk (Member Author) commented Apr 19, 2019

cc @yusefnapora, as discussed today during the call with @Warchant and @kamilsa (cpp-libp2p).

raulk (Member Author) commented Apr 19, 2019

@Stebalien pretty sure you have thoughts about this?

vyzo (Contributor) commented Apr 19, 2019

+1 I am very fond of sqlite.

raulk (Member Author) commented Apr 19, 2019

A drawback is the logistics around Cgo compilation :-(

Stebalien (Member) commented:

My concerns with sqlite are:

  • Cgo build
  • dynamic libraries
  • binary size (if we go with static bindings)
  • the Go<->C latency.

Ideally we'd have some kind of pure-go solution available.


My concern with anything that isn't "datastore" is flexibility:

  • We have a postgres datastore and even an s3 datastore. The simple interface makes this possible.
  • We can switch between backends at any time.

In terms of performance:

  • Decoding protobufs is actually pretty fast; we probably just need to reuse intermediate buffers better.
  • The current Get(key) -> []byte interface is problematic from an allocations standpoint and should probably be Get(key, func([]byte)). However, this would be a massively breaking change.
  • We can probably mandate TTL support and/or write an auto-TTL wrapper that writes some TTL metadata to the database.
    • Specifically, we'd need to write (see the sketch after this list):
      • /ttl/expirations/DATE -> RECORD_KEY
      • /ttl/records/RECORD_KEY -> /ttl/expirations/DATE
      • RECORD_KEY -> RECORD_DATA
    • However, updating a TTL would require deleting one key and writing two. Deleting any record would require reading one key and deleting 1 to 3 keys.
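
To make the bookkeeping concrete, here's a rough sketch of the wrapper's write path over a minimal KV interface (the KV interface and PutWithTTL helper are hypothetical; the key layout follows the scheme above, with the record key appended to the expiration key so records expiring at the same instant don't collide):

package ttlds

import "time"

// KV is a minimal stand-in for a datastore.
type KV interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
	Delete(key string) error
}

// PutWithTTL writes the record plus the two TTL bookkeeping keys.
func PutWithTTL(kv KV, recordKey string, data []byte, expiry time.Time) error {
	expKey := "/ttl/expirations/" + expiry.UTC().Format(time.RFC3339) + recordKey
	if err := kv.Put(expKey, []byte(recordKey)); err != nil {
		return err
	}
	// Reverse pointer: lets a TTL update or a Delete find and clear the
	// matching expiration entry without scanning the prefix.
	if err := kv.Put("/ttl/records/"+recordKey, []byte(expKey)); err != nil {
		return err
	}
	return kv.Put(recordKey, data)
}

A GC pass would then iterate the /ttl/expirations/ prefix up to the current time and delete all three keys for each expired record.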

All that aside, we feel the pain in go-ipfs, and an actual database would make quite a few things quite a bit simpler. It also looks like there's a pretty reasonable SQL abstraction in the Go standard library (https://golang.org/pkg/database/sql/).

TL;DR: Let's do it as long as we abstract over any SQL database. In go-ipfs, my primary motivation is that it makes pinning, GC, graph traversal, reverse graph traversal, etc. much easier. Really, we need a graph database but SQL is probably the best general-purpose solution.
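
To make the graph point concrete: a reverse traversal (which blocks reference this one?) collapses into a single recursive CTE in SQLite. A sketch, where the links(parent, child) edge table and the ancestors helper are hypothetical:

package graph

import "database/sql"

// ancestors returns every node that transitively references cid,
// assuming a hypothetical links(parent, child) edge table. The sqlite
// driver is assumed to be registered wherever the *sql.DB was opened.
func ancestors(db *sql.DB, cid string) ([]string, error) {
	rows, err := db.Query(`
		WITH RECURSIVE ancestors(id) AS (
			SELECT parent FROM links WHERE child = ?
			UNION
			SELECT l.parent FROM links l JOIN ancestors a ON l.child = a.id
		)
		SELECT id FROM ancestors`, cid)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var out []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		out = append(out, id)
	}
	return out, rows.Err()
}

Doing the same over a raw KV store means maintaining a reverse index by hand and iterating it level by level.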

yusefnapora (Contributor) commented:

I'm definitely in favor, for all the reasons described above.

@vyzo - just like old times! sqlite for metadata indices + rocksdb for blob storage worked really well for mediachain.

Warchant commented Apr 24, 2019

To store clearly defined entities, I would recommend the repository pattern (below is a sketch, in Go):

import "time"

type User struct {
	ID        int
	Name      string
	Birthdate time.Time
}

// You don't need a single parent "Repository" interface; at the least,
// create one per entity.
type UserRepository interface {
	FindAll() ([]User, error)
	FindByID(id int) (User, error)
	FindByName(name string) (User, error)
	// ...
}

Implementation(s) of this interface may use SQLite, postgres, or even multiple KV stores.

  • the implementation can leverage indexing, so there is no performance penalty
  • easy to extend
  • easy to mock
  • easy to swap implementations

In my experience, it is very hard to define a clear, generic interface over SQL/KV storage, whereas a repository works well every time.
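
For illustration, a minimal sqlite-backed implementation of one method of the interface above (the users table and its layout are assumptions; birthdate is stored as a unix timestamp):

package repo

import (
	"database/sql"
	"time"
)

// SQLUserRepository implements UserRepository on top of database/sql.
// The same interface could just as well be backed by postgres or a KV store.
type SQLUserRepository struct {
	db *sql.DB
}

// FindByName stays cheap as long as there's an index on users(name).
// FindAll and FindByID follow the same pattern.
func (r *SQLUserRepository) FindByName(name string) (User, error) {
	var u User
	var birth int64
	err := r.db.QueryRow(
		`SELECT id, name, birthdate FROM users WHERE name = ?`, name,
	).Scan(&u.ID, &u.Name, &birth)
	if err != nil {
		return User{}, err
	}
	u.Birthdate = time.Unix(birth, 0)
	return u, nil
}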

raulk (Member Author) commented Apr 24, 2019

@Warchant definitely something to consider! There's a difference between abstracting away how we store entities and abstracting how we generically access a database. The former allows us to use database-specific optimisations and indexing, something the latter struggles to enable.

marten-seemann (Contributor) commented:

We're planning a big redesign of the peerstore (see #2355), and we're also planning to deprecate the database-backed peerstore (see #2329).

marten-seemann closed this as not planned on Jun 28, 2023