Datomic from the ground up
Data models differ based on how you view the data. For example, consider the data of a person: a name and an age. Traditional object systems tend to model the person as a complex structure of data made up of smaller data parts, namely name and age. The person would be of a complex type Person, and the individual "attributes" would be of built-in types, string and integer respectively.
Now, consider an Entity-Relationship view like in SQL systems. Here, the focus turns to the real units of information, which are name and age. These become "attributes", and the person is viewed as an "Entity" that holds these attributes. Entities are connected to each other through "Relationships", which are implicit in the attributes they hold. Therefore, the data model is based on what constitutes an entity, e.g. an SQL schema definition may look like Person (name string, age integer). Note that this essentially freezes the definition of an entity. Many problems may arise because of a rigid schema:
- In an entity-class, some entities may have extra attributes. To accommodate this, we need to associate those attributes with all entities in that class and fill them with NULLs wherever they are not applicable. E.g. some people may have addresses and others might not, but that might mean altering the schema definition to Person (name string, age integer, address string), thus applying it to all entities.
- The entity-class might itself change as business requirements change. This means overhauling the data model and adapting all existing data to the new definition.
- New relationships cannot be discovered. We need to analyze and model all entity relationships a priori. For example, I cannot relate a Person and an Employee, though they may be the same entity, if I had not designed for this relationship within either entity definition. That is, there is no way to insert a row in Person and in Employee and assert that it's the same entity if those definitions did not include a foreign-key relationship between the two entities.
In his pathbreaking paper, E.F. Codd defines a relation as a set of n-tuples, each built by picking one value from each of n sets of values. For example, when we pick a value from the set of possible ages {0, 1, 2 ..} and a value from the set of possible names {"Fred", "Simon" ..}, we can construct a tuple like {32, "Lucy"}, which can represent an entity in our system. The great thing about such a model is that I could also construct a tuple {44, "Jerry", "32 Hounds St"}, and that can represent a new entity without affecting other entities, provided I have a set of addresses to pick from.
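The idea of building entities by picking values from sets can be sketched in plain Clojure. This is only an illustration, not anything Datomic provides: the names `domains` and `valid-tuple?` are made up for this example, and each value set is approximated by a predicate.

```clojure
;; Each attribute is modelled as a named set of admissible values,
;; written here as a predicate (an assumption made for brevity).
(def domains
  {:name    string?
   :age     (fn [v] (and (integer? v) (>= v 0)))
   :address string?})

(defn valid-tuple?
  "True when every value in the tuple belongs to its attribute's value set."
  [tuple]
  (every? (fn [[attr v]]
            (if-let [in-domain? (get domains attr)]
              (in-domain? v)
              false))
          tuple))

(valid-tuple? {:age 32 :name "Lucy"})                          ; true
(valid-tuple? {:age 44 :name "Jerry" :address "32 Hounds St"}) ; true
(valid-tuple? {:age "old" :name "Fred"})                       ; false
```

Note that the tuple with an address is accepted without touching the tuple that has none. Nothing had to be declared about which attributes a "person" must carry.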
These sets of values are the primary focus in this system of data modeling. Such a set of values is actually an attribute of a particular type, from which it draws its values. We should be able to define an attribute abstractly by specifying a name and a type, like this: {id: name type: string}. Attribute definitions in Datomic come very close to this:
{:db/id 101 :db/ident :person/name :db/valueType :db.type/string
:db/cardinality :db.cardinality/one :db.install/_attribute :db.part/db}
Points to note:
- The keywords are namespaced to prevent collisions. :db is used for internal identifiers, and :person happens to be the namespace we have chosen for the new attribute.
- The first three pairs are fairly self-explanatory. We need an id to point to this attribute, an ident to give it a specific name and a valueType to specify the kind of values it can take.
- :db/cardinality specifies whether this attribute can take one or many values. We'll come back to this later.
- The underscore in :db.install/_attribute provides a useful facility in referencing, but we'll have to wait until later to discuss it in detail.
- In all, this makes for a good definition of a person's name attribute. Let's define more attributes and play with them in the repl.
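Since an attribute definition is just a map of namespaced keywords, the namespacing can be inspected with core Clojure functions alone. The sketch below needs no Datomic connection. The var name `name-attr` is an assumption for this example.

```clojure
;; The attribute definition from the text, held as an ordinary Clojure map.
(def name-attr
  {:db/id 101
   :db/ident :person/name
   :db/valueType :db.type/string
   :db/cardinality :db.cardinality/one
   :db.install/_attribute :db.part/db})

;; Every key of the definition lives in a namespace beginning with "db".
(set (map namespace (keys name-attr))) ; #{"db" "db.install"}

;; The ident we chose lives in our own "person" namespace.
(namespace (:db/ident name-attr)) ; "person"
(name (:db/ident name-attr))      ; "name"
```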
Install Datomic and open the repl by running bin/repl in the directory where Datomic was installed. Then:
user=> (require '[datomic.api :as d :refer [q db]])
user=> (def uri "datomic:mem://trial")
user=> (d/create-database uri)
user=> (def conn (d/connect uri))
Let's try adding some attribute definitions for name and age:
user=> ; define attributes
user=> @(d/transact conn [{:db/id 101 :db/ident :person/name :db/valueType :db.type/string :db/cardinality :db.cardinality/one :db.install/_attribute :db.part/db}])
user=> @(d/transact conn [{:db/id 102 :db/ident :person/age :db/valueType :db.type/long :db/cardinality :db.cardinality/one :db.install/_attribute :db.part/db}])
Now we have two attributes in the system. Going by the relational model, we could define new entities simply by choosing some attributes and picking some values from them. For example, valid entities could look like {name: "Lucy" age: 32} or {name: "Fred"}. Datomic entities are of exactly the same form:
user=> ; add data
user=> @(d/transact conn [{:db/id 103 :person/name "Lucy" :person/age 32}])
user=> @(d/transact conn [{:db/id 104 :person/name "Fred"}])
user=> ; Now, add a new attribute
user=> @(d/transact conn [{:db/id 105 :db/ident :person/address :db/valueType :db.type/string :db/cardinality :db.cardinality/one :db.install/_attribute :db.part/db}])
user=> ; add person including new attribute
user=> @(d/transact conn [{:db/id 106 :person/name "Jerry" :person/age 44 :person/address "32 Hounds St"}])
- Note that we were able to create an entity simply by picking a subset of the available attributes and assigning values to them. We did not have to predefine a "schema" for our entities.
- The same entity can participate in different relations, i.e. we could also have added the values separately, with @(d/transact conn [{:db/id 103 :person/name "Lucy"}]) and @(d/transact conn [{:db/id 103 :person/age 32}]), and the result would have been the same.
- Also, we could add a new attribute on the go and use it in a new entity without affecting existing entities.
- It will quickly get tiring to manually assign unique ids to entities. Luckily, Datomic has a solution for this: {:db/id #db/id[:db.part/db]} produces a temporary id that Datomic replaces with a unique id when the transaction runs. :db.part/db indicates a "partition" within which this id will be unique and is usually used for attribute definitions. For user data, we use :db.part/user.
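In code that is compiled rather than typed at the repl, the same temporary ids can be created with the `d/tempid` function and resolved afterwards with `d/resolve-tempid`. The following is a minimal sketch, assuming the Datomic peer library is on the classpath (as it is inside bin/repl). The database name `tempid-sketch` and the var names are made up for this example.

```clojure
(require '[datomic.api :as d])

;; A throwaway in-memory database (hypothetical name).
(def sketch-uri "datomic:mem://tempid-sketch")
(d/create-database sketch-uri)
(def sketch-conn (d/connect sketch-uri))

;; Install :person/name, letting Datomic pick the attribute's id.
@(d/transact sketch-conn
             [{:db/id (d/tempid :db.part/db)
               :db/ident :person/name
               :db/valueType :db.type/string
               :db/cardinality :db.cardinality/one
               :db.install/_attribute :db.part/db}])

;; Create a person in the user partition with a temporary id.
(def simon-tempid (d/tempid :db.part/user))
(def tx-result @(d/transact sketch-conn
                            [{:db/id simon-tempid :person/name "Simon"}]))

;; The transaction result maps temporary ids to the real ids that were assigned.
(def simon-id
  (d/resolve-tempid (:db-after tx-result) (:tempids tx-result) simon-tempid))

(:person/name (d/entity (:db-after tx-result) simon-id)) ; "Simon"
```

The actual number bound to `simon-id` depends on the database, which is exactly why it is convenient not to choose it by hand.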
It is useful to think of each Datomic transaction as the addition of a fact. A query simply associates these facts together to form a relation, "restricting" the relation to a subset using conditions and "projecting" the necessary attributes. Let's start by constructing a relation from attributes. This is of the form [<entity-id> <attribute name> <attribute value>]. Since we are querying for unknowns, some of these will be constants and others will be Datomic variables like ?e. For example, [?e :person/name ?n] is a data clause that relates an unknown entity ?e to an unknown value ?n through the attribute :person/name. This produces a relation with one column for :person/name, populated with every entity that has a corresponding attribute fact.
It is easy to see how to extend this to create more complex relations. For example:
[?e :person/name ?n] [?e :person/age ?a] ; entities with both name and age
[?e :person/name ?n] [?e :person/age 44] ; entities with any name but age equal to 44
Such a constructed relation constitutes a :where clause. Additionally, we would want to "project" a subset of columns out. This makes for a :find clause, in which we specify the list of variables whose values we seek. There are variations that return a single scalar value, a set of tuples or a list of scalar values:
:find ?n :where [?e :person/name ?n] [?e :person/age ?a] ; #{["Lucy"] ["Jerry"]}
:find ?n . :where [?e :person/name ?n] [?e :person/age ?a] ; "Lucy"
:find ?n ?a :where [?e :person/name ?n] [?e :person/age ?a] ; #{["Jerry" 44] ["Lucy" 32]}
:find [?n ...] :where [?e :person/name ?n] [?e :person/age ?a] ; ["Lucy" "Jerry"]
You can verify the results in the repl:
user=> (q '[:find [?n ...] :where [?e :person/name ?n] [?e :person/age ?a]] (db conn))
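To build intuition for what the :where and :find clauses do, here is a minimal sketch in plain Clojure that treats the database as a vector of [entity attribute value] facts. This is not how Datomic's query engine is implemented. The `facts` vector and the function names are assumptions for illustration only.

```clojure
;; The facts asserted so far, written as [entity attribute value] triples.
(def facts
  [[103 :person/name "Lucy"]
   [103 :person/age 32]
   [104 :person/name "Fred"]
   [106 :person/name "Jerry"]
   [106 :person/age 44]
   [106 :person/address "32 Hounds St"]])

(defn names-and-ages
  "Mimics :find ?n ?a :where [?e :person/name ?n] [?e :person/age ?a].
  Using the same ?e in both clauses becomes the (= e1 e2) join condition."
  [facts]
  (set (for [[e1 a1 n]   facts :when (= a1 :person/name)
             [e2 a2 age] facts :when (and (= a2 :person/age) (= e1 e2))]
         [n age])))

(defn names-aged
  "Mimics :find [?n ...] :where [?e :person/name ?n] [?e :person/age <age>]."
  [facts age]
  (vec (for [[e1 a1 n] facts :when (= a1 :person/name)
             [e2 a2 v] facts :when (and (= a2 :person/age) (= e1 e2) (= v age))]
         n)))

(names-and-ages facts) ; #{["Lucy" 32] ["Jerry" 44]}
(names-aged facts 44)  ; ["Jerry"]
```

Fred drops out of the first result because no age fact shares his entity id, which is the "restricting" step. Returning only the name and age values is the "projecting" step.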
We promised that we would look at that weird underscore thingy, :db.install/_attribute :db.part/db, which we used while defining an attribute. We always keep our promises, so here we go.
Let's assume we want to model the occupational details of a person. It would be clumsy to include all job information as attributes of the person entity. Instead, we can model a job entity and allow the person to "point" to it. This is where :db.type/ref makes its entry.
user=> ; job attributes
user=> @(d/transact conn [{:db/id #db/id[:db.part/db] :db/ident :job/title :db/valueType :db.type/string :db/cardinality :db.cardinality/one :db.install/_attribute :db.part/db}])
user=> @(d/transact conn [{:db/id #db/id[:db.part/db] :db/ident :job/salary :db/valueType :db.type/double :db/cardinality :db.cardinality/one :db.install/_attribute :db.part/db}])
user=>
user=> ; person references job
user=> @(d/transact conn [{:db/id #db/id[:db.part/db] :db/ident :person/job :db/valueType :db.type/ref :db/cardinality :db.cardinality/one :db.install/_attribute :db.part/db}])
user=>
user=> ; Lucy finds a job
user=> @(d/transact conn [{:db/id 500 :job/title "Rockstar programmer" :job/salary 500000.00}])
user=> @(d/transact conn [{:db/id 103 :person/job 500}])
user=> (q '[:find ?j . :where [?e :person/name "Lucy"] [?e :person/job ?jb] [?jb :job/title ?j]] (db conn)) ; "Rockstar programmer"
- There's a better way to tie an entity to a reference without having to spell out its id. Within a single transaction, every occurrence of :db/id #db/id[:db.part/user -501] resolves to the same new entity, so we could also associate Lucy with a job this way:
user=> @(d/transact conn [{:db/id #db/id[:db.part/user -501] :job/title "Rockstar programmer" :job/salary 500000.00} {:db/id 103 :person/job #db/id[:db.part/user -501]}])
- Here, we created a job entity and pointed a person to it. Instead, we could create a job and reverse-point a person to it, all in one step:
user=> @(d/transact conn [{:db/id #db/id[:db.part/user] :job/title "Startup founder" :job/salary 50000.00 :person/_job 103}])
This creates a job, finds the person entity and points its job ref back to this newly created entity. The underscore turns the attribute into a "reverse reference".
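One way to picture the reverse reference is to expand a transaction map into [entity attribute value] facts by hand. The sketch below is plain Clojure and only models the idea; it is not Datomic's implementation. The helper names and the job id 600 are made up for this example.

```clojure
(defn reverse-attr?
  "True for keywords such as :person/_job, whose name starts with an underscore."
  [attr]
  (= \_ (first (name attr))))

(defn forward-attr
  "Strips the leading underscore: :person/_job becomes :person/job."
  [attr]
  (keyword (namespace attr) (subs (name attr) 1)))

(defn map->facts
  "Expands an entity map into [e a v] triples, flipping reverse references
  so that the referenced entity becomes the subject of the fact."
  [m]
  (let [e (:db/id m)]
    (set (for [[a v] (dissoc m :db/id)]
           (if (reverse-attr? a)
             [v (forward-attr a) e]
             [e a v])))))

(map->facts {:db/id 600 :job/title "Startup founder" :person/_job 103})
;; #{[600 :job/title "Startup founder"] [103 :person/job 600]}
```

The second fact is the same one we would have produced by writing {:db/id 103 :person/job 600} from the person's side.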
Now we are well placed to decipher the mysterious expression in every attribute definition: :db.install/_attribute :db.part/db. We know that the underscore makes it a reverse reference, and therefore the reference :db.install/attribute runs from the entity :db.part/db to the newly defined attribute. That means that if we look at the :db.part/db entity's values for :db.install/attribute, we will find all the attributes in the system. Let's try:
user=> (q '[:find ?n :where [:db.part/db :db.install/attribute ?a] [?a :db/ident ?n]] (db conn))
#{[:db/code] [:db/doc] [:job/title] [:db/fn] [:db.install/function] [:db/excise] [:db/cardinality] [:db/txInstant] [:db.excise/attrs] [:db.alter/attribute] [:db/noHistory] [:db/isComponent] [:db/fulltext] [:fressian/tag] [:db/index] [:person/job] [:db/lang] [:db.excise/before] [:job/salary] [:db.excise/beforeT] [:person/address] [:db.install/valueType] [:db.install/partition] [:db/valueType] [:db/unique] [:job/dtitle] [:db/ident] [:person/age] [:db.install/attribute] [:person/name]}
That spits out all the internal attributes plus the special ones we defined. Hurray!
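The shape of that query can be mimicked in plain Clojure over a handful of hand-written facts. This is only a sketch of the idea, not of Datomic internals. The `schema-facts` vector is an assumption for illustration and uses 0 as the id of :db.part/db.

```clojure
;; A few facts as [entity attribute value] triples (illustrative data only).
(def schema-facts
  [[0   :db/ident :db.part/db]
   [101 :db/ident :person/name]
   [102 :db/ident :person/age]
   [0   :db.install/attribute 101]
   [0   :db.install/attribute 102]
   [103 :person/name "Lucy"]])

(defn installed-attributes
  "Mimics [:db.part/db :db.install/attribute ?a] [?a :db/ident ?n]:
  follow each :db.install/attribute value of entity 0 to that entity's ident."
  [facts]
  (set (for [[e1 a1 attr-id] facts
             :when (and (= e1 0) (= a1 :db.install/attribute))
             [e2 a2 ident] facts
             :when (and (= e2 attr-id) (= a2 :db/ident))]
         ident)))

(installed-attributes schema-facts) ; #{:person/name :person/age}
```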
Admit it! Didn't it tickle you to find out that the list of attributes is itself maintained as an attribute? You are in luck, there's more in store.
Let's dig deeper into :db.install/attribute:
user=> ; a pull within the find clause causes the entire entity to be projected
user=> (q '[:find (pull ?a [*]) :where [?a :db/ident :db.install/attribute]] (db conn))
[[{:db/id 13, :db/ident :db.install/attribute, :db/valueType {:db/id 20}, :db/cardinality {:db/id 36}, :db/doc "System attribute with type :db.type/ref. Asserting this attribute on :db.part/db with value v will install v as an attribute."}]]
The type and cardinality are just opaque integers. But don't worry, the entity API comes to the rescue:
user=> (-> (db conn) (d/entity 36) (:db/ident))
:db.cardinality/many
user=> (-> (db conn) (d/entity 20) (:db/ident))
:db.type/ref
That makes sense. :db.install/attribute is a ref because it "points" to other attribute definitions, and its cardinality is many because there are "many" attributes.
By now, something should be obvious. Everything is an entity in Datomic, even attribute definitions. The system is bootstrapped using some basic entities that set up the attribute types and so on, and everything else is built on top of that:
user=> (for [id (range 0 20)] (-> (db conn) (d/entity id) (:db/ident)))
(:db.part/db :db/add :db/retract :db.part/tx :db.part/user nil nil nil nil nil :db/ident :db.install/partition :db.install/valueType :db.install/attribute :db.install/function :db/excise :db.excise/attrs :db.excise/beforeT :db.excise/before :db.alter/attribute)