- A map implementation based on a DAWG (Directed Acyclic Word Graph)
- Maps are serialized to byte sequences in double-array trie format
- The library takes a static key set and assigns a unique identifier to each key
- Note that the input keys must be unique and lexically ordered
- The identifier assigned to a key is the zero-origin index of that key in the input sequence
- This library aims to provide a handy way to build maps with tens of millions of elements in Common Lisp
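To make the identifier rule concrete, here is a minimal sketch that maps each key to its zero-origin position in the input. It is an illustration only, not how the library stores identifiers: the function name model-assign-ids is hypothetical, and the sketch assumes the keys already satisfy the uniqueness and ordering requirements above.

```lisp
;; Hypothetical illustration (not part of the dawg library): the ID of each
;; key is its zero-origin position in the (sorted, unique) input sequence.
(defun model-assign-ids (keys)
  "Return an EQUAL hash table mapping each key in KEYS to its index."
  (let ((table (make-hash-table :test #'equal)))
    (loop for key in keys
          for id from 0
          do (setf (gethash key table) id))
    table))
```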
```lisp
;; Load the library; pushing the current directory lets ASDF find dawg.asd there
(require :asdf)
(push *default-pathname-defaults* asdf:*central-registry*)
(asdf:load-system :dawg)

;; Build an index file from a word list, then load it
(dawg:build :input "/usr/share/dict/words" :output "words.dawg")
(defparameter *dawg* (dawg:load "words.dawg"))

(dawg:member? "hello" *dawg*)
;; => T

(dawg:get-id "hello" *dawg*)
;; => 50195

(dawg:each-common-prefix (id end) ("hello" *dawg*)
  (print (list id (subseq "hello" 0 end))))
;; Prints:
;; (49012 "h")
;; (49845 "he")
;; (50183 "hell")
;; (50195 "hello")
```
dawg:build
Builds a DAWG index file from the input key set.
- input: The pathname of a key set file, or a list of keys
  - A "key set file" is a newline-delimited plain text file (each line is one key)
  - Restrictions:
    - The input keys must be unique and lexically ordered
    - A key cannot contain null characters
  - Type: (or string pathname list)
- output: The pathname of the resulting DAWG index file
  - Type: (or string pathname)
- byte-order: The endianness of the output file
  - Type: (member :native :little :big)
  - Default: :native
- show-progress: Indicates whether to show progress
  - Type: boolean
  - Default: nil
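The input restrictions above can be met by preprocessing the keys. The sketch below is not part of the library: prepare-key-set and write-key-set are hypothetical helpers, and the code assumes that "lexically ordered" means ascending order under string< (character-code order), which may differ from the exact comparison the library uses.

```lisp
;; Hypothetical helpers (not part of the dawg library) for preparing input
;; that satisfies dawg:build's restrictions.
;; Assumption: "lexically ordered" means ascending order under STRING<.

(defun prepare-key-set (keys)
  "Return a fresh list of KEYS sorted with STRING< and without duplicates.
Signals an error if any key contains a null character."
  (dolist (key keys)
    (when (find (code-char 0) key)
      (error "Key ~S contains a null character" key)))
  ;; Sort a copy so the caller's list is not destructively modified,
  ;; then drop adjacent duplicates (equal keys are adjacent after sorting).
  (let ((sorted (sort (copy-list keys) #'string<)))
    (loop for (a b) on sorted
          unless (and b (string= a b))
            collect a)))

(defun write-key-set (keys stream)
  "Write KEYS to STREAM in key set file format: one key per line."
  (dolist (key keys)
    (write-line key stream)))
```

Since dawg:build also accepts a list of keys, the result of prepare-key-set can be passed directly as :input; alternatively it can be written to a key set file first, e.g. with (with-open-file (out "keys.txt" :direction :output :if-exists :supersede) (write-key-set (prepare-key-set keys) out)).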
dawg:load
Loads the DAWG map from the specified index file.
- index-path: The pathname of an index file built with the dawg:build function
  - Type: (or string pathname file-stream)
- byte-order: The endianness of the input file
  - Type: (member :native :little :big)
  - Default: :native
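Endianness matters because a multi-byte integer reads back as a different value when its bytes are interpreted in the other order, so the byte-order given to dawg:load should presumably match the one the file was built with. The illustration below is self-contained and does not describe the actual index file layout; decode-u32 is a hypothetical name.

```lisp
;; Illustration only (not part of the dawg library): the same four bytes
;; decode to different integers depending on the assumed byte order.
(defun decode-u32 (bytes byte-order)
  "Decode the first four octets of BYTES as an unsigned 32-bit integer.
BYTE-ORDER is :little or :big."
  (let ((value 0))
    ;; Accumulate from the most significant octet to the least significant.
    (dotimes (i 4 value)
      (let ((octet (aref bytes (ecase byte-order
                                 (:little (- 3 i))
                                 (:big i)))))
        (setf value (logior (ash value 8) octet))))))
```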
dawg:member?
Returns t if dawg contains the given key, otherwise nil.
- key
  - Type: (simple-array character)
- dawg
  - Type: dawg:dawg
- start: The start position in key
  - Type: positive-fixnum
  - Default: 0
- end: The end position in key
  - Type: positive-fixnum
  - Default: (length key)
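The start and end parameters select part of key. Assuming they delimit the half-open range [start, end) as subseq does, the result of dawg:member? can be modeled naively as below. model-member-p is a hypothetical reference function for checking expectations in tests; it scans a list of keys, whereas the library searches the DAWG.

```lisp
;; Hypothetical reference model (not the dawg implementation): returns T if
;; the substring of KEY in [START, END) is one of KEYS, otherwise NIL.
(defun model-member-p (key keys &key (start 0) (end (length key)))
  (if (member (subseq key start end) keys :test #'string=) t nil))
```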
dawg:get-id
Returns the identifier assigned to the given key. If the key does not exist in dawg, this function returns nil.
- key
  - Type: (simple-array character)
- dawg
  - Type: dawg:dawg
- start: The start position in key
  - Type: positive-fixnum
  - Default: 0
- end: The end position in key
  - Type: positive-fixnum
  - Default: (length key)
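Since an identifier is the key's zero-origin index in the sorted input, a binary search over the input keys should return the same value as dawg:get-id, which makes a convenient cross-check in tests. This is a sketch under that assumption: model-get-id is hypothetical and expects the keys in a vector sorted with string<.

```lisp
;; Hypothetical reference model (not the dawg implementation): binary search
;; over the sorted input keys; the index found is the key's identifier.
(defun model-get-id (key sorted-keys &key (start 0) (end (length key)))
  "Return the index of KEY[START, END) in the STRING<-sorted vector SORTED-KEYS, or NIL."
  (let ((target (subseq key start end))
        (lo 0)
        (hi (length sorted-keys)))
    ;; Invariant: if TARGET is present, its index lies in [LO, HI).
    (loop while (< lo hi)
          do (let* ((mid (floor (+ lo hi) 2))
                    (probe (aref sorted-keys mid)))
               (cond ((string= target probe) (return-from model-get-id mid))
                     ((string< target probe) (setf hi mid))
                     (t (setf lo (1+ mid))))))
    nil))
```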
dawg:each-common-prefix
Executes a common-prefix search for the given key.
For each key in dawg that matches a prefix of key, match-id and match-end are bound and then body is executed.
Calling the return function inside body exits the loop early.
- match-id: The identifier of the key that matched a prefix of the input key
  - Type: positive-fixnum
- match-end: The end position of the matched part in the input key (that is, the length of the matched key when start is 0)
  - Type: positive-fixnum
- key
  - Type: (simple-array character)
- dawg
  - Type: dawg:dawg
- start: The start position in key
  - Type: positive-fixnum
  - Default: 0
- end: The end position in key
  - Type: positive-fixnum
  - Default: (length key)
- body: The forms to be executed in each iteration
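The search semantics can be modeled naively by trying every prefix of key[start, end) against the key set, shortest first, which matches the order in the example output above. The hypothetical model-each-common-prefix takes a callback instead of binding variables around a body, ignores the empty key, and is only a reference for expected results; the library walks the DAWG instead.

```lisp
;; Hypothetical reference model (not the dawg implementation): a naive
;; common-prefix search. For each position MATCH-END from START+1 up to END,
;; if KEY[START, MATCH-END) is in KEYS, FN is called with that key's ID and
;; MATCH-END, shortest match first.
(defun model-each-common-prefix (fn key keys &key (start 0) (end (length key)))
  (let ((ids (make-hash-table :test #'equal)))
    ;; IDs are zero-origin indices in the input sequence.
    (loop for k in keys
          for id from 0
          do (setf (gethash k ids) id))
    (loop for match-end from (1+ start) to end
          do (let ((id (gethash (subseq key start match-end) ids)))
               (when id
                 (funcall fn id match-end))))))
```

Note that with a non-zero start, match-end is still a position in key: searching "shell" from start 1 reports "hell" at position 5, not at length 4.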
A simplified version of dawg:each-common-prefix that does not bind the match-end variable in each iteration.
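An IDs-only common-prefix search can likewise be sketched with a linear scan. Because the input keys are sorted, every key that is a prefix of the target appears before any longer one, so collecting matches in input order yields the shortest match first. model-common-prefix-ids is hypothetical and is not the library's API.

```lisp
;; Hypothetical reference model (not the dawg implementation): collects the
;; IDs of all non-empty KEYS that are prefixes of KEY[START, END).
;; Assumes KEYS is sorted with STRING<, so matches come out shortest first.
(defun model-common-prefix-ids (key keys &key (start 0) (end (length key)))
  (loop for k in keys
        for id from 0
        when (and (plusp (length k))
                  (<= (length k) (- end start))
                  (string= k key :start2 start :end2 (+ start (length k))))
          collect id))
```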