Skip to content

Performance Comparison

Shunsuke Kanda edited this page Sep 30, 2022 · 4 revisions

This wiki shows the experimental results in time- and memory-efficiency of crawdad and other Rust libraries.

Experimental setup

Implementations

We compare the following implementations

Machine

The following is the specification of the used machine:

  • CPU: Intel Core i9-12900K (L3: 30MB cache, 16 Core, 3.2GHz-5.2GHz)
  • RAM: 64GB (2×32GB, DDR5)
  • OS: Ubuntu 22.04

Methodology

We measure elapsed times for the two types of queries:

  • Exact match: Gets the value associated with a query key.
  • Enumeration: Finds all occurrences included in a query text.

We evaluate six datasets of keywords (as shown below) and perform the exact match for 1K randomly sampled keys. We use lines from I Am a Cat (by Soseki Natsume) available at Aozora Bunko as search texts to perform the enumeration query.

The benchmark code can be found here.

Datasets of words

Dataset File size (MiB) #keys #bytes/key #chars/key
unidic-cwj 8.4 674,927 12.0 4.03
mecab-ipadic 3.7 325,871 10.9 3.65
mecab-ipadic-neologd 119.5 5,493,506 21.8 7.71
jawiki-all-titles-in-ns0 47.1 2,159,582 21.9 8.31
zhwiki-all-titles-in-ns0 40.2 2,419,334 16.4 6.90
kowiki-all-titles-in-ns0 25.1 1,324,917 18.8 7.63

Experimental result

unidic-cwj

Implementation Exact (ns/query) Enum (us/line) Build (sec) Memory (MiB)
crawdad::Trie 11 2.2 1.42 9.4
crawdad::MpTrie 21 2.4 1.88 11.0
yada 28 3.9 0.53 9.4
fst::map 207 18.9 0.11 2.8
daachorse::bytewise n/a 3.8 0.35 28.2
daachorse::charwise n/a 2.2 1.63 25.3
std::BTreeMap 343 n/a 0.11 n/a
std::HashMap 25 n/a 0.07 n/a
hashbrown::HashMap 11 n/a 0.04 n/a

mecab-ipadic

Implementation Exact (ns/query) Enum (us/line) Build (sec) Memory (MiB)
crawdad::Trie 9 2.0 0.22 4.4
crawdad::MpTrie 18 2.3 0.59 5.2
yada 22 3.7 0.32 5.2
fst::map 190 18.6 0.06 1.9
daachorse::bytewise n/a 3.5 0.20 15.5
daachorse::charwise n/a 2.1 0.68 11.2
std::BTreeMap 273 n/a 0.05 n/a
std::HashMap 25 n/a 0.02 n/a
hashbrown::HashMap 10 n/a 0.01 n/a

mecab-ipadic-neologd

Implementation Exact (ns/query) Enum (us/line) Build (sec) Memory (MiB)
crawdad::Trie 28 2.6 1.93 121
crawdad::MpTrie 46 2.9 5.73 96
yada 97 5.3 34.74 153
fst::map 409 21.9 2.12 47
daachorse::bytewise n/a 5.4 10.72 460
daachorse::charwise n/a 2.9 6.39 286
std::BTreeMap 568 n/a 0.98 n/a
std::HashMap 39 n/a 1.31 n/a
hashbrown::HashMap 17 n/a 0.63 n/a

jawiki-all-titles-in-ns0

Implementation Exact (ns/query) Build (sec) Memory (MiB)
crawdad::Trie 28 0.75 71.8
crawdad::MpTrie 48 2.43 46.1
yada 84 17.27 91.6
fst::map 363 1.49 36.2
std::BTreeMap 478 0.38 n/a
std::HashMap 35 0.54 n/a
hashbrown::HashMap 15 0.26 n/a

zhwiki-all-titles-in-ns0

Implementation Exact (ns/query) Build (sec) Memory (MiB)
crawdad::Trie 23 1.14 71.3
crawdad::MpTrie 39 2.75 50.4
yada 54 10.00 82.0
fst::map 298 1.26 33.5
std::BTreeMap 460 0.40 n/a
std::HashMap 33 0.49 n/a
hashbrown::HashMap 14 0.24 n/a

kowiki-all-titles-in-ns0

Implementation Exact (ns/query) Build (sec) Memory (MiB)
crawdad::Trie 22 0.32 38.4
crawdad::MpTrie 41 0.38 26.5
yada 62 5.69 47.8
fst::map 319 0.70 17.2
std::BTreeMap 419 0.22 n/a
std::HashMap 32 0.21 n/a
hashbrown::HashMap 14 0.11 n/a