-
Notifications
You must be signed in to change notification settings - Fork 2
Performance Comparison
Shunsuke Kanda edited this page Sep 30, 2022
·
4 revisions
This wiki shows the experimental results in time- and memory-efficiency of crawdad and other Rust libraries.
We compare the following implementations
- crawdad (v0.3.0, this library)
- yada (v0.5.0)
- fst::map (v0.4.7)
- daachorse (v1.0.0-rc.1)
- std::collections
- hashbrown::HashMap (v0.12.1)
The following is the specification of the used machine:
- CPU: Intel Core i9-12900K (L3: 30MB cache, 16 Core, 3.2GHz-5.2GHz)
- RAM: 64GB (2×32GB, DDR5)
- OS: Ubuntu 22.04
We measure elapsed times for the two types of queries:
- Exact match: Gets the value associated with a query key.
- Enumeration: Finds all occurrences included in a query text.
We evaluate six datasets of keywords (as shown below) and perform the exact match for 1K randomly sampled keys. We use lines from I Am a Cat (by Soseki Natsume) available at Aozora Bunko as search texts to perform the enumeration query.
The benchmark code can be found here.
- unidic-cwj (v3.1.0)
- mecab-ipadic (v2.7.0)
- mecab-ipadic-neologd (v0.0.7)
- jawiki-all-titles-in-ns0 (20220720)
- zhwiki-all-titles-in-ns0 (20220720)
- kowiki-all-titles-in-ns0 (20220720)
Dataset | File size (MiB) | #keys | #bytes/key | #chars/key |
---|---|---|---|---|
unidic-cwj | 8.4 | 674,927 | 12.0 | 4.03 |
mecab-ipadic | 3.7 | 325,871 | 10.9 | 3.65 |
mecab-ipadic-neologd | 119.5 | 5,493,506 | 21.8 | 7.71 |
jawiki-all-titles-in-ns0 | 47.1 | 2,159,582 | 21.9 | 8.31 |
zhwiki-all-titles-in-ns0 | 40.2 | 2,419,334 | 16.4 | 6.90 |
kowiki-all-titles-in-ns0 | 25.1 | 1,324,917 | 18.8 | 7.63 |
Implementation | Exact (ns/query) | Enum (us/line) | Build (sec) | Memory (MiB) |
---|---|---|---|---|
crawdad::Trie | 11 | 2.2 | 1.42 | 9.4 |
crawdad::MpTrie | 21 | 2.4 | 1.88 | 11.0 |
yada | 28 | 3.9 | 0.53 | 9.4 |
fst::map | 207 | 18.9 | 0.11 | 2.8 |
daachorse::bytewise | n/a | 3.8 | 0.35 | 28.2 |
daachorse::charwise | n/a | 2.2 | 1.63 | 25.3 |
std::BTreeMap | 343 | n/a | 0.11 | n/a |
std::HashMap | 25 | n/a | 0.07 | n/a |
hashbrown::HashMap | 11 | n/a | 0.04 | n/a |
Implementation | Exact (ns/query) | Enum (us/line) | Build (sec) | Memory (MiB) |
---|---|---|---|---|
crawdad::Trie | 9 | 2.0 | 0.22 | 4.4 |
crawdad::MpTrie | 18 | 2.3 | 0.59 | 5.2 |
yada | 22 | 3.7 | 0.32 | 5.2 |
fst::map | 190 | 18.6 | 0.06 | 1.9 |
daachorse::bytewise | n/a | 3.5 | 0.20 | 15.5 |
daachorse::charwise | n/a | 2.1 | 0.68 | 11.2 |
std::BTreeMap | 273 | n/a | 0.05 | n/a |
std::HashMap | 25 | n/a | 0.02 | n/a |
hashbrown::HashMap | 10 | n/a | 0.01 | n/a |
Implementation | Exact (ns/query) | Enum (us/line) | Build (sec) | Memory (MiB) |
---|---|---|---|---|
crawdad::Trie | 28 | 2.6 | 1.93 | 121 |
crawdad::MpTrie | 46 | 2.9 | 5.73 | 96 |
yada | 97 | 5.3 | 34.74 | 153 |
fst::map | 409 | 21.9 | 2.12 | 47 |
daachorse::bytewise | n/a | 5.4 | 10.72 | 460 |
daachorse::charwise | n/a | 2.9 | 6.39 | 286 |
std::BTreeMap | 568 | n/a | 0.98 | n/a |
std::HashMap | 39 | n/a | 1.31 | n/a |
hashbrown::HashMap | 17 | n/a | 0.63 | n/a |
Implementation | Exact (ns/query) | Build (sec) | Memory (MiB) |
---|---|---|---|
crawdad::Trie | 28 | 0.75 | 71.8 |
crawdad::MpTrie | 48 | 2.43 | 46.1 |
yada | 84 | 17.27 | 91.6 |
fst::map | 363 | 1.49 | 36.2 |
std::BTreeMap | 478 | 0.38 | n/a |
std::HashMap | 35 | 0.54 | n/a |
hashbrown::HashMap | 15 | 0.26 | n/a |
Implementation | Exact (ns/query) | Build (sec) | Memory (MiB) |
---|---|---|---|
crawdad::Trie | 23 | 1.14 | 71.3 |
crawdad::MpTrie | 39 | 2.75 | 50.4 |
yada | 54 | 10.00 | 82.0 |
fst::map | 298 | 1.26 | 33.5 |
std::BTreeMap | 460 | 0.40 | n/a |
std::HashMap | 33 | 0.49 | n/a |
hashbrown::HashMap | 14 | 0.24 | n/a |
Implementation | Exact (ns/query) | Build (sec) | Memory (MiB) |
---|---|---|---|
crawdad::Trie | 22 | 0.32 | 38.4 |
crawdad::MpTrie | 41 | 0.38 | 26.5 |
yada | 62 | 5.69 | 47.8 |
fst::map | 319 | 0.70 | 17.2 |
std::BTreeMap | 419 | 0.22 | n/a |
std::HashMap | 32 | 0.21 | n/a |
hashbrown::HashMap | 14 | 0.11 | n/a |