Skip to content
grammophone edited this page Mar 24, 2016 · 8 revisions

This .NET library offers:

  • Generalized edit distance functionality, for any type of object in place of characters, using arbitrary distance metrics.
  • Various forms of a generalized radix tree. The trees are generalized in the sense that they use any object type as a character, and they can contain more than one items along with optional user-defined extra data per item stored. Items will be mentioned as 'words' hereon.
  • Combination of the above to offer approximate search.
  • Implementation of the "all-substrings" kernel, see below.

The concrete implementations of the trees spawn from a common abstract ancestor, the RadixTree. The following UML diagram summarizes the class hierarchy of the trees.

Trees class diagram

The generic argument C is the type of the character, D is the type of the user data stored per indexed word, and N is the type of data stored per node of the tree. These types can be full-blown objects on the heap or just value types, like primitives such as char.

The WordTree subclass indexes whole words as they are added to it. In contrast, the SuffixTree contains all the suffixes of the words added to it. The suffix tree is built in O(n) time where n is the sum of the lengths of all words added to it using Ukkonen's method (1995). Any type C offering HashCode and Equals methods of O(1) complexity preserves the tree's O(n) build time.

Clone this wiki locally