
Comparing changes

base repository: NodeTrie/NodeTrie_Py
base: e08883e2067b1732fb11d2395860f364e09497ab
head repository: NodeTrie/NodeTrie_Py
compare: 8c173c5e99b7029f7c8a59a47d6cc65e86ff24c4
  • 7 commits
  • 11 files changed
  • 1 contributor

Commits on Mar 16, 2017

  1. Updated readme

    pkittenis committed Mar 16, 2017
    968b0bd
  2. Updated versioneer names

    pkittenis committed Mar 16, 2017
    8499d4e

Commits on Mar 17, 2017

  1. Updated travis config

    pkittenis committed Mar 17, 2017
    5aacd56

Commits on Mar 21, 2017

  1. 7bd9837
  2. d756f6d
  3. Bumped submodule

    pkittenis committed Mar 21, 2017
    b8d8c53

Commits on Mar 24, 2017

  1. 8c173c5
Showing with 556 additions and 417 deletions.
  1. +1 −0 .gitignore
  2. +2 −2 .travis.yml
  3. +3 −1 MANIFEST.in
  4. +83 −18 README.rst
  5. +1 −1 nodetrie/__init__.py
  6. +1 −1 nodetrie/_version.py
  7. +448 −365 nodetrie/nodetrie.c
  8. +4 −2 nodetrie/nodetrie.pyx
  9. +1 −1 nodetrie_c
  10. +1 −1 setup.cfg
  11. +11 −25 setup.py
1 change: 1 addition & 0 deletions .gitignore
@@ -87,3 +87,4 @@ ENV/

# Rope project settings
.ropeproject
*~
4 changes: 2 additions & 2 deletions .travis.yml
@@ -17,7 +17,7 @@ deploy:
provider: pypi
on:
tags: true
distributions: "sdist"
distributions: sdist
user: pkittenis
password:
secure: JOX5bxlVltM05qNAT4HkbLsGaRj2C7g1i9p0ITAp23mvR6veyLCdBsoFD1FQ2AKR2sjGZWZCgLHW5yThJ9RdBe5s5x6E37937VtzXi03BhfbzExxcWei/x7O2DH9Fg1qhPvM9GHlSz6H/r4q9wnnL1LfRqWovbLrppEdKYvs5faB6pT1Ymrum5krTe1SRxfbjnShuqFXvr+SP78PKrzD8FciHHhxo4x4fmaO4/N6bMQeqWOrp3xg9a9x7xlqoCQZbSI2ClGeoKWHX0uYgHHC0FfC5PcSVvJhfSoZMW4MmsDvN3S7rYand3cmV0DE4njPWIYi9czwxVd1HuQQBzXpnwvs3faAIoXs9WepYLoPycWjiXHWxcfaJdCyJYaZouUHugKIDNuYS6Zdz03KXcatH0kOVWKiA3sGTz+20bZYNcfxWQtd1IC2lKKcVkXwNAZreII1nTg8rscxjCMSSV3JQFCzd/1dUKevL512ksQdyCSATQQZ7ZY9aIN+6SXgPsxGiyCPk+VL/pvkqQX2rHWU1wrgjmCwcKmGyon4aamyQb0Exe2KuyVFJ+ArHo7w2ZGV8GLPPG6z50rjavgljOqBDlUGipFO/CaweHIcqg2OxZHm3wAryr5LtJot8Sn+sb5LMB91mICM/AwisxghTLyCqvBSIPvuC/h9n0UW6NdKtjo=
secure: sKuPyiakp4nUMvYI1V671PrJ/A+E8KzB9Hf8iwmsfhelOMmVKjDhWze1V+2goMsJE2CcnlV5DmvQYdeCVqXEye5m/PauW3A+4WohGbxI0XdYJSQZPdYE/LHCyNdvDrdNt1kKefwkis+VFmpsVPeT89E4xOn0eCestYaz1r0ptdio1Nx8RARATwTKXbuB4OucTvUWtmRlrJtXV/j/eJWGf+mN/UVwRSeVd0qLqGQwpu+jGzKbpnPgctTYw0XuzGZy14LYpv5ecWeg9FyEFizcvtb9I2jAcVDdmRxOM/9VzQGXhRlHmJ9w9XTMmf1tf0lVcF6rgb1+hvbmb4xHlBVPYzIZfDzrYB7kzvHYLTI/pjcDYtU50GSlZjgjWZ+c23RF5Q2jFSsrJKx68hNlbUW0nTf1SPhLSDiyTE5k7X6Cb1l4G/xzT+qSTT8Erwf1PO1osOitwFY7YNT3md6Dzv9tWRoeC8UI84t5997jt6QqjFjWVXh/5l1hUiZJRbYUkvhB3d/ZGS2Fc3P7CKMrtGdau/F7NioLAefzYeB5K80S/SxgOcvN1NPDxBCJ7R0OadIdLNI4DxdAD+GQpzU7QXi0TQyDP1USB1h4gep9fB9sPi7mq61MS57avEQSYetcZuQaZmBoMDSymNwYerIWEf5Kb2Pe8zcD4dFYQpt6Nazy+tU=
4 changes: 3 additions & 1 deletion MANIFEST.in
@@ -1,2 +1,4 @@
include versioneer.py
include node_trie/_version.py
include nodetrie/_version.py
include *.c
include *.h
101 changes: 83 additions & 18 deletions README.rst
@@ -17,32 +17,95 @@ Installation

pip install nodetrie

Motivation, design goals
==========================

NodeTrie is a Python extension to a native C library written for this purpose.

It came about from a lack of viable alternatives for Python. While other trie library implementations exist, they suffer from severe limitations such as:

* Read only structures, no insertions
* High memory use for large trees
* Lack of searching, particularly file mask or wild card style searching
* Slow inserts

Existing implementations on PyPI fall into these broad categories, including `marisa-trie <https://github.com/pytries/marisa-trie>`_ (read only) and `datrie <https://github.com/pytries/datrie>`_ (slow inserts, very high memory use for large trees).

NodeTrie's C library is designed to minimize memory use as much as possible and still allow arbitrary length trees that can be searched.

Each node stores a name as its data, along with a list of its children and the number of children.

Features and design notes
==========================

* NodeTrie is an n-ary tree, meaning any one node can have any number of children
* Node children arrays are dynamically resized *as needed on insertion* on a per node basis. There is no fixed minimum or maximum size
* Node names can be of arbitrary length, available memory allowing
* Node names from ``Node.name`` are always unicode in both Python 2 and 3
* Any Python string type may be used on insertion
* Node names are implicitly encoded from unicode on insertion, if needed, using the ``nodetrie.ENCODING`` default encoding (``utf-8``), which can be overridden
* New Python ``Node`` objects are created from the underlying C pointers every time ``Node.children`` is accessed. Creating these objects adds overhead in the Python interpreter. It is safe, and performs better, to keep and re-use children references instead, see examples below
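
The design notes above can be sketched in pure Python. This is only an illustration of the described behaviour (an n-ary tree whose children lists grow on insertion, and path insertion that creates only the missing nodes). It is not NodeTrie's C implementation; ``SketchNode`` is a hypothetical name, and its methods merely mirror the names used in this README.

```python
class SketchNode(object):
    """Hypothetical pure-Python stand-in for a trie node (illustration only)."""

    def __init__(self, name=None):
        self.name = name
        # The children list starts empty and grows only when a child is inserted
        self.children = []

    @property
    def children_size(self):
        return len(self.children)

    def is_leaf(self):
        return not self.children

    def insert_split_path(self, path):
        """Walk `path` from this node, creating only the nodes that are missing."""
        node = self
        for name in path:
            for child in node.children:
                if child.name == name:
                    # Existing node found, descend into it
                    node = child
                    break
            else:
                # No child with this name, create one
                new_child = SketchNode(name)
                node.children.append(new_child)
                node = new_child
        return node


root = SketchNode()
root.insert_split_path(['a', 'b', 'c', 'd'])
root.insert_split_path(['a', 'b', 'c', 'dd'])
a_node = root.children[0]
c_node = a_node.children[0].children[0]
```

Because the second insertion shares the ``a``, ``b`` and ``c`` prefix with the first, the sketch ends up with a single ``a`` node whose ``c`` descendant has two leaf children.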

Limitations
=============

* Deletions are not implemented
* The C library implementation uses pointer arrays for children to reduce search space complexity and character pointers for names to allow for arbitrary name lengths. This may lead to memory fragmentation
* ``Node`` objects in Python are read only. It is not possible to override the name of an existing ``Node`` object nor modify its attributes
* Character encodings that allow for null characters such as UCS-2 *should not be used*
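
The last limitation can be illustrated with the standard library alone. The sketch below assumes that names are stored as null-terminated C strings, which the limitation implies but this README does not state outright. ``c_string_view`` is a hypothetical helper that mimics what a null-terminated reader would see.

```python
name = u'abc'
utf8_bytes = name.encode('utf-8')
# UTF-16 little endian, which encodes these characters the same way UCS-2 does
utf16_bytes = name.encode('utf-16-le')


def c_string_view(raw):
    """Return the bytes a null-terminated C string reader would see (hypothetical helper)."""
    return raw.split(b'\x00', 1)[0]


seen_utf8 = c_string_view(utf8_bytes)
seen_utf16 = c_string_view(utf16_bytes)
```

With UTF-8 the whole name survives, while the two-byte encoding is cut off at the first null byte, after the first character.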

Example Usage
==============

.. code-block:: python
from nodetrie import Node
# This is the head of the trie, keep a reference to it
# This is the root of the tree, keep a reference to it.
# Deleting or letting the root node go out of scope will de-allocate
# the entire tree
node = Node()
# Insert a linked tree so that a->b->c->d where -> means 'has child node'
node.insert_split_path(['a', 'b', 'c', 'd'])
node.children[0].name == 'a'
# Sub-trees can be referred to by child nodes
a_node = node.children[0]
a_node.name == 'a'
a_node.children[0].name == 'b'
a_node.is_leaf() == False
# Insertions create only new nodes
# Insert linked tree so that a->b->c->dd
node.insert_split_path(['a', 'b', 'c', 'dd'])
# Only one 'a' node
len(node.children) == 1
node.children_size == 1
# Existing references to nodes will have correct children
# after insertion without recreating the node object.
# Here, a_node is an existing object prior to more nodes
# being added to its sub-tree. After insertion, a's sub-tree contains newly
# inserted nodes as expected
# 'c' node is first child of 'b' which is first child of 'a'
# 'c' node has two children, 'd' and 'dd'
c_node = node.children[0].children[0].children[0]
len(c_node.children) == 2
c_node = a_node.children[0].children[0]
c_node.children_size == 2
c_node.is_leaf() == False
# 'd' and 'dd' are both leaf nodes
leaf_nodes = [c for c in c_node.children if c.is_leaf()]
len(leaf_nodes) == 2
.. note:: De-allocation

The tree is de-allocated when, and only when, the root node goes out of scope or is deleted. Letting sub-tree objects go out of scope or explicitly deleting them will *not de-allocate that sub-tree*.

.. note:: Sub-tree insertions

Insertions on non-root nodes work as expected. However, ``Node.insert`` does *not* check if a node is already present, unlike ``Node.insert_split_path``

Searching
----------
@@ -59,18 +122,18 @@ NodeTrie supports exact name as well as file mask matching tree search.
['a', 'b', 'c2', 'd1'], ['a', 'b', 'c2', 'd2']]:
node.insert_split_path(paths)
for path, _node in node.search(node, ['a', 'b', '*', '*'], []):
print(path, _node.name)
print(path, _node)
Output

.. code-block:: python
[u'a', u'b', u'c1', u'd1'] d1
[u'a', u'b', u'c1', u'd2'] d2
[u'a', u'b', u'c2', u'd1'] d1
[u'a', u'b', u'c2', u'd2'] d2
[u'a', u'b', u'c1', u'd1'] Node: 'd1'
[u'a', u'b', u'c1', u'd2'] Node: 'd2'
[u'a', u'b', u'c2', u'd1'] Node: 'd1'
[u'a', u'b', u'c2', u'd2'] Node: 'd2'
A separator joined path list is return by the query function.
Separator joined node names for a matched sub-tree are returned by the query function.

.. code:: python
@@ -84,12 +147,14 @@ Output

.. code:: python
(u'a.b.c1.d1', <nodetrie.nodetrie.Node at 0x7f1899fa7730>),
(u'a.b.c1.d2', <nodetrie.nodetrie.Node at 0x7f1899fa7130>),
(u'a.b.c2.d1', <nodetrie.nodetrie.Node at 0x7f1899fa7110>),
(u'a.b.c2.d2', <nodetrie.nodetrie.Node at 0x7f1899fa73f0>)
(u'a.b.c1.d1', Node: 'd1')
(u'a.b.c1.d2', Node: 'd2')
(u'a.b.c2.d1', Node: 'd1')
(u'a.b.c2.d2', Node: 'd2')
(u'a|b|c1|d1', Node: 'd1')
(u'a|b|c1|d2', Node: 'd2')
(u'a|b|c2|d1', Node: 'd1')
(u'a|b|c2|d2', Node: 'd2')
(u'a|b|c1|d1', <nodetrie.nodetrie.Node object at 0x7f436d09c750>)
(u'a|b|c1|d2', <nodetrie.nodetrie.Node object at 0x7f436d09c770>)
(u'a|b|c2|d1', <nodetrie.nodetrie.Node object at 0x7f436d09c790>)
(u'a|b|c2|d2', <nodetrie.nodetrie.Node object at 0x7f436d09c7b0>)
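
The kind of file mask matching shown above can be approximated in pure Python with ``fnmatch``. This is a sketch under stated assumptions, not NodeTrie's search algorithm: ``SketchNode``, ``search`` and ``query`` here are hypothetical stand-ins, and their signatures are not those of the library.

```python
from fnmatch import fnmatchcase


class SketchNode(object):
    """Hypothetical minimal trie node (illustration only)."""

    def __init__(self, name=None):
        self.name = name
        self.children = []

    def insert_split_path(self, path):
        node = self
        for name in path:
            match = next((c for c in node.children if c.name == name), None)
            if match is None:
                match = SketchNode(name)
                node.children.append(match)
            node = match


def search(node, pattern, prefix=()):
    """Yield (path, node) for nodes whose path below `node` matches every mask in `pattern`."""
    if not pattern:
        return
    head, rest = pattern[0], pattern[1:]
    for child in node.children:
        if fnmatchcase(child.name, head):
            path = list(prefix) + [child.name]
            if rest:
                for found in search(child, rest, path):
                    yield found
            else:
                yield path, child


def query(node, pattern, separator=u'.'):
    """Like `search`, but takes and returns separator joined paths."""
    for path, found in search(node, pattern.split(separator)):
        yield separator.join(path), found


root = SketchNode()
for parts in [['a', 'b', 'c1', 'd1'], ['a', 'b', 'c1', 'd2'],
              ['a', 'b', 'c2', 'd1'], ['a', 'b', 'c2', 'd2']]:
    root.insert_split_path(parts)

search_paths = [path for path, _ in search(root, ['a', 'b', '*', '*'])]
query_paths = [path for path, _ in query(root, u'a|b|c*|d1', separator=u'|')]
```

Each path element is matched against one tree level, so a mask such as ``*`` only ever matches a single node name rather than spanning several levels.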
Contributions are most welcome.
2 changes: 1 addition & 1 deletion nodetrie/__init__.py
@@ -1 +1 @@
from .nodetrie import Node
from .nodetrie import Node, ENCODING
2 changes: 1 addition & 1 deletion nodetrie/_version.py
@@ -43,7 +43,7 @@ def get_config():
cfg.style = "pep440"
cfg.tag_prefix = ""
cfg.parentdir_prefix = "None"
cfg.versionfile_source = "node_trie/_version.py"
cfg.versionfile_source = "nodetrie/_version.py"
cfg.verbose = False
return cfg
