Trie

Trie or prefix tree is a tree-like data structure that implements dictionary abstract data type. In other words, trie allows to store key/value pairs in hierarchical structure. For example, let say we have words {a, as, air, aim, ail, car, i, milk, be, bag}. These words can be represented as follows.

trie_intro4

As you can see the root of the trie is an empty string that has five direct children. Every child represents first letter of a given word so that the words with the same prefix share the common nodes. As a result, a key can consist of one or more nodes, but values stored in one node.

Applications

The most common application of the trie is autocomplete and spell-checking subroutines in web browsers, mobile phones and text editors.

The tries have several advantages as compared to hash tables. For instance, weak point of hash tables is hash function that can have collisions during generation of hash codes. As a result, the worst case in lookup data is O(N) where N is number of entries. Lookup in the trie has O(L) time complexity, where L is a length of the searching word. Another advantage is that a trie can provide an alphabetical ordering of the entries by key.

Implementation and operations on tries

In my implementation I used array lists to represent the trie. Every node contains array list of child nodes, character symbol and value that can be null in case if the node is not a key. There are three main operations that can be applied on the trie: insert a new entry into the trie, find entry and delete entry.

Here goes my implementation of the trie data structure on github: https://github.com/alexvolov/algorithms-and-data-structures/blob/master/src/main/java/com/alexvolov/ads/ds/impl/ArrayListTrie.java

Improvements

In general all improvements boils down to compression of the trie. A compressed trie a is trie in which all intermediate nodes that goes to the one single node are deleted. For example, on the diagram below for key “CAR” we have three nodes that can be compressed into one. The same for the key “MILK”.

trie_zip

Time complexity and memory usage

In terms of memory consumption and time complexity the trie doesn’t yield to hash tables and balanced trees. The main operations  such as find, delete, insert has O(|Key|) time complexity. For example, lookup from hash table is constant but generation of hash code has O(|Key|) time complexity.

By memory consumption compressed prefix trees surpass hash tables and balanced tries. Because, the keys with the same prefix uses the same memory.

Leave a Reply