Wednesday, July 21, 2010

C# Trie

For anyone interested, I have also written a Java Trie here.

In a recent project I have ventured into, I had to develop a set of classes in C# for a simple trie which are now released as open source (http://code.google.com/p/typocalypse/source/browse/#hg/Trie). This blog post shall describe how it works and how to use it.

Introduction to tries
Simple tries are a data structure for mapping strings to values such that it enables you to efficiently retrieve all the values whose key strings have a particular prefix. This works by storing all the strings in a tree where each node is a prefix to a number of strings and each edge is a character to append to the prefix in the parent node. The root node is the empty string which is a prefix to all strings.

As you traverse a path from the root downwards, you follow the edges which lead to prefixes of the string you are trying to match. Eventually if such a string exists, a node will be pointing to the value mapped by the string. By traversing the descendants of a particular node, you can retrieve all the stored strings which have that prefix in common and their corresponding value.

Here's a diagram example from wikipedia:
http://upload.wikimedia.org/wikipedia/commons/b/be/Trie_example.svg

Implemented trie
The implemented trie supports adding, removing and character-by-character prefix searching of string-value pairs.

The way it works is by keeping a reference in each node to a value object such that if the reference is null then the node does not point to a value object and hence the prefix is not a complete string. The type of the values is generic, any referencable type is allowed. Each node points to its children via a Dictionary object which maps characters to nodes, thereby allowing a fast search by following the characters.

The following is a description of each class:

TrieNode
This is a non-public class which represents the nodes of the trie, that is, the prefixes. However in this implementation, the prefix itself is not stored, only the last character of the prefix called the Key is stored in order to know which character was followed to access this node. It also stores the value (generic type), a pointer to the parent node and a dictionary which maps characters to children nodes. If this node terminates a string then it's value field points to an object, otherwise it is null.

PrefixMatcher
This is a non-public class which is dedicated to allow the user to search the trie for a particular string or strings with a particular prefix. It was designed for enabling the user to search for the values of said strings on a character by character basis rather than providing the whole string immediately so that it can be used to search for user input as the user types in the query. This requires the maintenance of a state so that it is known which prefix has been entered thus far as well as which node that prefix is found in so as to enable an efficient search. For this reason the search feature was kept in a class of its own. Each time the trie is modified with the removal of a string, the current search is cancelled and must be redone from first character so as to avoid the user from searching non-existent strings.

IPrefixMatcher
This is a public interface which abstracts the methods of the PrefixMatcher class so that the user is provided with an abstraction rather than a concrete class in order to search.

Trie
This is the main class which provides the features of the trie data structure. Searching is done through the public property "Matcher" which returns an instance of IPrefixMatcher. When a string is removed, the current search is cleared so as to avoid the user from searching non-existent strings.

Examples
Here is an example of how to use the implemented trie:
//Create trie
Trie < string > trie = new Trie < string > ();

//Add some key-value pairs to the trie
trie.Put("James", "112");
trie.Put("Jake", "222");
trie.Put("Fred", "326");

//Search the trie
trie.Matcher.NextMatch('J'); //Prefix thus far: "J"
trie.Matcher.GetPrefixMatches(); //[112, 222]
trie.Matcher.IsExactMatch(); //false
trie.Matcher.NextMatch('a');
trie.Matcher.NextMatch('m'); //Prefix thus far: "Jam"
trie.Matcher.GetPrefixMatches(); //[112]
trie.Matcher.NextMatch('e');
trie.Matcher.NextMatch('s'); //Prefix thus far: "James"
trie.Matcher.IsExactMatch(); //true
trie.Matcher.GetExactMatch(); //112

//Remove a string-value pair
trie.Remove("James");

Link to classes
http://code.google.com/p/typocalypse/source/browse/#hg/Trie

5 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. This example code does not seem to work, since the library classes require that the template parameter for Trie be a reference type...

    ReplyDelete
  3. Thanks for the heads up Eric! Fixed.

    ReplyDelete
  4. Hi, link to the class is broken.

    ReplyDelete
    Replies
    1. Are you sure naren? It works fine from my end. You're supposed to be taken to the directory containing the class files. It's more than one class.

      Delete