This book is new. If you'd like to go through it, then join the Learn Code Forum to get help while you attempt it. I'm also looking for feedback on the difficulty of the projects. If you don't like public forums, then email email@example.com to talk privately.
Exercise 23: Ternary Search Trees
The final data structure we'll investigate is called a Ternary Search Tree (TSTree), and it's useful for quickly finding a string in a set of strings. It's similar to a BSTree but instead of two children it has three children, and each child is only a single character rather than an entire string. In the BSTree you had a left and right child for the "less-than" and "greater-than" branches of the tree. With a TSTree you have left, middle, and right branches for "less-than", "equal-to", and "greater-than" branches. This lets you take a string, break it into characters, and then walk the TSTree one character at a time until you find it or you run out.
The TSTree is effectively trading space for speed by breaking the set of possible keys you need to search into one character nodes. Each of these nodes will take up more space than the same keys in a BSTree, but this enables you to find keys by only comparing the characters in the key you want. With a BSTree you have to compare most of the characters in both the search key and node key once each node. With a TSTree you only compare each letter of the search key, and when you get to the end that's it.
The other thing that a TSTree is very good for is knowing when a key does not exist in the set. Imagine you have a key that's 10 characters long and you need to find it in a set of other keys, but you need to stop fast if the key does not exist. With a TSTree you could stop at one or two characters, reach the end of the tree, and know this key does not exist. You'll at most compare only 10 characters in the key to find this out, which is much fewer character comparisons than a BSTree would make. A BSTree could compare all the characters in a 10 character key once for every single node in the most pathological case (where the BSTree is basically a linked list) before deciding the key does not exist.
In this exercise you're going to partially do another master copy and then complete the TSTree on your own. The code you'll need is:
You'll need to study this using the master copy method you learned. Pay special attention to how the node.eq path is handled and how the node.value is set. Once you have an understanding of how get and set work, you'll then implement the remaining functions and all the tests. The functions to implement are:
- Given a key K, find the shortest key/value pair that starts with K. This means if you have "apple" and "application" in your set, then a call to find_shortest("appl") will return "apple" and the value associated with it.
- Given a key K, find the longest key/value pair that starts with K. Using the example of "apple" and "application", a call to find_longest("appl") would return "application" and the value associated with it.
- Given a key K, find all of the key/value pairs that start with K. I would implement this first, and then base find_shortest and find_longest on that.
- Given a key K, find the shortest key that has at least a part of the beginning of K. Investigate how where you set the node.value makes this work.
- Look at the comments in the original code regarding where you place the value during _set. Does changing this change the meaning of get and why?
- Make sure you thrash this with randomness and create some performance measurements.
- You can also do partial matching matching in a TSTree. I consider this extra credit, so try to implement them and see what you come up with. Partial matching is where you match 'a.p.e' on "apple", "anpxe", and "ajpqe".
- How can you search for endings of strings? Hint: Don't over think this.