This book is new. If you'd like to go through it, then join the Learn Code Forum to get help while you attempt it. I'm also looking for feedback on the difficulty of the projects. If you don't like public forums, then email email@example.com to talk privately.
Exercise 20: Binary Search Trees
In this exercise I'm going to teach you to translate an English description of a data structure into working code. You already know how to analyze the code for an algorithm or data structure using the master copy method. You also know how to read a p-code (psuedo-code) description of an algorithm. Now you will combine the two and learn how to break down a rather loose English description of the Binary Search Tree.
I'm going to start off right away and warn you to not visit the Wikipedia page when you do this exercise. The Binary Search Tree description on Wikipedia has mostly working Python code for the data structure, so it would defeat the point of this exercise. If you get stuck then you'll be able to read any resources you can, but first try to do it from just my descriptions here.
Binary Search Trees
In Exercise 16 you saw how a Merge Sort takes a flat, linked list and seems to convert it into a tree of sorted parts. It keeps cutting the list into smaller pieces and then assembles the pieces back together by sorting lesser valued parts on the "left" and greater valued parts on the right. In a way, a Binary Search Tree (BSTree) is a data structure that does this sorting right away and never tries to keep the items in a list. The main usage for a BSTree is to use a tree to organize pairs of key=value nodes in order ahead of time, as you insert or delete them.
A BSTree does this by starting the tree at a root key=value, and having a right or left path (link). If you insert a new key=value, then the BSTree's job is to start at the root and compare the key to each node: going left if your new key is less-than and going right if your key is greater-than. Eventually the BSTree finds a position in the tree that--should you follow the original path--will find it by following the same process. All operations after that do the same thing by comparing any keys to each node, moving left and right until it either finds the node or reaches a dead end.
In this way a BSTree is an alternative to a Dictionary from Exercise 17, and so it should have the same operations. The basic BSTreeNode would need left, right, key, and value attributes to create the tree structure. You may also need a parent attribute depending on how you do this. The BSTree then needs the following operations on a root BSTreeNode:
- Given a key, walk the tree to find the node, or return None if you reach a dead end. You go left if the given key is less-than the node's key. You go right if the key is greater-than the node's key. If you read a node with no left or right, then you're done and the node does not exist. There is a way to do this using recursion and using a while loop.
- This is nearly the same as get except once you reach that dead end node you simply attach a new BSTreeNode on the left or right, thus extending the tree down one more branch.
- Deleting nodes from a BSTree is a complex operation, so I have a whole section just on delete. The short version is you have three conditions: the node is a leaf (no children), has one child, or has two children. If it's a leaf then just remove it. If it has one child, then replace it with the child. If it has two children, then it gets really complicated so read the section on deleting below.
- Walk the tree and print everything out. The important piece to list is that you can walk the tree in different ways to produce different output. If you walk the left, then the right paths, you get something different than if you do the inverse. If you go all the way to the bottom and then print as you come up the tree toward root, you get yet another kind of output. You can also print the nodes as you go down the tree, from root to the "leaves". Try different styles to see which one does what.
Remember we have three conditions to deal with when deleting a node (which I'll call D):
- The D node is a "leaf" node because it has no children (not left or right). Just remove it from the parent.
- The D node has only one child (either left or right but not both). In that case you can simply move the value of this child to the D node, then delete the child. That effectively replaces the D node with the child (or, "moves the child up").
- The D node has both a left and right child, which means it's time to do some major surgery. First, find the minimum child of the D.right node called the successor. Set the D.key to the successor.key, and then do the same delete on this successor's children using its key.
You will most likely also need an operation for find_minimum and for replace_node_in_parent to perform those two operations. I mention that you might need a parent attribute depending on how you implement it. I'd say go with a parent node as that's easier in most cases.
Everyone hates deleting from a tree. It's a complicated operation and even my favorite reference, "The Algorithm Design Manual" by Steven S. Skiena skips over it because the the implementation "looks a little ghastly". Don't be discouraged if you have a hard time figuring delete out.
You are to implement your BSTree using this purposefully vague description. Try not to look at too many references while you make a first attempt, then when you get stuck go read how others have done it. The point of this exercise is to attempt to solve a complicated problem from an admittedly terrible description of it.
The trick to solving this is to first translate the paragraphs of English into rough p-code. Then translate the rough p-code to more exact p-code. Once you have a more exact p-code you can then translate that into Python. Pay special attention to specific words as a single word in English could mean a great many things in Python. Sometimes you just have to make a guess and run your tests to see if that's probably the right one.
Tests will also be very important, and it might be a good idea to apply the "test first" method to this problem. You know what each of these operations should do, so you can write a test for it, then make the test work.
- Can you develop a pathological test that does inserts in such a way as to make the BSTree be nothing more than a fancy linked list?
- What happens when you try to delete from this "pole" of a BSTree?
- How does the speed of the BSTree compare to the speed of your newly optimized Dictionary?
- How fast can you make the BSTree using your performance analysis and tuning process?