Warning
This book is new. If you'd like to go through it, then join the Learn Code Forum to get help while you attempt it. I'm also looking for feedback on the difficulty of the projects. If you don't like public forums, then email help@learncodethehardway.org to talk privately.
Exercise 21: Binary Search
The Binary Search algorithm is a simple way to find an item in an already sorted list of items. It's easily described as taking a sorted list and continually partitioning it into halves until you either find it or you run out. If you did the complete Exercise 20, then this exercise should be relatively easy.
If we want to find a number X in an already sorted list of numbers we would do this:
- Get the number in the middle of the list (M) and compare it to X.
- If X == M, you're done.
- If X > M, then find the middle of M+1 to the end of the list.
- If X < M, then find the middle of M-1 to the beginning of the list.
- Repeat, until either you find X or you have only 1 element left.
This works for anything that you can compare with equality. It will work on strings, numbers, and anything else you can order consistently.
Exercise Challenge
Your BSTree should already be doing a get operation that's similar to the binary search. The difference is the BSTree is already partitioned, so there's no need to do any more. In this exercise you'll implement the binary search for the DoubleLinkedList, Python's list, and compare those to the BSTree.get performance. Your goal is to learn the following:
- How well does the BSTree compare to Python's list for simply finding some items?
- How poorly does binary search work with DoubleLinkedList?
- Do your pathological cases for BSTree also cause problems for a binary search on list?
When analyzing the performance, don't include the time it takes to sort the numbers. That's important when doing a global optimization, but in this case you only care about how fast the binary search works. You can also use Python's built-in list sorting algorithms to sort your list since that's not the point. This exercise is all about how fast search is between the three data structures.
Study Drills
- Find out what the maximum number of possible comparisons is that this algorithm needs to make. Try to figure it out on your own first, and then research the algorithm to find out what the real answer is. After that just memorize the real answer.
- Can any of your optimizations here be applied to the sorting algorithms?
- Try to visualize what this algorithm is doing in each data structure. For example, in the DoubleLinkedList you can almost think of it as walking back and forth until it finds the answer.
- To give yourself an extra challenge, try to make the DoubleLinkedList into a sorted linked list where every insert is always at the sorted position. Now write your performance analysis to include adding elements and sorting the lists of numbers to see how that improves total performance.
Further Study
Research other search algorithms, especially for strings. Many of these will be difficult to implement in Python because of the way Python's strings are implemented, but give it a try anyway.