This book is new. If you'd like to go through it, then join the Learn Code Forum to get help while you attempt it. I'm also looking for feedback on the difficulty of the projects. If you don't like public forums, then email email@example.com to talk privately.
Exercise 13: Single Linked Lists
The first data structure you will implement is the Single Linked List. I will describe the data structure, list out all the operations you should implement, and give you a single test for your implementation that needs to pass. You should attempt this data structure on your own at first but then watch the video of my implementation and the audit after so you can understand the process.
These are not efficiently implemented data structures at all! They are purposefully naive and slow so that we can cover measuring and optimizing data code in Exercises 18 and 19. If you are attempting to use these data structures in professional work, then you will have performance problems.
When dealing with many data structures in an object-oriented language such as Python, you have three common concepts to understand:
- A "node", which is usually a container or memory location for the data structure. Your values go in here.
- An "edge", but we'll say "pointer" or "link", that points at other nodes. These are placed inside each node, usually as instance variables.
- A "controller", which is some class that knows how to use the pointers in a node to structure the data correctly.
In Python we will map these concepts like this:
- Nodes are just objects defined by a class.
- Pointers (edges) are just instance variables in the node objects.
- The Controller is simply another class that uses the nodes to store everything and structure the data. This is where all of your operations go (push, pop, list, etc.), and usually the user of the controller never really deals with the Nodes or pointers.
In some books on algorithms you'll see implementations that combine the node and controller into one single class or structure, but this is very confusing and also violates separation of concerns in your design. It's better to separate the nodes from the controlling class so that one thing does one thing well and you know where the bugs are.
Imagine we want to store a list of cars in order. We have a first car, which leads to a second, and so on until the end. Imagining this list we can begin to conceive of a node/pointers/controller design for it:
- Nodes contain each car's description. Maybe it's just a node.value variable to a Car class. We can call this SingleLinkedListNode or SLLNode if you're lazy.
- Each SLLNode then has a next link to the next node in the chain. Doing node.next gets you the next car in the line.
- A Controller simply called SingleLinkedList then has operations like push, pop, first, or count, which take a Car and uses nodes to store them internally. When you push a car onto the SingleLinkedList controller, it works an internal list of linked nodes to store it at the end.
Why would we do this when Python has a perfectly good and very fast list type? Mostly to learn about data structures is all. In the real world you would just use Python's list and move on.
To implement this SingleLinkedListNode we'd need a simple class like this:
1 2 3 4 5 6 7 8 9
We have to use the word nxt since next is a reserved word in Python. Other than that this is a very simple class. The most complicated thing is the __repr__ function. That just prints debugging output when you use the "%r" format, or when you call repr() on the node. It should return a string.
Take the time now to figure out how to manually build a list using just the SingleLinkedListNode class and then manually walk through it. This is a good 45 minute hack spike to try at this point in the exercise.
Once we have our nodes defined in the SingleLinkedListNode class we can figure out exactly what the controller should do. Every data structure has a list of common operations you'd need for them to be useful. Different operations take different amounts of memory (space) and time, with some being expensive and others being fast. The structure of the SingleLinkedListNode makes some operations very quick but many others very slow. You'll be figuring this out as you work on the implementation.
The easiest way to see the operations is to look at a skeleton version of the SingleLinkedList class:
In other exercises I'll only tell you the operations and leave it for you to figure out, but for this one I'm giving you guidance on implementation. Look through the list of functions in SingleLinkedList to see each operation and the comment for how it should work.
I am now going to give you the test that you have to make work when implementing this class. You'll see that I've gone through every operation and tried to cover most of the edge cases, but when I do the audit you'll find out that actually I may have missed a few. It's common that people don't test for cases such as "zero elements" and "one element".
Study this test carefully so that you have a good idea of how each operation should work before trying to implement it. I wouldn't write all of this code into a file at once. Instead it's better to do just one test at a time and make it work in small chunks.
At this point if you are unfamiliar with automated testing then you will want to watch the video to watch me do it.
As you make each test run you will conduct an audit of your code to find defects. Eventually you will track the number of defects you find with your audits, but for now you're going to just practice auditing code after you write it. An "audit" is similar to what a tax agent does when the government thinks you are cheating on your taxes. They go through each transaction, each amount of money coming in, all money going out, and why you spent it the way you did. A code audit is similar as you go through each function, and analyze all parameters coming in, and all values going out.
To conduct a basic audit you will do this:
- Start at the top of your test case. Let's do test_push in this example.
- Look at the first code line and determine what is being called and what is being created. In this case it's colors = SingleLinkeList(). That means we are creating a colors variable, and calling the SingleLinkeList.__init__ function.
- Jump to the top of this __init__ function, keeping your test case and the target function (__init__) side-by-side. Confirm that you have done so. You then confirm that you called it with the correct number and type of function arguments. In this case __init__ only takes self, and it should be the right type.
- You then go into __init__ and go line-by-line, confirming each function call and variable in the same way. Does it have the right number of arguments? The right types?
- At each branch (if-statement, for-loop, while-loop) you confirm that the logic is correct and that it handles any possible conditions in the logic. Do if-statements have else clauses with errors? Do while loops end? Then dive into each branch and track the functions the same way, diving in, checking variables, and coming back, checking returns.
- When you get to the end of a function or any return, you jump back to the test_push caller to check that what is returned matches what you expect when you called it. Remember, though, that you do this for each call in __ini__ as well.
- Finally you are done when you reach the end of the test_push function and you've done this recursive checking into and out of each function it calls.
This process seems tedious at first, and yes it is, but you'll get quicker at it the more often you do it. In the video you'll see me do this before I run each test (or at least I try really hard to do it). I follow a process of:
- Write some test code.
- Write the code to make the test work.
- Audit both.
- Run the test to see if I was right.
We now reach the part where you're ready to try this. First, read through the test and study what it does, and study the code in sllist.py to figure out what you need to do. I suggest that when you attempt to implement a function in SingleLinkeList you first write the comments describing what it does, then fill in the Python code to make those comments work. You'll see me do this in the video.
When you have spent one or two 45-minute sessions hacking on this and trying to make it work, it's time to watch the video. You'll want to attempt it first so that you have a better idea of what I'm trying to do, which makes the video easier to understand. The video will be me just coding and not talking, but I'll do a voice-over to discuss what's going on. The video will also be faster to save time, and I'll edit out any boring mistakes or wasted time.
Once you see how I do it, and you've taken notes (right?), then go and attempt your more serious tries and perform the code auditing process as carefully as you can.
Once you have written your code, make sure you do the auditing process I describe in Part III. I will also be doing it in the video for this exercise if you aren't quite sure how it's done.
Your study drill for this exercise is to try to implement this algorithm again, completely from memory the way I describe in the introduction to Part III. You should also try to think through what operations in this data structure are most likely painfully slow. When you are done, perform your auditing process on what you created.