Warning

This book is new. If you'd like to go through it, then join the Learn Code Forum to get help while you attempt it. I'm also looking for feedback on the difficulty of the projects. If you don't like public forums, then email help@learncodethehardway.org to talk privately.

Exercise 17: Dictionary

You should be familiar with Python's dict class already. Any time you write code like this:

cars = {'Toyota': 4, 'BMW': 20, 'Audi': 10}

you're using a dict to associate the models of cars ('Toyota', 'BMW', 'Audi') to the total you have (4, 20, 10). Using this data structure should be second nature to you by now, and you probably don't even think about how it works. In this exercise you will learn how a dict works by implementing your very own Dictionary from data structures you already created. Your goal in this exercise is to implement your own version of a Dictionary based on code I've written here.

Exercise Challenge

In this exercise you are going to fully document and understand a piece of code I've written and then write your own version of it from memory as best you can. The purpose of this exercise is to learn to dissect and understand a complicated piece of code. It is also important to be able to internalize, or memorize, what goes into creating a simple data structure like a Dictionary. The best way I've found to learn to dissect and understand a piece of code is to reimplement it based on your own study and memorization.

Think of this as a "Master Copy" class. A Master Copy comes from painting, where you take a painting created by someone better than you and attempt to make a copy of it. Doing this teaches you how that person painted and improves your skills. Code and paintings are similar in that all the information is right there ready to copy, so you can easily learn from someone else by just copying their work.

Doing a "Code Master Copy"

To create a "code master copy" you'll follow this procedure which I'm calling the CASMIR process:

  1. Copy the code and get it working like you normally do. Your copy should be exactly the same. This helps you get an understanding of it and forces you to study it closely.
  2. Annotate the code with comments and write an analysis for all of the code, making sure you understand every line and what it does. This may involve jumping into other code you've written to tie the whole concept together.
  3. Summarize the general structure with succinct notes for what makes this code work. That would be a list of functions and what each function does.
  4. Memorize this succinct description of the algorithm and key pieces of code.
  5. Implement what you can from memory, and when you run out of details, go back to your notes and original code to memorize more.
  6. Repeat this process as many times as you need to finally make a copy from memory. Your copy from memory does not have to be exactly the same but should be close and pass the same test you create.

Doing this will give you a deeper understanding of how the data structure works, but more importantly, help you internalize and recall what this data structure does. You'll be able to understand the concept and also implement the data structure when you need to create one. This will also train your brain to memorize other data structures and algorithms in the future.

Warning

The only warning I have is that this is a very naive, stupid, slow implementation of a Dictionary. You are really copying a simplified stupid Dictionary that has all the basic elements and works but needs vast improvements for production. Those improvements will come when we reach Exercise 19 and study performance tuning. For now, just implement this simple version so you can understand the basics of the data structure.

Copy the Code

First we'll look at the code for Dictionary that you'll have to copy:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
from dllist import DoubleLinkedList

class Dictionary(object):
    def __init__(self, num_buckets=256):
        """Initializes a Map with the given number of buckets."""
        self.map = DoubleLinkedList()
        for i in range(0, num_buckets):
            self.map.push(DoubleLinkedList())

    def hash_key(self, key):
        """Given a key this will create a number and then convert it to
        an index for the aMap's buckets."""
        return hash(key) % self.map.count()

    def get_bucket(self, key):
        """Given a key, find the bucket where it would go."""
        bucket_id = self.hash_key(key)
        return self.map.get(bucket_id)

    def get_slot(self, key, default=None):
        """
        Returns either the bucket and node for a slot, or None, None
        """
        bucket = self.get_bucket(key)

        if bucket:
            node = bucket.begin
            i = 0

            while node:
                if key == node.value[0]:
                    return bucket, node
                else:
                    node = node.next
                    i += 1

        # fall through for both if and while above
        return bucket, None

    def get(self, key, default=None):
        """Gets the value in a bucket for the given key, or the default."""
        bucket, node = self.get_slot(key, default=default)
        return node and node.value[1] or node

    def set(self, key, value):
        """Sets the key to the value, replacing any existing value."""
        bucket, slot = self.get_slot(key)

        if slot:
            # the key exists, replace it
            slot.value = (key, value)
        else:
            # the key does not, append to create it
            bucket.push((key, value))

    def delete(self, key):
        """Deletes the given key from the Map."""
        bucket = self.get_bucket(key)
        node = bucket.begin

        while node:
            k, v = node.value
            if key == k:
                bucket.detach_node(node)
                break

    def list(self):
        """Prints out what's in the Map."""
        bucket_node = self.map.begin
        while bucket_node:
            slot_node = bucket_node.value.begin
            while slot_node:
                print(slot_node.value)
                slot_node = slot_node.next
            bucket_node = bucket_node.next

This code implements a dict data structure using your existing DoubleLinkedList code. If you don't fully understand the DoubleLinkedList then you should attempt the Code Master Copy procedure I give you to understand it better. Once you're sure you understand DoubleLinkedList you can type this code in and get it working. Remember that it must be a perfect copy before you start to annotate it. The worst thing you could do is annotate a broken or incorrect copy of my code.

To help you get this code right, I've written a quick and dirty little test script:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
from dictionary import Dictionary

# create a mapping of state to abbreviation
states = Dictionary()
states.set('Oregon', 'OR')
states.set('Florida', 'FL')
states.set('California', 'CA')
states.set('New York', 'NY')
states.set('Michigan', 'MI')

# create a basic set of states and some cities in them
cities = Dictionary()
cities.set('CA', 'San Francisco')
cities.set('MI', 'Detroit')
cities.set('FL', 'Jacksonville')

# add some more cities
cities.set('NY', 'New York')
cities.set('OR', 'Portland')


# print(out some cities
print('-' * 10)
print("NY State has: %s" % cities.get('NY'))
print("OR State has: %s" % cities.get('OR'))

# print(some states
print('-' * 10)
print("Michigan's abbreviation is: %s" % states.get('Michigan'))
print("Florida's abbreviation is: %s" % states.get('Florida'))

# do it by using the state then cities dict
print('-' * 10)
print("Michigan has: %s" % cities.get(states.get('Michigan')))
print("Florida has: %s" % cities.get(states.get('Florida')))

# print(every state abbreviation
print('-' * 10)
states.list()

# print(every city in state
print('-' * 10)
cities.list()

print('-' * 10)
state = states.get('Texas')

if not state:
  print("Sorry, no Texas.")

# default values using ||= with the nil result
# can you do this on one line?
city = cities.get('TX', 'Does Not Exist')
print("The city for the state 'TX' is: %s" % city)

I want you to also type this code in exactly, but when you move on to the next phase of the master copy you'll turn this into an official automated test you can run with pytest. For now, just get this script working so you can get the Dictionary class working, then you can clean it all up in the next phase.

Annotate the Code

Make sure your copy of my code is exactly the same and that it passes the test script. You can then start annotating the code and studying every line to understand what each line does. A very good way to do this is to write an "official" automated test and annotate the code as you work. Take the dictionary_test.py script and convert each section into a little test function, then annotate the Dictionary class as you go.

For example, the first section of the test in test_dictionary.py creates a dictionary and does a series of Dictionary.set calls. I would convert that to a test_set function and then annotate the Dictionary.set function in the dictionary.py file. As you annotate the Dictionary.set function, you'll have to dive into the Dictionary.get_slot function, then the Dictionary.get_bucket function, and finally Dictionary.hash_key. This forces you to annotate and understand a large chunk of the Dictionary class with just one test and in an organized way.

Summarize the Data Structure

You can now summarize what you've learned from annotating the code in dictionary.py and rewriting the dictionary_test.py file to be a real pytest automated test. Your summary should be a clear and small description of the data structure. If you can fit it on a piece of paper then you're doing good. Not all data structures can be summarized that concisely, but keeping the summary small will help you memorize it. You can use diagrams, drawings, words, or whatever you'll be able to remember.

The purpose of this summary is to give you a set of quick notes that you can "hang" more details on when you memorize in the next step. The summary doesn't have to include everything but should include little bits that trigger your memories of the code from the "Annotate" phase, which should in turn trigger your memories of the "Copy" phase. This is called "chunking", where you attach more detailed memories and information to small pieces of information. Keep this in mind when you're writing the summary. Less is more, but too little is useless.

Memorize the Summary

You're going to memorize the summary and annotated code any way you can, but I'm going to give a basic process for memorizing you can use at first. Honestly, memorizing complex things is an organic trial and error process for everyone, but some tricks help:

  1. Make sure you have a note pad of paper and printouts of the summary and code.
  2. Spend 3 minutes simply reading the summary and trying to remember it. Quietly stare at it, read it out loud, read it then close your eyes and repeat what you read, and even try just remembering the "shape" of the words on the paper. It sounds nuts but trust me, it totally works. Your brain is better at remembering shapes than you think.
  3. Flip the summary over and try to write it again from what you remember, and when you get stuck, flip the summary over real quick and cheat. After you quick glimpse, flip the summary back over and try to complete more.
  4. Once you've written a copy of the summary from (mostly) memory, use the summary to do another 3 minutes trying to memorize the annotated code. Simply read a part of the summary, and then look at the relevant part of the code and try to remember it. You could even do just 3 minutes per little function.
  5. Once you've spent time attempting to remember the annotated code, flip that over and using the summary try to recall the code on your notepad. Again, when you get stuck, flip the annotation over quick and cheat.
  6. Keep doing this until you can do an alright copy of the code on paper. Your paper code doesn't have to be perfect Python but should be pretty close to the original code.

It may seem like this is going to be impossible but you'd be surprised how much you can remember when you do this. You'll also be surprised at how well you understand the concept of a dictionary once you're done doing this. This isn't simple rote memorization but rather building a concept map that you can actually use when you attempt to implement the Dictionary yourself.

Warning

If you are the kind of person who has anxiety about memorizing anything, then this exercise will be a huge help for you in the future. Being able to follow a procedure to memorize something helps overcome any frustration at memorizing a topic. Rather than flounder around "failing" at it, you get to watch slow improvement over a consistent process. As you do this, you'll see ways and hacks to improve your recall and do it better. You'll just have to trust me that this seems like a slow way to learn something, but it ends up being much faster than other techniques.

Implement from Memory

It is now time to walk to your computer--leaving your paper notes in another room or on the floor--and attempt your first implementation from memory. Your first attempt may be a total complete disaster, but that's totally alright. You most likely aren't used to trying to implement anything from memory. Just put down anything you remember, and when you get to the end of your thread, walk back into the other room and do some more memorization. After a few trips to your memory room you'll get into it and the memories will flow better. It's totally alright to need to visit your memory notes over and over. It's all about trying to retain the memories of the code and improve your skills.

I recommend that you write whatever comes to your mind first, whether that's a test, code, or both. Then use what you can recall to implement or recall other parts of the code. If you first sit down and remember the test_set function name and a few lines of code, then write those down. Take advantage of them being in your mind right away. Once you have that, use this test to remember or implement the Dictionary.set function as best you can. Your goal is to use any information you can to build and implement other information.

You should also try to use your understanding of the Dictionary to implement the code. Don't simply try to have photographic recall of each line. That's actually impossible as nobody has photographic memory (look it up, nobody does). What most people have is an okay memory that triggers conceptual understandings they can use. You should do the same thing and use what you know of how a Dictionary works to create your own copy. In the example above, you know that a Dictionary.set functions a certain way, and you'll need a way to get the slots and buckets....so that means you need get_slot and get_bucket. You aren't photographically memorizing each character; you're remembering all the key concepts and using them.

Repeat

The most important part of this exercise is that there is no failure in having to repeat this process a few times to get better at it. You'll do this for the rest of these data structures in the book, so you'll get plenty of practice. If you have to go back and memorize 100 times that's alright. Eventually you'll only need to do 50, and then the next time only 10, and then eventually you'll be able to implement a Dictionary from memory easily. Just keep attempting it, and try to approach it like a meditation so you can relax while you do it.

Study Drills

  1. My tests are very limited. Write a more extensive test.
  2. How would the sorting algorithms from Exercise 16 help this data structure?
  3. What happens when you randomize the keys and values to this data structure? Does a sorting algorithm help?
  4. What impact does the num_buckets have on the data structure?

Break It

Your brain may be fried, but take a break, and then try to break this code. This implementation is easily thrashed and overwhelmed with data. How about strange edge cases? Can you add anything as a key or only strings? What will cause problems? Finally, can you do any sneaky tricks to the code to make it seem like it works fine, but it's actually broken in some clever way?

Pre-order Learn More Python The Hard Way

When you pre-order Learn More Python The Hard Way, you'll receive the Python 3 Edition as it's being created. All files are DRM free and you can download them to your computer for offline viewing. Digital Download Only! You do not get a physical book.

$29.00

Buy the Online Course

Or, you can read Learn More Python the Hard Way for free right here, video lectures not included.

Other Buying Options

Other buying options coming soon.

No content available for this exercise. You can view all available downloads at your customer account page.