Warning

This book is new. If you'd like to go through it, then join the Learn Code Forum to get help while you attempt it. I'm also looking for feedback on the difficulty of the projects. If you don't like public forums, then email help@learncodethehardway.org to talk privately.

Exercise 26: hexdump

You've done a warmup with xargs and are now working in the cycle of code, then audit. For the next challenge you'll take a "Test First" approach: you write a test that describes the behavior you expect, and then you implement the behavior until the test passes. You'll be copying the hexdump utility and trying to match your version's output to the real version's. This is where Test First development really helps, since it automates the process of mimicking another piece of software.

This technique is very useful when you need to write a replacement for a terrible piece of software. A common job in software is to work on a project that aims to replace an older system with a more modern implementation. An example is replacing an old COBOL banking system with a fresh, new, hot Django system. The motivation is typically to make it easier to maintain and extend by using something easier to work with than the old system. If you can write a set of automated tests that validate the behavior of the old system and then point that test suite at the new system, then you have a way to confirm that your replacement works...mostly. Trust me, these replacement jobs are nearly impossible and don't succeed too often, but an automated test helps.

In this exercise you'll add to your process the following:

  1. Write a test case that runs the original hexdump in a scenario you need to implement. Let's say the -C option. You'll either need to use subprocess to launch it or simply run it ahead of time and save the results to a file that you load.
  2. Write the code that makes this test work by having the test run your version of hexdump and then compare the results. If they aren't equal, then you didn't do it right.
  3. Then audit both the test code and your code.
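The steps above might start out something like the following. Treat it as a rough sketch and not the one right way to do it: the helper names run_tool and assert_same_output are made up for illustration, and it assumes both programs read their input from stdin.

```python
import subprocess
import sys


def run_tool(command, input_bytes):
    """Run a command, feed input_bytes to its stdin, and return its stdout as bytes."""
    result = subprocess.run(command, input=input_bytes,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


def assert_same_output(reference_cmd, candidate_cmd, input_bytes):
    """Fail if the two commands print different output for the same input."""
    expected = run_tool(reference_cmd, input_bytes)
    actual = run_tool(candidate_cmd, input_bytes)
    assert actual == expected, "expected %r but got %r" % (expected, actual)
```

A first test case could then be a single call such as assert_same_output(["hexdump", "-C"], [sys.executable, "hexdump.py", "-C"], b"Hello There\n"), where hexdump.py is a hypothetical file name for your version and the real hexdump is assumed to be on your PATH. It will fail until your code exists, which is the point.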

I chose hexdump because the difficulty is in replicating its strange output format for viewing binary data. There's nothing too complicated about how it works. It's just matching the output that you need to get right, which makes it good practice for Test First development.

Note

When I say "write a test first" I do not mean a whole massive test.py file with all the functions and huge amounts of imaginary code. I mean what I've taught in the past. Write a little test case--maybe just 1/10th of one test function--then write the code to make that work, and then bounce back and forth between the two. The more you know about the code the more of the test case you can write, but don't write reams of test code with nothing to run against it. Work incrementally instead.

Exercise Challenge

The hexdump command is useful when you want to see the contents of a file that is not viewable text. It displays the bytes in a file in various useful formats, including hex, octal, and hex with an ASCII rendering trailing on the side. The difficulty in implementing your own hexdump isn't reading the data or even converting it to different formats. You can do that easily using the hex, oct, int, and ord functions in Python. The original format string operators (the % style) are also useful, as there are options for doing fixed precision octal and hex formatting.
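Here's a quick look at those conversions, assuming Python 3. The variable names are only for illustration.

```python
# ord() turns a one-character string into its integer value.
value = ord("H")
print(value)                 # 72

# hex() and oct() return strings with a 0x or 0o prefix.
print(hex(value))            # 0x48
print(oct(value))            # 0o110

# int() with a base goes the other way, from digits back to an integer.
print(int("48", 16))         # 72

# The % operators give zero-padded, fixed-width output with no prefix.
hex_byte = "%02x" % 10
octal_byte = "%03o" % 10
offset = "%08x" % 16
print(hex_byte, octal_byte, offset)   # 0a 012 00000010
```

Notice that hex and oct add a prefix you probably don't want in a dump, which is why the % operators tend to be the handier choice here.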

The real difficulty is in formatting the output correctly for each of the different options so that it streams right and fits on the screen. Here are the first few lines of the hexdump -C output for a Python .pyc file:

00000000  03 f3 0d 0a f0 b5 69 57  63 00 00 00 00 00 00 00  |......iWc.......|
00000010  00 03 00 00 00 40 00 00  00 73 3a 00 00 00 64 00  |.....@...s:...d.|
00000020  00 64 01 00 6c 00 00 6d  01 00 5a 01 00 01 64 00  |.d..l..m..Z...d.|
00000030  00 64 02 00 6c 02 00 6d  03 00 5a 03 00 01 64 03  |.d..l..m..Z...d.|
00000040  00 65 01 00 66 01 00 64  04 00 84 00 00 83 00 00  |.e..f..d........|

The man page for this "canonical" formatting states that it is:

  1. Display input offset in hexadecimal. So 10 is not really 10 in decimal; it's 10 in hex. Do you know hex?
  2. Sixteen space-separated, two-column, hexadecimal bytes. That's each byte converted to hex. How many columns represent one byte?
  3. Then the same sixteen bytes in %_p format, which looks like a Python format specifier but it's particular to hexdump. You'll need to read the man page more to find out what it means.
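To get you started, here's a minimal sketch of the first two parts of that format, worked out from the sample output above and not from hexdump's source. The function name format_offset_and_hex is hypothetical, and it deliberately skips the %_p column and the padding of short rows since those are yours to figure out.

```python
def format_offset_and_hex(offset, chunk):
    """Format the offset and hex columns for up to sixteen bytes.

    chunk must be a bytes object. The trailing %_p column and the
    padding of short rows are left out on purpose.
    """
    hex_bytes = ["%02x" % byte for byte in chunk]
    # Two groups of eight bytes, with an extra space between the groups.
    groups = [" ".join(hex_bytes[:8]), " ".join(hex_bytes[8:])]
    return "%08x  %s" % (offset, "  ".join(g for g in groups if g))


print(format_offset_and_hex(0, b"Hello There\n"))
# 00000000  48 65 6c 6c 6f 20 54 68  65 72 65 0a
```

A function this small is easy to test on its own before you ever compare against the real hexdump.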

hexdump can also receive input from stdin, which means you can pipe things into it:

echo "Hello There" | hexdump -C

which produces this on my macOS system:

00000000  48 65 6c 6c 6f 20 54 68  65 72 65 0a              |Hello There.|
0000000c

Notice that last line with the c character? You'll need to find out what that is.
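Since the input can come from a pipe, one way to approach reading it is to pull bytes off a binary stream sixteen at a time and keep track of the offset as you go. This is only a sketch: read_chunks is a made-up name, and io.BytesIO stands in for real piped input.

```python
import io


def read_chunks(stream, size=16):
    """Yield (offset, chunk) pairs from a binary stream, size bytes at a time."""
    offset = 0
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        yield offset, chunk
        offset += len(chunk)


# io.BytesIO stands in for piped input here. With a real pipe you would
# pass sys.stdin.buffer, which is the binary view of stdin in Python 3.
for offset, chunk in read_chunks(io.BytesIO(b"Hello There\n")):
    print(offset, chunk)
```

Taking any binary stream as a parameter means your tests can hand the function a BytesIO object, while the command line version hands it a file or stdin.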

It's this formatting and output that's going to be difficult, and your game is to replicate it as best as you can, which is why the beginning of this exercise dictated that you work in a test first manner. It will be much easier to create tests that feed data to your hexdump and compare it with the real hexdump until it starts working.

Study Drills

  1. Research the od command, and see if your hexdump code can be reused for an implementation of od. If it can, then make a library that both of them use.

Further Study

There are people who advocate only doing test first development, but I believe that no technique works all the time. I prefer to write tests first when I'm testing the interaction of the software from the user's perspective. I will write tests that describe the user interacting with the software and then go make the software make it happen. This is what you did here since you were testing how the user sees output from your command line hexdump calls.

For other types of coding tasks, dictating whether to write a test first or the code first is ridiculous and just kills your ability to solve a problem. Automated tests are simply tools, and you are an intelligent person who has the authority to try to use tools however you think they will work best in each situation. Anyone telling you different is probably an abusive person who actually isn't that great at writing software.
