
Exercise 29: diff and patch

To finish Part IV you will simply apply the full TDD process you've been studying on a much more involved project that may be unfamiliar to you. Refer back to Exercise 28 to confirm you know the process, and make sure you follow it strictly. Create a check-list to follow if you must.

Warning

When you are actually working, all this strict process is not very useful. Currently you are studying the process and working on internalizing it so you can use it in the real world. That's why I am being strict about how you should follow it. This is only practice, so don't become a zealot about it when you are doing real work. The purpose of the book is to teach you a set of strategies to get work done, not teach you a religious rite you can preach to the masses.

Exercise Challenge

The diff command takes two files and produces a third file (or output) that encodes what changed in the first to make the second. It's the basis of git and other revision control tools. Implementing diff in Python is fairly trivial since the difflib module does the comparison for you, so you don't need to work on the algorithms (which can be very complex).
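Here is a minimal sketch of what "a library that does it for you" can look like, assuming you already have both files read into lists of lines. The sample lines and the make_diff name are made up for illustration. Note that difflib.unified_diff emits the unified format, which is not the same layout the command-line diff prints when you run it without options.

```python
import difflib

# Hypothetical sample data standing in for the contents of two files.
a_lines = ["one\n", "two\n", "three\n"]
b_lines = ["one\n", "2\n", "three\n"]


def make_diff(a, b, a_name="A.txt", b_name="B.txt"):
    """Return a unified diff of two lists of lines as a single string."""
    return "".join(difflib.unified_diff(a, b, fromfile=a_name, tofile=b_name))


diff_text = make_diff(a_lines, b_lines)
print(diff_text)
```

Lines starting with "-" come from the first list and lines starting with "+" come from the second. Comparing a list with itself gives an empty string, which is a handy first test to write.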

The patch tool is the companion to the diff tool. It takes a diff file and applies it to another file to produce a third file. This lets you take the differences between two files, run diff to produce only the changes, then send that .diff file to someone. That person can then use their original copy of the file and your .diff with patch to rebuild your changes.

Here's an example workflow to demonstrate how diff and patch work. I have two files, A.txt and B.txt. The A.txt file contains some simple text, and I copied it to create B.txt with some modifications:

$ diff A.txt B.txt > AB.diff
$ cat AB.diff
2,4c2,4
< her fleece was white a mud
< and every where that marry
< her lamb would chew cud
---
> her fleece was white a snow
> and every where that marry went
> her lamb was sure to go

This produces a file AB.diff that has changes from A.txt to B.txt, which you can see is fixing a rhyme I broke. Once you have this AB.diff you can use patch to apply the changes:

$ patch A.txt AB.diff
$ diff A.txt B.txt

That final command should show no output since the patch command before it effectively made A.txt have the same contents as B.txt.

Start with the diff command, since Python gives you a fully implemented diff to cheat from. You can find it at the end of the difflib documentation, but try to implement your own version first and see how it compares to theirs.
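If you want your output to look like the AB.diff shown earlier, one possible approach is to build that format yourself from difflib.SequenceMatcher. This is only a rough sketch under a few assumptions: the names fmt_range and normal_diff are mine, the lines have no trailing newlines, and the first line of the poem is a guess since the diff doesn't show it.

```python
import difflib


def fmt_range(start, end):
    """Turn a 0-based half-open range into diff's 1-based line numbers."""
    if end == start:          # empty range: name the line just before it
        return str(start)
    if end - start == 1:      # a single line is written without a comma
        return str(start + 1)
    return f"{start + 1},{end}"


def normal_diff(a, b):
    """Return the lines of a classic 'NcN' style diff between a and b."""
    letters = {"replace": "c", "delete": "d", "insert": "a"}
    out = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue          # unchanged lines are not printed
        out.append(f"{fmt_range(i1, i2)}{letters[tag]}{fmt_range(j1, j2)}")
        out.extend("< " + line for line in a[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend("> " + line for line in b[j1:j2])
    return out


# The first line is assumed; the rest is taken from the AB.diff example.
a = ["mary had a little lamb",
     "her fleece was white a mud",
     "and every where that marry",
     "her lamb would chew cud"]
b = ["mary had a little lamb",
     "her fleece was white a snow",
     "and every where that marry went",
     "her lamb was sure to go"]

result = normal_diff(a, b)
print("\n".join(result))
```

The header line comes out as 2,4c2,4 because the three lines after the first differ in both lists. The sketch ignores details a real diff handles, such as files that don't end in a newline.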

The real meat of this exercise is the patch tool, which Python does not implement for you. You will want to read up on the SequenceMatcher class in difflib and specifically look at the SequenceMatcher.get_opcodes method. That is your only clue to making patch work, but it's a very good clue.
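To see why get_opcodes is such a good clue, here is a small sketch that walks the opcodes and rebuilds the second sequence from the first. The rebuild function and the sample lists are hypothetical. This version also cheats: it has the second sequence in hand, while a real patch only has the original file and the diff, so it would have to read the replacement lines out of the diff instead.

```python
import difflib


def rebuild(a, b):
    """Rebuild b by replaying SequenceMatcher opcodes against a."""
    result = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(a[i1:i2])     # keep the unchanged lines from a
        elif tag in ("replace", "insert"):
            result.extend(b[j1:j2])     # take the new lines from b
        # "delete" copies nothing, so those lines of a are dropped
    return result


# Hypothetical sample data.
old = ["one", "two", "three", "four"]
new = ["one", "2", "three", "four", "five"]

for opcode in difflib.SequenceMatcher(None, old, new).get_opcodes():
    print(opcode)

patched = rebuild(old, new)
print(patched == new)
```

Each opcode is a tuple of a tag and four indexes, where the tag is one of "equal", "replace", "delete", or "insert", and the indexes are slices into the two sequences. A patch tool has to carry out the same four actions, only driven by what it parses from the diff file.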

Study Drills

  1. How far can you take this diff and patch combination? Can you combine them into one tool? Can you make it work like a miniature git?

Further Study

Find as many diff algorithms as you can. Another thing to research is how a tool like git works.
