Warning

This book is new. If you'd like to go through it, then join the Learn Code Forum to get help while you attempt it. I'm also looking for feedback on the difficulty of the projects. If you don't like public forums, then email help@learncodethehardway.org to talk privately.

Exercise 52: moreweb

Now that you've created a web server using the Python http.server library, you can move to the very final project. You are going to create your own web server from absolutely nothing using everything you've learned so far. In Exercise 51 you created most of the handling that sits "above" the http.server module. You didn't do any network connection handling or HTTP protocol parsing. In this final exercise you will implement all the gear necessary to replicate what http.server does for your lessweb server.

Exercise Challenge

To complete this exercise you'll want to read the documentation for the Python 3 asyncio module. This library provides tools for handling socket requests, creating servers, waiting for signals, and most everything else you'll need. If you want an extra challenge, use the Python 3 select module instead, which gives you an even lower-level interface for handling sockets. Use this documentation to create a series of small socket servers and clients.
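
As a starting point for those small experiments, here is a minimal sketch of an asyncio echo server and client talking over a local TCP socket. It uses only documented asyncio calls (asyncio.start_server, asyncio.open_connection, and the StreamReader/StreamWriter pair they hand you), but the names handle_echo, echo_client, and main are my own, and echoing one line per connection is just the simplest protocol I could think of.

```python
import asyncio


async def handle_echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Read one newline-terminated line from the client and send it straight back.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def echo_client(host: str, port: int, message: str) -> str:
    # Connect, send one line, and return the reply without its newline.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(message.encode("utf-8") + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode("utf-8").rstrip("\n")


async def main() -> str:
    # Port 0 asks the OS for any free port; then we look up which one we got.
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:  # closes the server when the block exits
        return await echo_client("127.0.0.1", port, "hello socket")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running the file should print hello socket. Binding to port 0 lets the operating system pick a free port, so the experiment won't collide with anything else you have running.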

Once you understand how to create servers and clients that talk over a TCP/IP socket, you'll need to move on to processing HTTP requests. This part of the project is going to be daunting, as the HTTP standard is insane and way more complicated than it needs to be. I would start with the simplest HTTP parsing library you can design, and then expand on it with more and more samples. The first place to start is RFC 7230, but be prepared to experience some of the worst writing humans could devise.
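
A first, deliberately naive parser might handle nothing but the request line and the header fields, splitting on CRLF and spaces. This sketch assumes the whole header section is already in memory, ignores any body, and keeps only the last value of a repeated header; the Request class and parse_request function are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str
    target: str
    version: str
    headers: dict[str, str] = field(default_factory=dict)


def parse_request(raw: bytes) -> Request:
    """Parse the request line and headers; the body is ignored for now."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, sep, _body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header section")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: METHOD SP TARGET SP VERSION
    parts = lines[0].split(" ")
    if len(parts) != 3:
        raise ValueError(f"bad request line: {lines[0]!r}")
    method, target, version = parts

    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        # Reject a missing colon, an empty name, or whitespace around the name.
        if not colon or not name or name != name.strip():
            raise ValueError(f"bad header line: {line!r}")
        # Field names are case-insensitive, so normalize them to lower case.
        headers[name.lower()] = value.strip()
    return Request(method, target, version, headers)
```

Once this works on a few hand-written samples, each new sample that breaks it tells you which part of the grammar to implement properly next.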

The best way to study RFC 7230 is to first extract all of the grammar listed in its Collected ABNF appendix. At first glance this seems crazy, since it is just a huge grammar specification, but you already learned how to read these in Part V of this book, only on a smaller scale. You know how regular expressions, scanners, and parsers work and how to read a grammar like this. All you need to do is study this grammar and implement it a little bit at a time. When implementing this I would completely ignore the "chunk" grammar used for chunked transfer coding.
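
To show what "a little bit at a time" can look like, here is a sketch that hand-translates a few rules from the Collected ABNF appendix (tchar, token, OWS, HTTP-version, request-line, and header-field) into Python regular expressions. The ABNF rule names come from RFC 7230; the Python constant names are my own, and request-target and field-value are deliberately simplified, so treat this as scaffolding for your scanner rather than a complete grammar.

```python
import re

# tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
#         "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
TCHAR = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]"
# token = 1*tchar
TOKEN = TCHAR + "+"
# OWS = *( SP / HTAB )
OWS = r"[ \t]*"
# HTTP-version = HTTP-name "/" DIGIT "." DIGIT, where HTTP-name is "HTTP"
HTTP_VERSION = r"HTTP/[0-9]\.[0-9]"
# SIMPLIFICATION: request-target really has four forms (origin, absolute,
# authority, asterisk); here it is just "one or more visible characters".
REQUEST_TARGET = r"[!-~]+"

# request-line = method SP request-target SP HTTP-version CRLF
REQUEST_LINE = re.compile(
    rf"(?P<method>{TOKEN}) (?P<target>{REQUEST_TARGET}) (?P<version>{HTTP_VERSION})\r\n"
)

# header-field = field-name ":" OWS field-value OWS
# SIMPLIFICATION: field-value is "anything except CR or LF", matched lazily
# so the trailing OWS is not swallowed into the value.
HEADER_FIELD = re.compile(
    rf"(?P<name>{TOKEN}):{OWS}(?P<value>[^\r\n]*?){OWS}\r\n"
)
```

Use fullmatch() on each line so a rule has to account for every byte; for example, REQUEST_LINE.fullmatch("GET / HTTP/1.1\r\n") succeeds, while the same line ending in a bare "\n" does not.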

Once you've studied this grammar, start writing a parser for HTTP using what you have already created. Use your data structures, parsing tools, and everything else you can to create a valid parser for a small subset of HTTP, covering as much of this grammar as you can. To help you out, there is a set of test files containing valid HTTP requests at https://learncodethehardway.org/more-python-book/http_tests.zip. You can download these test cases and run them through your parser to confirm it works. I extracted many of these test cases from the excellent And-HTTP server and then augmented them with more basic examples. Your goal is to get as many of these passing as possible.
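
A small harness makes it easy to rerun the whole suite every time you change your parser. This sketch assumes, hypothetically, that each file inside http_tests.zip holds one raw request and that your parser raises an exception when it rejects input; check the real archive's layout and adjust. run_http_tests is a made-up name, and it reads the files straight out of the zip with the standard zipfile module, so you don't even need to unpack it.

```python
import zipfile
from typing import IO, Callable, Union


def run_http_tests(
    source: Union[str, IO[bytes]], parse: Callable[[bytes], object]
) -> tuple[list[str], list[tuple[str, str]]]:
    """Feed every file in the archive to `parse`, sorting names into passed/failed.

    ASSUMPTION: each file in the zip contains exactly one raw HTTP request,
    and `parse` raises an exception when it rejects a request.
    """
    passed: list[str] = []
    failed: list[tuple[str, str]] = []
    with zipfile.ZipFile(source) as archive:
        for info in sorted(archive.infolist(), key=lambda i: i.filename):
            if info.is_dir():
                continue  # skip directory entries
            raw = archive.read(info)
            try:
                parse(raw)
            except Exception as exc:  # any parser error counts as a failure
                failed.append((info.filename, repr(exc)))
            else:
                passed.append(info.filename)
    return passed, failed
```

You would call it with something like run_http_tests("http_tests.zip", parse_request), where parse_request stands in for your own parser's entry point, then print the failures to decide which part of the grammar to tackle next.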

Finally, once you have a way to write a decent asyncio or select socket server and a way to parse HTTP, you can then put the two together and make your first functioning web server.
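
Here is one way the two halves might fit together: an asyncio server whose handler reads the header section, passes it to a parsing step, and writes back a complete HTTP/1.1 response. This is a minimal sketch under several assumptions: it answers only GET, closes the connection after every response instead of supporting keep-alive, and respond_to uses a crude request-line split where your real parser should go. build_response, respond_to, handle_http, fetch, and demo are hypothetical names.

```python
import asyncio


def build_response(status: str, body: bytes, content_type: str = "text/plain") -> bytes:
    # Status line, headers, an empty line, then the body.
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def respond_to(head: bytes) -> bytes:
    # PLACEHOLDER for your real parser: only the request line is examined here.
    request_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = request_line.split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return build_response("400 Bad Request", b"bad request\n")
    method, target, _version = parts
    if method != "GET":
        return build_response("501 Not Implemented", b"only GET is supported\n")
    return build_response("200 OK", f"you asked for {target}\n".encode("utf-8"))


async def handle_http(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    try:
        # Read up to and including the empty line that ends the headers.
        head = await reader.readuntil(b"\r\n\r\n")
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
        # The peer hung up mid-headers, or the headers exceeded the stream limit.
        response = build_response("400 Bad Request", b"bad request\n")
    else:
        response = respond_to(head)
    writer.write(response)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(port: int, raw_request: bytes) -> bytes:
    # A tiny client: send raw bytes, then read until the server closes.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(raw_request)
    await writer.drain()
    data = await reader.read()
    writer.close()
    await writer.wait_closed()
    return data


async def demo() -> list[bytes]:
    server = await asyncio.start_server(handle_http, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        ok = await fetch(port, b"GET /hello HTTP/1.1\r\nHost: localhost\r\n\r\n")
        bad = await fetch(port, b"NONSENSE\r\n\r\n")
    return [ok, bad]
```

To try it by hand, you could start the server with asyncio.start_server(handle_http, "127.0.0.1", 8000), call serve_forever() on it, and then point curl or a browser at http://127.0.0.1:8000/hello.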

Breaking It

You should definitely try to break this web server, but you should also try something different here. You've written a parser for HTTP that tries to handle valid HTTP in the most logical way possible using a recursive descent (RDP) style parser. There's a good chance that your parser rejects many bad HTTP requests, so find some past attacks and try them on your web server. There are several website hacking automation tools out there, so grab one and point it at your server. Be safe about this, though: only run reputable testing tools, and only against your own server.
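
One way to start is to collect a few classic attack patterns as raw requests and check whether your parser rejects them. The probes below are a small, hand-picked sample, not an exhaustive list: request smuggling via conflicting framing headers, whitespace before a header colon, obsolete line folding, a control character in a value, and an oversized header section. RFC 7230 either requires or recommends rejecting most of these (see sections 3.2.4 and 3.3.3); the size limit is simply a defensive choice. The sketch assumes your parser signals rejection by raising ValueError, and PROBES, probe, strict_parse, and the 8 KiB limit are all hypothetical choices of mine.

```python
import re

# Each probe pairs a label with a raw request modelled on a well-known attack class.
PROBES = [
    ("CL.TE smuggling: Content-Length and Transfer-Encoding together",
     b"POST / HTTP/1.1\r\nHost: a\r\nContent-Length: 5\r\n"
     b"Transfer-Encoding: chunked\r\n\r\n0\r\n\r\n"),
    ("conflicting duplicate Content-Length",
     b"POST / HTTP/1.1\r\nHost: a\r\nContent-Length: 4\r\n"
     b"Content-Length: 5\r\n\r\nabcde"),
    ("whitespace between field name and colon",
     b"GET / HTTP/1.1\r\nHost : a\r\n\r\n"),
    ("obsolete line folding (obs-fold)",
     b"GET / HTTP/1.1\r\nHost: a\r\nX-Long: one\r\n two\r\n\r\n"),
    ("NUL byte in a header value",
     b"GET / HTTP/1.1\r\nHost: a\x00b\r\n\r\n"),
    ("oversized header section",
     b"GET / HTTP/1.1\r\nHost: a\r\nX-Big: " + b"A" * 100_000 + b"\r\n\r\n"),
]


def probe(parse) -> list[str]:
    """Return labels of probes that `parse` accepted without raising ValueError.

    Any other exception propagates, which is itself a crash worth fixing.
    """
    accepted = []
    for label, raw in PROBES:
        try:
            parse(raw)
        except ValueError:
            continue  # rejected, which is what we want
        accepted.append(label)
    return accepted


TOKEN = re.compile(rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
MAX_HEAD = 8192  # HYPOTHETICAL limit; pick whatever suits your server


def strict_parse(raw: bytes) -> dict[bytes, list[bytes]]:
    # Example strict header parser; request-line checks are omitted for brevity.
    head, sep, _ = raw.partition(b"\r\n\r\n")
    if not sep or len(head) > MAX_HEAD:
        raise ValueError("missing or oversized header section")
    headers: dict[bytes, list[bytes]] = {}
    for line in head.split(b"\r\n")[1:]:
        if line[:1] in (b" ", b"\t"):
            raise ValueError("obs-fold is rejected")
        name, colon, value = line.partition(b":")
        if not colon or not TOKEN.fullmatch(name):
            raise ValueError("bad field name")
        value = value.strip(b" \t")
        if any((c < 0x20 and c != 0x09) or c == 0x7F for c in value):
            raise ValueError("control character in field value")
        headers.setdefault(name.lower(), []).append(value)
    lengths = set(headers.get(b"content-length", []))
    if len(lengths) > 1:
        raise ValueError("conflicting Content-Length values")
    if lengths and b"transfer-encoding" in headers:
        raise ValueError("both Content-Length and Transfer-Encoding")
    return headers
```

Run probe(your_parser) and every label it returns is a request that got past you. An empty list is a starting point, not a clean bill of health; the automated tools mentioned above try far more than this.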

Further Study

If you want to completely understand web servers and web technology, use your moreweb server to create a web framework. I would suggest creating a web site first and then extracting the patterns you needed from it into a web framework. The goal of such a framework is to encapsulate patterns you use so that you can simplify later web applications you make. As with the lessweb and moreweb exercises, your goal should also be to research, implement, and exploit common attacks on web frameworks.
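
The first pattern most web frameworks extract is routing: mapping a method and a URL pattern to a handler function. This is a hypothetical sketch of just that one pattern (Router, route, and dispatch are my own names, loosely in the decorator style many Python frameworks use), where path segments written like <name> are captured and passed to the handler as keyword arguments. Your moreweb request handler would call dispatch() with the parsed method and path and turn the result into a response.

```python
import re
from typing import Callable

Handler = Callable[..., str]


class Router:
    """Hypothetical minimal router mapping (method, path pattern) to a handler."""

    def __init__(self) -> None:
        self.routes: list[tuple[str, "re.Pattern[str]", Handler]] = []

    def route(self, method: str, pattern: str) -> Callable[[Handler], Handler]:
        # Split "/hello/<name>" into literal text and placeholder names, escape the
        # literals, and turn each placeholder into a named group for one segment.
        pieces = re.split(r"<(\w+)>", pattern)
        regex = "".join(
            re.escape(piece) if i % 2 == 0 else f"(?P<{piece}>[^/]+)"
            for i, piece in enumerate(pieces)
        )
        compiled = re.compile(regex)

        def decorator(func: Handler) -> Handler:
            self.routes.append((method, compiled, func))
            return func  # unchanged, so it can still be called directly

        return decorator

    def dispatch(self, method: str, path: str) -> tuple[int, str]:
        # First matching route wins; captured segments become keyword arguments.
        path_matched = False
        for route_method, compiled, func in self.routes:
            match = compiled.fullmatch(path)
            if match is None:
                continue
            if route_method == method:
                return 200, func(**match.groupdict())
            path_matched = True
        if path_matched:
            return 405, "method not allowed"
        return 404, "not found"


app = Router()


@app.route("GET", "/hello/<name>")
def hello(name: str) -> str:
    return f"hello, {name}"
```

With this in place, app.dispatch("GET", "/hello/zed") returns (200, "hello, zed"), while a POST to the same path gets a 405 and an unknown path gets a 404.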

If you want to dive deep into TCP/IP, I recommend the book Effective TCP/IP Programming by Jon C. Snader. The book's code is written in C, but it is effectively "Learn TCP/IP The Hard Way" and covers 44 topics with simple code to help you understand how basic TCP/IP works. C is where the sockets API for TCP/IP was born, so much of how other languages handle socket connections seems weird until you know how C does it. By studying this book you'll get a firm grounding in how socket servers work. The only warning is that the book is a little dated, so the code should work, but it won't be the most modern code possible.
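
You can see that C heritage in Python's own socket module, which is a thin layer over the same calls: socket(), bind(), listen(), accept(), recv(), and send(). This sketch also uses select(), the same system call behind the select module mentioned earlier, to serve several clients from one thread. make_listener, echo_loop, and demo are hypothetical names, and the max_messages cutoff exists only so the demo can stop on its own.

```python
import select
import socket
import threading


def make_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    # Same call sequence as in C: socket(), setsockopt(), bind(), listen().
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


def echo_loop(listener: socket.socket, max_messages: int) -> None:
    """Echo data back to any number of clients from one thread using select()."""
    watched = [listener]
    echoed = 0
    while echoed < max_messages:
        # Block until at least one watched socket is readable.
        readable, _, _ = select.select(watched, [], [])
        for sock in readable:
            if sock is listener:
                conn, _addr = listener.accept()  # a new client is waiting
                watched.append(conn)
                continue
            data = sock.recv(4096)
            if data:
                sock.sendall(data)
                echoed += 1
            else:  # an empty read means the peer closed its end
                watched.remove(sock)
                sock.close()
    for sock in watched[1:]:  # close client sockets, leave the listener to the caller
        sock.close()


def demo() -> list[bytes]:
    # Run the server in a background thread and talk to it with two clients.
    listener = make_listener()
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_loop, args=(listener, 2))
    server.start()
    replies = []
    for message in (b"one", b"two"):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            replies.append(client.recv(4096))
    server.join(timeout=5)
    listener.close()
    return replies
```

Reading this next to the C versions in the book is a good way to see which parts of the Python API are just the C calls with nicer names.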
