I’ve been writing about (and working in) Django for almost five years now, and I’ve written a lot about specific pieces of implementation. But as I’m about to move into a new role that won’t be using Django anymore, I want to take a step back and look at how it all fits together.
This post is largely a compilation of links to the documentation and to posts I’ve written about running Django at scale in production, along with some related high-level thoughts. Feel free to reach out if there are areas you’re curious about that you don’t see covered here, and I’ll see what I can do!
If you’re here, you probably either already have an application built in Django, or have it on your short-list. In case you’re still weighing your options, here are some of the reasons you might choose it:
If you’re weighing Django as an option for a new project but haven’t built in it before, I definitely recommend working through the official tutorial to get a basic understanding of how the framework works. The tutorial introduces you to basic topics such as routing, view logic, data modeling, and forms.
Of course, Django isn’t the right choice for every project out there.
Django is quite full-featured, so if you just need a couple of web endpoints or a static site, it is likely overkill. If you’re looking for something more lightweight or don’t need an ORM, but you still want to build in Python, Flask might be a good choice.
Likewise, if your primary data store is non-relational, Django might not be the right choice for you - though it can be done, apparently.
Django’s Model layer is built for representing things that are stored in a relational database in an object-oriented way. However, there are some differences between how your models are defined and the underlying table structure in your database. Mostly these are conveniences that keep you from repeating yourself on every model, or from defining behavior you’ll never need to interact with. This is almost always a good thing, but it can complicate matters as the needs of your app change. For example, Django has a dedicated field type, ManyToManyField, that represents a many-to-many relationship and creates the through-table for you without your defining a model to represent it. But what if, down the road, you need to have that model defined? You can do it, but it takes some effort to do so without data loss or downtime.
A few other bits of syntactic sugar Django provides for interacting with your database are:
Field options (auto_now, auto_now_add) that can be used for automatically populating fields to track when a row was created or updated.
Methods such as get_or_create, which allow us to abstract away some common patterns and avoid race conditions.
If you have a multi-tenant application, store read-only analytics data in a separate database, or have any other reason to split your data, you can connect your application to multiple databases and route different queries to each of them.
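To make the routing idea concrete, here is a minimal sketch of a database router. Django routers are plain classes that implement a handful of methods, so this one needs no imports. The app label "analytics" and the database alias "analytics" are assumptions made up for illustration; substitute whatever exists in your own DATABASES setting.

```python
class AnalyticsRouter:
    """Route every model in the (hypothetical) analytics app to its own database."""

    route_app_labels = {"analytics"}  # assumed app label
    analytics_db = "analytics"        # assumed alias in settings.DATABASES

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.analytics_db
        return None  # no opinion: Django tries the next router, then "default"

    def db_for_write(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.analytics_db
        return None

    def allow_relation(self, obj1, obj2, **hints):
        labels = {obj1._meta.app_label, obj2._meta.app_label}
        if labels <= self.route_app_labels:
            return True   # both objects live in the analytics database
        if labels & self.route_app_labels:
            return False  # refuse relations that would span databases
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label in self.route_app_labels:
            return db == self.analytics_db  # analytics tables only live here
        if db == self.analytics_db:
            return False  # keep every other app's tables out of this database
        return None
```

A router like this is switched on by adding its dotted path to the DATABASE_ROUTERS setting. Returning None from a method means "no opinion", which is what lets several routers be chained.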
When you change your model definitions, Django can generate migration files for you (via the makemigrations command). When run, these apply the corresponding SQL changes to your database, and they track dependencies so that you could rebuild a database from scratch. Migrations need to be rolled out carefully, of course, so that the application doesn’t hit runtime errors while they’re running.
In most cases, migrations need to be run before the application servers have access to the new code: consider when you’re adding a new field to a model. If the application server has access to the code before the migration has been run, you’ll get an application error that the field does not exist. Of course, the opposite is true when removing a field: it needs to be removed from the application code before it’s removed from the database definition (though it also needs to be nullable to prevent errors when writing to the table during this period). Handling migrations in this way often requires multiple sequential deploys, or having multiple environments that are connected to the same database that can be deployed to in a specific order.
However, as your app gets more complicated and has more traffic, Django’s built-in migrations won’t be able to meet all your needs. Migrations are built to get your database into a particular state, but they don’t always give you the flexibility to reach that state through the specific database operations you need. One example of this is adding a unique_together constraint: this takes a lock on the table that can cause downtime if the table holds a lot of data and is heavily accessed. Postgres can avoid this by building the underlying index concurrently, but the migrations Django generates don’t do so. The good news is that migrations do allow you to write your own custom SQL in a migration file, so that your deploy pipeline can treat those operations as it would any other. You can read about this specific example and see how this comes in handy here.
Django REST framework (DRF) is an excellent way to build an API within your Django application. You can use it for public-facing APIs as well as for those you use within your own application, perhaps to replace views to drive your front end, or to connect your microservices.
The most common use case is building CRUD functionality for your models, but you can make adjustments where your external data model doesn’t exactly match your internal one, or use it to build Remote Procedure Calls (RPC, or endpoints not strictly based on a CRUD action).
Most web applications working at scale have some operations that don’t need to happen in the foreground in order for the page to load. Sending emails is one of the most common examples, though there are as many different use cases as there are apps! One of the easiest ways to build out background tasks within your Django application is Celery, though there are of course other options.
Here are some notes and things to watch out for that I’ve run into with background tasks within a Django application:
Most monitoring and observability happens in tools outside of Django itself, and Django is not opinionated about which you use. That said, for these tools to have enough information to do their jobs, the application must first emit log lines. Django builds on Python’s standard logging module and lets you define custom handlers and formatters. This framework can be used and extended to add custom attributes to each log line, which simplifies downstream parsing and searching.
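Here is a small sketch of that idea using only the standard library: a logging filter stamps extra attributes onto every record, and a formatter prints them as key=value pairs. The filter class, the attribute names (service, environment), and their values are made up for illustration. In a real project the dictionary would live in settings.py as LOGGING, the filter would usually be referenced by its dotted path, and Django would apply the configuration for you, so the explicit dictConfig call is only here to make the sketch run standalone.

```python
import logging
import logging.config


class StaticContextFilter(logging.Filter):
    """Attach the same custom attributes to every log record."""

    def __init__(self, service="web", environment="dev"):
        super().__init__()
        self.service = service
        self.environment = environment

    def filter(self, record):
        record.service = self.service
        record.environment = self.environment
        return True  # never drop the record, only annotate it


LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "filters": {
        "context": {
            "()": StaticContextFilter,  # factory; the other keys become kwargs
            "service": "checkout",      # hypothetical values
            "environment": "production",
        },
    },
    "formatters": {
        "keyvalue": {
            "format": (
                "ts=%(asctime)s level=%(levelname)s service=%(service)s "
                "env=%(environment)s logger=%(name)s msg=%(message)s"
            ),
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "filters": ["context"],
            "formatter": "keyvalue",
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}

# Django does this step itself when LOGGING is defined in settings.
logging.config.dictConfig(LOGGING)
logging.getLogger("orders").info("order created")
```

Attaching the filter to the handler rather than to individual loggers means every line that reaches the console carries the extra fields, whichever logger emitted it.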
I’ve loved working with Python and Django over the past five years. It’s been a great framework, with fantastic built-in support for things I was glad not to have to build myself, as well as the flexibility to override the things I did want to change. I’ve seen engineers (myself included) without any experience with it ramp up and become productive in it quickly. I hope you find the same joy in it, and reach out if you have questions! 🎉