Django at Scale in Production

I’ve been writing about (and working in) Django for almost five years now, and I’ve written a lot about specific pieces of implementation. But as I’m about to move into a new role that won’t be using Django anymore, I want to take a step back and look at how it all fits together.

This post is largely a compilation of links to the documentation and to posts I’ve written about running Django at scale in production, along with some related high-level thoughts. Feel free to reach out if there are areas you’re curious about that you don’t see covered here, and I’ll see what I can do!

Why Django?

If you’re here, you probably either already have an application built in Django, or have it on your short-list. In case you’re still weighing your options, here are some of the reasons you might choose it:

Built in Python, a much-loved language (for good reason!)
Rich ecosystem of open-source third-party libraries
Powerful Object-Relational Mapper (ORM) that simplifies interacting with your relational database and, because it parameterizes queries, helps protect against SQL injection.
Django Rest Framework makes it easy to build APIs, either for external or internal (to drive your front-end or microservices) use.
It follows the familiar MVC (Model-View-Controller) pattern, so people who haven’t used Django before but know other frameworks like Rails, Spring, or ASP.NET will be able to pick it up quickly. Terminology note: what other MVC frameworks call “controllers”, Django calls “views”, and what other frameworks call “views”, Django calls “templates”.
The Django Admin panel provides basic CRUD (Create, Read, Update, Delete) functionality out of the box once you’ve defined and registered your models. You can lean on it until you’ve built your own interface, which helps you get an MVP up and running more quickly. You can also add custom views and templates to it if you need to extend the functionality a bit.

If you’re weighing Django as an option for a new project but haven’t built in it before, I definitely recommend working through their tutorial to get a basic understanding of how the framework works. The tutorial introduces you to basic topics such as routing, logic, data modeling, and forms.

Why Not Django?

Of course, Django isn’t the right choice for every project out there.

Django is quite full-featured, so if you just need a couple web endpoints or a static site, it is likely overkill. If you’re looking for something more lightweight or don’t need an ORM, but you still want to build in Python, Flask might be a good choice.

Likewise, if your primary data store is non-relational, Django might not be the right choice for you - though it can be done, apparently.

Databases & the ORM

Django’s Model layer is built for representing things that are stored in a relational database in an object-oriented way. However, there are some differences between how your models are defined and the underlying table structure in your database. Mostly these are things that save you from repeating yourself on every model, or from defining behavior you’ll never need to interact with. This is usually a good thing, but it can complicate matters as the needs of your app change. For example, Django has a specific field type to represent a many-to-many relationship, and it creates the through-table for you without your defining a model to represent it. But what if, down the road, you need to have that model defined? You can do it, but it will take a little bit of effort to do it without data loss or downtime.

A few other bits of syntactic sugar Django provides for allowing you to interact easily with your database are:

Django’s date and datetime fields accept two options (auto_now and auto_now_add) that automatically populate a field to track when a row was created or updated.
The Django ORM provides some handy helpers, such as get_or_create, which abstract away some common patterns and, as long as the database enforces uniqueness on the lookup fields, help avoid race conditions.
Django also provides model managers that allow you to implement some useful patterns, such as soft deletion.

If you have a multi-tenant application or store read-only analytics data in a separate database (or any other reason!) you can connect your application to multiple databases and route different requests to them.
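For the read-only analytics case, a database router is just a plain class with a few well-known methods. This is a minimal sketch: the analytics app label and database alias are assumptions, as is the choice to keep Django from migrating that database at all.

```python
class AnalyticsRouter:
    """Route models from a hypothetical `analytics` app to their own database.

    Returning None from a router method means "no opinion", so Django falls
    through to the next router or to the default database.
    """

    app_label = "analytics"  # hypothetical app
    db_alias = "analytics"  # hypothetical key in settings.DATABASES

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.app_label:
            return self.db_alias
        return None

    def db_for_write(self, model, **hints):
        # Assumption: the analytics database user is read-only, so any write
        # attempt fails loudly there instead of landing in the default database.
        if model._meta.app_label == self.app_label:
            return self.db_alias
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Assumption: the analytics schema is managed outside Django.
        if app_label == self.app_label or db == self.db_alias:
            return False
        return None


# In settings.py (the module path is hypothetical):
# DATABASE_ROUTERS = ["myproject.routers.AnalyticsRouter"]
```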

Migrations

When you change your model definitions, Django can generate migration files for you (with the makemigrations command). When applied, they run the SQL that brings your database in line with your models, and they track dependencies so that you could rebuild a database from scratch. Migrations need to be rolled out carefully, of course, so that the application doesn’t hit runtime errors while they’re running.

In most cases, migrations need to be run before the application servers have access to the new code: consider when you’re adding a new field to a model. If the application server has access to the code before the migration has been run, you’ll get an application error that the field does not exist. Of course, the opposite is true when removing a field: it needs to be removed from the application code before it’s removed from the database definition (though it also needs to be nullable to prevent errors when writing to the table during this period). Handling migrations in this way often requires multiple sequential deploys, or having multiple environments that are connected to the same database that can be deployed to in a specific order.

However, as your app gets more complicated and has more traffic, Django’s built-in migration operations won’t be able to meet all your needs. Migrations are built to get your database into a particular state, but they don’t always give you the flexibility to reach that state using the specific database operations you need. One example is adding a unique_together constraint: this takes a lock on the table while the underlying index is built, which can cause downtime if the table has a lot of data and is also heavily accessed. Postgres can avoid this by building the index concurrently (CREATE UNIQUE INDEX CONCURRENTLY), but the migration Django generates for unique_together doesn’t do that. The good news is that migrations let you write your own custom SQL in a migration file, so your deploy pipeline can treat those operations as it would any other. You can read about this specific example and see how this comes in handy here.

Django Rest Framework

The Django Rest Framework (DRF) is an excellent way to build an API within your Django application. You can use it for both public-facing APIs, as well as for those you use within your own application, perhaps to replace views to drive your front end, or to connect your microservices.

The most standard use-case is building CRUD functionality for your models, but you can make adjustments where your external data model doesn’t exactly match your internal one, or use it to build Remote Procedure Calls (RPC, or endpoints not strictly based on a CRUD action).

Background Tasks

Most web applications working at scale have some operations that don’t need to happen in the foreground in order for the page to load (sending emails is one of the most common use cases, though there are as many different use cases as there are apps!). One of the easiest ways to build out background tasks within your Django application is Celery, though there are of course other options.

Here are some notes and things to watch out for that I’ve run into with background tasks within a Django application:

When using background tasks, you can add the id of the request that enqueued a background task to the id of the task itself, so the two can be tied together for logging and tracing purposes.
This isn’t actually a Django-specific issue, but it is very much a production one: because tasks that were enqueued before a deploy may still be being processed during and after a deploy, we need to be thoughtful about how to change the signature of a background task without causing errors during deployment.
Another common gotcha I’ve seen teammates struggle with is the difference between local development and the distributed nature of production systems when it comes to background tasks. Passing ORM instances as arguments to methods is common within the application itself, but it cannot be done with background tasks whose arguments are serialized as JSON, because model instances are not JSON serializable. Instead, pass through the primary key of the object, and retrieve it from the database again when the task is executed.
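The three notes above can be illustrated without any task library at all. The sketch below fakes a JSON-based queue in plain Python; enqueue, run, the User stand-in, and the send_welcome_email task are all hypothetical, and a real setup would use your task framework’s own API instead.

```python
import json
import uuid
from dataclasses import dataclass


@dataclass
class User:
    """Stand-in for an ORM model instance."""

    pk: int
    email: str


USERS = {1: User(pk=1, email="ada@example.com")}  # stand-in for the database


def enqueue(task_name, request_id, *args, **kwargs):
    """Build the JSON message a broker would carry."""
    message = {
        # Prefix the task id with the request id so logs can be correlated.
        "id": f"{request_id}:{uuid.uuid4()}",
        "task": task_name,
        "args": args,
        "kwargs": kwargs,
    }
    # Raises TypeError if any argument is not JSON serializable.
    return json.dumps(message)


def send_welcome_email(user_id, template="welcome_v1"):
    # `template` was added after launch. The default keeps messages that
    # were enqueued by the previous release valid during and after a deploy.
    user = USERS[user_id]  # re-fetch by primary key at execution time
    return f"{template} -> {user.email}"


TASKS = {"send_welcome_email": send_welcome_email}


def run(raw_message):
    """What a worker does: decode the message and call the task."""
    message = json.loads(raw_message)
    return TASKS[message["task"]](*message["args"], **message["kwargs"])


old_style = enqueue("send_welcome_email", "req-42", 1)
new_style = enqueue("send_welcome_email", "req-43", 1, template="welcome_v2")
```

Both messages run against the same worker code, which is the property you want during a rolling deploy. Calling enqueue with USERS[1] instead of 1 fails immediately with a TypeError, which is the local-versus-production surprise described above.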

Logging & Monitoring

Most monitoring and observability happens in tools outside of Django itself, and Django is not opinionated about which you use. That said, for these tools to have enough information to do their jobs, we must first emit log lines. Django builds on Python’s standard logging module, and its LOGGING setting lets you define custom handlers, formatters, and filters. This framework can be used and extended to add custom attributes to each log line, to simplify downstream parsing and searching.
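Here is a minimal sketch of adding a custom attribute to every log line, using only the standard library. The dictionary has the same shape as Django’s LOGGING setting; the RequestIdFilter class, the request_id_var context variable, and the myapp logger name are hypothetical, and in a real project a middleware would be the natural place to set the request id.

```python
import contextvars
import logging
import logging.config

# Holds the current request's id; "-" when there is no request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request id to every record that passes through."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True  # never drop the record, only annotate it


LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "filters": {"request_id": {"()": RequestIdFilter}},
    "formatters": {
        "kv": {
            "format": "%(levelname)s request_id=%(request_id)s "
            "%(name)s %(message)s"
        }
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "filters": ["request_id"],
            "formatter": "kv",
        }
    },
    "loggers": {"myapp": {"handlers": ["console"], "level": "INFO"}},
}

# Django applies settings.LOGGING for you; outside Django, apply it directly.
logging.config.dictConfig(LOGGING)

request_id_var.set("req-123")
logging.getLogger("myapp").info("order created")
```

Because the attribute is added by a filter on the handler, every log call in the application picks it up without each call site having to pass it in.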

Wrap-up

I’ve loved working with Python and Django over the past five years. It’s been a great framework, with fantastic built-in support for the things I was glad not to have to build myself, as well as the flexibility to override the things that I did want to build. I’ve seen engineers (myself included) without any experience with it ramp up and become productive in it quickly. I hope you find the same joy in it, and reach out if you have questions! 🎉