Django: making backfills in migrations backward compatible

As teams and codebases grow, things that seem straightforward can become more complex. For example, a series of database schema changes that works exactly as expected when deployed independently and in order can cause problems for other developers trying to bring their local environments up to date. Let’s look at one example of this in Django, along with its solution: deprecating a database field, but using it in a backfill before it is removed from the database schema.

The Scenario

Your workflow when deprecating a field in favor of a new one might look something like this:

In the first PR:

  • Add the new field (nullable)
  • Backfill the new field from the old one
  • Make the old field nullable if it isn’t already, to prepare for its removal
  • Make sure any write paths write to both fields

In the second PR:

  • Make the new field not-nullable, if desired
  • Remove dual writes from relevant code paths
  • Remove the old field (depending on your deployment process, you may need to remove it from the code separately from removing it from the database schema, to avoid downtime)

As an example, say we have a Student model with an email field on it, but now we need to support the possibility that a student has multiple email addresses. As a first step, we might want to migrate our existing email field to a new one called primary_email.

The migration file in our first PR might look like this:

# -*- coding: utf-8 -*-

from django.db import migrations, models
from django.db.models import F

from student.models import Student


def backfill_display_name(apps, schema_editor):
    Student.objects.update(primary_email=F('email'))


class Migration(migrations.Migration):

    dependencies = [
        ...
    ]

    operations = [
        migrations.AddField(
            model_name='student',
            name='primary_email',
            field=models.CharField(max_length=100, null=True),
        ),
        migrations.RunPython(backfill_display_name, migrations.RunPython.noop)
    ]

This is fairly straightforward: we’ve taken the migration file that Django generates automatically when a field is added to the model definition and added a backfill to it, setting the new primary_email field to the value of the existing email field. (As a side note, a no-op reverse operation is fine here, since rolling back this migration removes the field from the schema, dropping anything the backfill wrote anyway.)
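For intuition about what the first migration does in the database, here is a rough sqlite3 sketch of the equivalent SQL. It assumes Django’s default table name, student_student, and a simplified schema with just id and email; the SQL Django actually generates depends on the database backend and Django version.

```python
import sqlite3

# Simplified stand-in for the Student table before the first PR.
# "student_student" assumes Django's default table naming (<app_label>_<model_name>).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_student ("
    "id INTEGER PRIMARY KEY, "
    "email VARCHAR(100) NOT NULL)"
)
conn.executemany(
    "INSERT INTO student_student (email) VALUES (?)",
    [("ada@example.com",), ("grace@example.com",)],
)

# Roughly what migrations.AddField(..., null=True) amounts to: a new column that
# allows NULL, so existing rows start out with no primary_email.
conn.execute("ALTER TABLE student_student ADD COLUMN primary_email VARCHAR(100)")

# Roughly what Student.objects.update(primary_email=F('email')) amounts to: one
# UPDATE that copies the value inside the database, without loading rows into Python.
conn.execute("UPDATE student_student SET primary_email = email")
conn.commit()

rows = conn.execute(
    "SELECT email, primary_email FROM student_student ORDER BY id"
).fetchall()
print(rows)
```

Because F('email') refers to a column rather than a Python value, the copy happens entirely in the database in a single statement.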

This PR can safely be deployed, and we can carry on our merry way. Next, we open another PR that removes the deprecated field. That one might look something like this:

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ...
    ]

    operations = [
        migrations.RemoveField(
            model_name='student',
            name='email',
        ),
    ]

We deploy this PR next, and everything goes swimmingly! Nice work.

This sequence of migrations ran successfully when deployed independently and in order. But suppose another developer didn’t pull down the master branch and run migrations between these two PRs going out. When they pull down both changes and try to catch up by running migrations, they will see an error like this:

django.core.exceptions.FieldError: Cannot resolve keyword 'email' into field. Choices are: <field choices listed here>

What happened?

Well, the developer who just pulled down the code now has a version of the Student model definition without the email field. When the backfill in the first migration tries to reference that field, Django raises this error and the backfill can’t complete. This happens because we imported Student directly from the app’s models.py file, which reflects the current state of the model, not its state at the point in history where the migration runs.

But that’s where we normally import models from…what can we do instead?

Enter apps.get_model().

You’ll notice that even in the original backfill, the function takes two arguments: apps and schema_editor. We’re going to focus on the first one here. Every function passed to migrations.RunPython receives these two arguments. Calling apps.get_model() gives us an alternative to importing the model directly from models.py: it returns a historical version of the model, as it was defined at that point in the migration history. This means that regardless of the current definition of Student when this migration runs, the model it returns will have the email field, because the field existed at that point. (This assumes you don’t go back and rewrite the history of your migration files, since Django uses the earlier migration files to reconstruct what the model looked like at the time. Let’s just assume rewriting history is generally bad, though as with everything, there are exceptions!)
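A toy, pure-Python sketch may help make "the model as it was defined at the time" concrete. It is not how Django implements this (Django builds a project state from your migration files and renders historical model classes from it); it only models the principle of replaying operations up to a given migration. The migration names, ToyFieldError, fields_after, and resolve are all hypothetical.

```python
# Toy sketch (not Django's implementation) of reconstructing a historical model
# by replaying migrations up to a given point. Migration names and operations
# are hypothetical, simplified to adding or removing a field name.
MIGRATIONS = [
    ("0001_initial", [("add", "id"), ("add", "email")]),
    ("0002_student_primary_email", [("add", "primary_email")]),
    ("0003_remove_student_email", [("remove", "email")]),
]


class ToyFieldError(Exception):
    """Stand-in for django.core.exceptions.FieldError."""


def fields_after(migrations, last):
    """Replay operations in order and return the field names once `last` has run."""
    fields = set()
    for name, operations in migrations:
        for op, field in operations:
            if op == "add":
                fields.add(field)
            elif op == "remove":
                fields.discard(field)
        if name == last:
            return fields
    raise KeyError(last)


def resolve(fields, keyword):
    """Mimic the ORM rejecting a keyword that is not a field on the model."""
    if keyword not in fields:
        choices = ", ".join(sorted(fields))
        raise ToyFieldError(
            f"Cannot resolve keyword {keyword!r} into field. Choices are: {choices}"
        )
    return keyword


# Like importing from models.py: the model after every migration has run.
current = fields_after(MIGRATIONS, "0003_remove_student_email")
# Like apps.get_model() inside 0002's backfill: the model as of that migration.
historical = fields_after(MIGRATIONS, "0002_student_primary_email")

print(sorted(current))     # ['id', 'primary_email']
print(sorted(historical))  # ['email', 'id', 'primary_email']
print(resolve(historical, "email"))  # email

error_message = None
try:
    resolve(current, "email")
except ToyFieldError as exc:
    error_message = str(exc)
print(error_message)
```

Replaying history is why a developer catching up on both PRs at once still sees email inside the first migration’s backfill, even though the current class in models.py no longer has it.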

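def backfill_display_name(apps, schema_editor):
    # Look up the historical model for this point in the migration history,
    # instead of importing the current class from student.models.
    Student = apps.get_model('student', 'Student')
    Student.objects.update(primary_email=F('email'))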

If we rewrite our migration this way, and remove the now-unused from student.models import Student import, a developer who pulls down the codebase and runs migrations days or weeks later will have no trouble doing so. 🎉 Happy migrating!