Django Model Definitions: How They Differ From the Generated Database Schema

Models are the way we tell Django what we want our database schema to look like, and are the things through which we interact with rows, or instances, in our database.

One of the benefits of using a framework like Django is that it provides functionality making it easier to create and interact with our data than if we had to do so directly with the database and generate and validate our own SQL statements. This means that there are things that will look different in the database than they do in our code — this is okay! But it’s also interesting to look at some of them to better understand where Postgres (in this example, you may be using a different database engine) and Django diverge, and which is driving what behavior.

Let’s build an example model as we go through this, so that at the end we can take a comprehensive look at the differences between our model definition, the migration file, and the database schema.

from django.db import models

class BlogPost(models.Model):
    # We'll build out our model definition here!

Primary Keys

By default, models are given an id field as the primary key without needing to define it at all (so nothing to add to our example model definition we’ll be using throughout this post!) — documented here.

Foreign Keys

To our example BlogPost class, we’ll add a foreign key to the author:

author = models.ForeignKey(User)

However, what’s actually stored in the database for foreign keys is the value of the related object’s primary key (typically the id field, as mentioned above). For this reason, the name of the field in the database will be slightly different than our model definition: the field will be author_id. Both author and author_id are available as properties on a model instance.

Optimization note: there’s a cool implication of this! Because of Django magic related to defining something as a ForeignKey (instead of just an integer, which is what’s actually stored as the database), when we call .author on an instance of a blog post, it will retrieve that instance from the database as well. If all we need is the id field, calling .author_id can improve performance over calling .author.id

The database field will be of type integer and will by default have an index, even though we did not define db_index=True the way we would for non-foreign key fields on which we wanted an index.

The constraint bit of a `ForeignKey is actually handled at the database level, not Django - even though this is just an integer field, trying to save an integer in it for which there is no corresponding row in the related table, will fail.

ManyToMany Fields

A ManyToManyField is Django’s way of creating the many-to-many relationship without you needing to define a through table (though you can provide a through argument if you have a table defined yourself). If you inspect your database schema after creating one of these, you will see that a join table is created for which you have no corresponding model.

To demonstrate this, we’ll add a Tag table and create a ManyToMany field on our BlogPost table so we can see what this looks like at the end:

tags = models.ManyToManyField(Tag, blank=True)

`auto_now` and `auto_now_add`

I wrote more about these fields and their use for tracking when model instances were created and updated here, but it’s worth noting that they’re used strictly by Django. Django ensures that the fields are added or updated on each save, but in the database they’re just timestamp fields. Let’s add them to our model definition so we can see what things look like at the end.

created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)

Validators

`blank` and `choices`

Both of these are options you can pass into a field as you’re initializing them, and neither has any impact on the schema definition of the field. Django uses them both when validating form submissions, and when trying to save data to the database via the ORM, and will only allow you to save values that comply.

However, because these are not enforced at the database level, you could definitely write an SQL statement that saves invalid data. This is one of the (many!) reasons always using the ORM to access data is advisable.

Our ManyToManyField above is already defined with blank=True, so we’ll just add a new field here to demonstrate choices:

access = models.IntegerField(choices=ACCESS_CHOICES)

Custom field types

Examples of custom field types include EmailField, UrlField, and SlugField, though it’s definitely worth perusing all the different field types Django offers (documentation here or source code here — looking at the source code can help clarify which base field type each of these inherits from, a hint at how it will actually be defined in the database).

All of these fields inherit from one of the basic field types (usually a CharField) but have a default validator on them, giving you one less thing to worry about when validating form submissions or API serializers, and which handle the work for you of deciding how to define the validator as you’re adding the field. Let’s add a SlugField to our example model:

path = models.SlugField()

Uniqueness

Uniqueness is interesting: when a single field on a model needs to be unique, we define that by just applying unique=True to the field definition, but at the database level, it doesn’t touch the field — instead Postgres creates an index to enforce uniqueness. The Django docs mention this in reference to the fact that adding unique=True means it’s unnecessary to add db_index=True, since we definitionally get an index with the uniqueness constraint.

This means that unlike with things like blank and choices, even if you’re accessing the database directly with SQL, this constraint will be enforced.

Let’s add a unique field to our model:

title = models.CharField(max_length=255, unique=True)

Help Text

Adding help_text to a field definition doesn’t have any impact on the database at all. Django uses it when when drawing model forms. Perhaps unexpectedly though, if you change it, running makemigrations will generate a migration file that will be a no-op when it’s run. Let’s add help text to one of the fields we added above.

Bringing it all Together

Model Definition

Ok, pulling together each of the fields we discussed above, this is what our model definitions look like now:

from django.contrib.auth import get_user_model
from django.db import models

User = get_user_model()

class BlogPost(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    author = models.ForeignKey(User)
    title = models.CharField(max_length=255, unique=True)
    path = models.SlugField()

    ACCESS_CHOICES = (
        (0, 'Public'),
        (1, 'Private'),
        (2, 'Password')
    )
    access = models.IntegerField(choices=ACCESS_CHOICES, help_text='controls who this post is visible to')
    tags = models.ManyToManyField(Tag, blank=True)

class Tag(models.Model):
    name = models.CharField(max_length=100)

Migration File

Given the model definitions above, when we run makemigrations, here’s are the operations that are added to the migration file:

operations = [
    migrations.CreateModel(
        name='BlogPost',
        fields=[
            ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
            ('created_at', models.DateTimeField(auto_now_add=True)),
            ('updated_at', models.DateTimeField(auto_now=True)),
            ('title', models.CharField(max_length=255, unique=True)),
            ('path', models.SlugField(max_length=255)),
            ('access', models.IntegerField(choices=[(0, b'Public'), (1, b'Private'), (2, b'Password')], help_text=b'controls who this post is visible to')),
            ('author', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to=settings.AUTH_USER_MODEL)),
        ],
    ),
    migrations.CreateModel(
        name='Tag',
        fields=[
            ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
            ('name', models.CharField(max_length=100)),
        ],
    ),
    migrations.AddField(
        model_name='blogpost',
        name='tags',
        field=models.ManyToManyField(blank=True, to='Tag'),
        ),
    ]

You can see here some of the things we talked about, such as how both of our new models have an id field, which is shown here as an AutoField. For the most part though, things look an awful lot like what we defined in our models.py file. That’s because migration files are solidly in the Django realm, instead of in the database realm — Django translates them into SQL under the hood.

Note: if you want to see the SQL that will be run when you run your migration file, you can use sqlmigrate for that.

Things get more interesting when we look at what happens in the database itself. Let’s do that now.

Database Schema

First, let’s look at the tables we have. When we go into our Postgres shell and run \d, here’s what we see:

Schema |    Name                   |   Type  
--------+---------------------------------------------------------
public  | blogpost                  | table 
public  | blogpost_tags             | table 
public  | tag                       | table

One of the primary things that you’ll notice here is there’s a table in the database that we didn’t define at all in our models! That’s the blogpost_tags table, which is the join table between blogpost and tag. It has its own primary key, in addition to foreign keys to those two tables.

Ok, let’s look at the database schema for the blogpost table, which is where most of the interesting things are. We can do that by running \d blogpost. Here’s what we see when we do that:

Column   |           Type              |  Modifiers
------------+--------------------------+--------------------------
 id         | integer                  | not null default nextval('blogpost_id_seq'::regclass)
 created_at | timestamp with time zone | not null
 updated_at | timestamp with time zone | not null
 title      | character varying(255)   | not null
 access     | integer                  | not null
 author_id  | integer                  | not null
Indexes:
    "blogpost_pkey" PRIMARY KEY, btree (id)
    "blogpost_title_key" UNIQUE CONSTRAINT, btree (title)
    "blogpost_author_id_f4a5a919" btree (author_id)
    "blogpost_title_14359379_like" btree (title varchar_pattern_ops)
Foreign-key constraints:
    "blogpost_author_id_f4a5a919_fk_auth_user_id" FOREIGN KEY (author_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "blogpost_tags" CONSTRAINT "blogpos_blogpost_id_306b49da_fk" FOREIGN KEY (blogpost_id) REFERENCES blogpost(id) DEFERRABLE INITIALLY DEFERRED

Let’s do a quick rundown of the things we notice here:

We have an id field, even though we didn’t define one
Our ForeignKey fields are just integer fields, but they have CONSTRAINT FOREIGN KEY on them, which is how cascade deletion is handled
No sign of help_text, blank, or choices
Also no sign of a specific slug field type, though that field was given a max_length of 50, even though we didn’t define that
Our access field (which had the choices) is just an integer field, since that’s the type of data all of the options were
Our created_at and updated_at fields are just your typical timezone fields, with no references to being populated automatically
I’m not going to talk about all of the indices that are defined there, as that’s out of scope for this post, but the notable one is that our title field has a UNIQUE CONSTRAINT index on it.

There are plenty more interesting bits to the way Django and databases communicate with each other, and how things are translated from one to the other, but hopefully this was a useful primer. ✨