No Downtime Deployments: Changing the Signature of a Background Task

No-downtime deployments are one of my favorite things to think about. They require thinking about inputs and outputs and the different states of your application, database, and other infrastructure during the phases of a deployment. They require you to be intentional about how you make changes, not just what changes you make.

As business needs change and features evolve, it is common to change the signature of a function. Usually this is fine — you can control what the input of a function is, and adjust the logic within the function accordingly. Not so simple when your function is a background task though!

The Problem

The problem here is best demonstrated by example. Let’s say you have a task that currently looks like this (syntax in the examples assumes you’re using Celery with Django, but the concepts apply even if you’re not!):

from celery import shared_task

@shared_task(ignore_result=True)
def do_something(arg_1):
    ...

and you now need it to take an additional parameter, so that it looks like this:

@shared_task(ignore_result=True)
def do_something(arg_1, arg_2):
    ...

The issue here is that background tasks are, by definition, not executed at the time they are enqueued. A task that was enqueued with only arg_1 will raise a TypeError if your change is deployed between when it was enqueued and when it is processed: by the time it’s picked up, the function expects both arg_1 and arg_2, and it only has arg_1!

This is most likely only a problem for a minute or two, while your app finishes processing any tasks that were enqueued before the change, and it will then self-resolve, but some number of tasks will not be able to be processed successfully. This is obviously not ideal.
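
The failure can be reproduced without a broker. The sketch below is a minimal stand-in, not Celery itself: a hypothetical in-memory `queue` list holds the arguments captured at enqueue time, and the function is redefined before the queued call is processed, which roughly simulates a deploy landing in between.

```python
# Hypothetical in-memory stand-in for a task queue; Celery is not involved.
queue = []


def do_something(arg_1):
    return f"processed {arg_1}"


# Enqueue time: only the arguments are stored, not the function body.
queue.append({"args": ("foo",), "kwargs": {}})


# "Deploy": the new code now requires two positional arguments.
def do_something(arg_1, arg_2):
    return f"processed {arg_1} and {arg_2}"


# Processing time: the old payload meets the new signature.
errors = []
for message in queue:
    try:
        do_something(*message["args"], **message["kwargs"])
    except TypeError as exc:
        errors.append(str(exc))

print(errors)
```

The single queued message fails with a TypeError that names the missing arg_2, which is the same class of error a worker would hit for each task enqueued before the deploy.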

The Solution

As is often the case with challenges like these, a change like this will need to be made in phases, and doing it without downtime will require multiple deployments. We’ll take advantage of the fact that this works in Python:

def something(arg_1, arg_2=None):
    # arg_2 can be supplied by keyword or by position
    if arg_2:
        print(arg_1, arg_2)

>>> something("foo", arg_2="bar") # => foo bar
>>> something("foo", "bar") # => foo bar

This means that we can do the following:

Deploy a change that:

Adds a kwarg to our function that has a default value of None, and updates the function to handle the fact that it may or may not be passed in.
Updates the places the task is enqueued to pass in both arguments. The second one need not be passed in as a kwarg. Passing it positionally is not best practice, and would be confusing if you intended to leave your code this way, but it is acceptable here because this is a very short-lived interim state. Seriously: this should only need to be in your code for 5 minutes or less, assuming all of your background tasks are picked up within that timeframe. This isn’t a “make a #TODO and come back to it when I feel like it” kind of thing; it’s a “have both PRs ready and deploy them one after the other” kind of thing.

Once you’re sure that all of the tasks that were enqueued before you deployed your first change have been processed, deploy a change that changes your newly added argument from a kwarg to an arg. At this point, you can also remove any code that was accounting for the fact that the newly added argument may or may not exist.
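
The two deployments can be sketched end to end. This is an illustration under stated assumptions rather than real Celery code: `enqueue`, `drain`, and the `"fallback"` handling are hypothetical names, and the two phases are given separate function names only so they can live in one file.

```python
# Hypothetical in-memory stand-in for a task queue; Celery is not involved.
queue = []


def enqueue(*args, **kwargs):
    queue.append((args, kwargs))


def drain(task):
    """Run every queued payload against the currently deployed task."""
    results = [task(*args, **kwargs) for args, kwargs in queue]
    queue.clear()
    return results


# Before any deploy, callers enqueue with one argument.
enqueue("old")


# Deployment 1: arg_2 is optional, and the task tolerates its absence.
def do_something_phase_1(arg_1, arg_2=None):
    if arg_2 is None:
        return (arg_1, "fallback")  # hypothetical handling for old payloads
    return (arg_1, arg_2)


# Callers are updated in the same deploy to pass both arguments.
enqueue("new", "bar")
phase_1_results = drain(do_something_phase_1)


# Deployment 2, once no old payloads remain: arg_2 becomes required.
def do_something_phase_2(arg_1, arg_2):
    return (arg_1, arg_2)


enqueue("newer", "baz")
phase_2_results = drain(do_something_phase_2)

print(phase_1_results)
print(phase_2_results)
```

During the first phase both the old one-argument payload and the new two-argument payload are processed without error; by the second phase only two-argument payloads are left, so the default and the fallback branch can go.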

Note: this works even if your function already has multiple args and kwargs. You would add your new argument as the first kwarg (arg3 in this example):

def do_something(arg1, arg2, arg3=None, arg4=None):
    print(arg1, arg2, arg3, arg4)

>>> do_something(1, 2, arg3=3, arg4=4) # => 1 2 3 4
>>> do_something(1, 2, 3, arg4=4) # => 1 2 3 4
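
One way to sanity-check a signature change before deploying is to test whether payloads shaped like the old calls still bind to the new signature. The sketch below uses the standard library's `inspect.signature(...).bind`; the `binds` helper and the sample payloads are hypothetical, and the function returns a tuple instead of printing so the result can be inspected.

```python
import inspect


def do_something(arg1, arg2, arg3=None, arg4=None):
    return (arg1, arg2, arg3, arg4)


def binds(func, args, kwargs):
    """Return True if the payload can be bound to func's signature."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


# Hypothetical payloads enqueued before arg3 existed.
old_payloads = [
    ((1, 2), {}),
    ((1, 2), {"arg4": 4}),
]
compatible = [binds(do_something, a, k) for a, k in old_payloads]
print(compatible)

# Caveat: a payload that passed arg4 positionally still binds,
# but its value now lands in arg3.
misbound = do_something(1, 2, 4)
print(misbound)
```

Binding only checks argument count and names. As the last call shows, inserting the new argument ahead of an existing kwarg assumes that already-enqueued tasks passed that existing kwarg by keyword rather than by position.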