get_or_create is an awesome helper utility to have at your disposal when you need an object matching some specifications, but there should only be exactly one match: you want to retrieve it if it already exists, and create it if it doesn’t.
However, there’s a scenario where it doesn’t quite do what we expect in the case of race conditions (exactly the thing we’re trying to prevent). There is a nod to what we’re about to talk about in the docs:
This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level for the kwargs used in a get_or_create call (see unique or unique_together), this method is prone to a race-condition which can result in multiple rows with the same parameters being inserted simultaneously.
Let’s talk about this in more detail. Here’s the relevant bit of Django’s implementation of get_or_create:
    lookup, params = self._extract_model_params(defaults, **kwargs)
    try:
        return self.get(**lookup), False
    except self.model.DoesNotExist:
        return self._create_object_from_params(lookup, params)
This does exactly what the name implies! It attempts a lookup based on the filter args that are passed in (explicitly doing a .get(), which fails with DoesNotExist if there is no match in the database), then catches that DoesNotExist exception and creates the object instead.
However, if you go further into _create_object_from_params, you’ll notice that it does a lot more than just make a call to .create(). Here’s what happens there (still in Django source code):
    try:
        with transaction.atomic(using=self.db):
            obj = self.create(**params)
        return obj, True
    except IntegrityError:
        exc_info = sys.exc_info()
        try:
            return self.get(**lookup), False
        except self.model.DoesNotExist:
            pass
        six.reraise(*exc_info)
This is cool: it’s explicitly accounting for race conditions! It tries to create the object, but if that operation throws an IntegrityError, it does the lookup again and tries to return what it finds.
The problem is this: suppose you hit this part of the code in one thread (meaning the lookup has already taken place and returned nothing) for a model that has no uniqueness constraint on the attributes you’re looking up by. If another thread creates a matching object in the meantime, the creation in this thread will not throw an IntegrityError, and you’ll end up with two! This may be fine, for now. After all, your call to get_or_create returned an instance matching your parameters, and so did the call in the other thread, and both will carry on their merry way.
The problem arises the next time you try to retrieve the object using the same lookup params with get_or_create.
Because you now have two objects in your database, when get_or_create tries its .get() (note, not .filter()), you’ll get a MultipleObjectsReturned error, but get_or_create only catches a DoesNotExist exception. This means that unless we do additional exception handling on our own (which we shouldn’t have to, in this case!), the user will see an error.
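To make the sequence concrete, here is a minimal sketch that replays the race by hand using Python’s built-in sqlite3 module rather than Django itself. The tag table and the lookup and create helpers are hypothetical stand-ins for .get() and .create(), and the two "callers" are interleaved manually so the outcome is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with NO uniqueness constraint on the lookup column.
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT)")


def lookup(name):
    """Return the ids of all matching rows (a stand-in for the .get() lookup)."""
    rows = conn.execute(
        "SELECT id FROM tag WHERE name = ? ORDER BY id", (name,)
    ).fetchall()
    return [row[0] for row in rows]


def create(name):
    """Insert a row (a stand-in for .create())."""
    conn.execute("INSERT INTO tag (name) VALUES (?)", (name,))


# Interleave two callers by hand to reproduce the race.
a_found = lookup("django")  # caller A looks: nothing there yet
b_found = lookup("django")  # caller B looks: nothing there yet either
create("django")            # caller A creates; no IntegrityError
create("django")            # caller B creates; no IntegrityError either

# The next lookup matches two rows, which is the situation in which
# Django's .get() would raise MultipleObjectsReturned.
duplicates = lookup("django")
print(a_found, b_found, duplicates)
```

Neither insert fails, so nothing ever triggers the fallback lookup, and the duplicate only surfaces on a later call.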
The moral of the story? Don’t use get_or_create on objects that don’t have uniqueness constraints, at the database level, on the attributes you’re doing the lookup based on.
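For contrast, here is a sketch of the same interleaving once the database enforces uniqueness. It again uses sqlite3 as a stand-in rather than Django, and the tag table, get, and create_or_fall_back names are hypothetical. The helper only mimics the shape of the create-then-retry step shown earlier; it is not Django’s code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same hypothetical table, but the lookup column is now UNIQUE.
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")


def get(name):
    """Return the id of the matching row, or None if there is no match."""
    row = conn.execute("SELECT id FROM tag WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def create_or_fall_back(name):
    """Try to insert; on IntegrityError, repeat the lookup instead."""
    try:
        cursor = conn.execute("INSERT INTO tag (name) VALUES (?)", (name,))
        return cursor.lastrowid, True
    except sqlite3.IntegrityError:
        found = get(name)
        if found is None:
            raise  # the constraint failed for some other reason
        return found, False


# Both callers have already looked and found nothing.
initially_missing = get("django") is None
first = create_or_fall_back("django")   # caller A wins the insert
second = create_or_fall_back("django")  # caller B hits the constraint
row_count = conn.execute("SELECT COUNT(*) FROM tag").fetchone()[0]
print(first, second, row_count)
```

Here the second insert is rejected by the database, the fallback lookup returns the row the first caller created, and only one row ever exists.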
The inverse is also true: if you have a model with a uniqueness constraint, using the built-in get_or_create method is preferable to trying to build your own, since you likely won’t handle the race condition caused by multiple threads attempting this at the same time.
Updated 6/24/2020