Encryption is a huge part of modern software engineering. That applies both to data stored within our own systems and to data we send or receive over the network, where it protects data in flight and helps verify that what we receive from other services is genuine.
Most modern languages have libraries or built-in functionality that can be used to implement encryption for us, so knowing the ins-and-outs in depth isn’t necessarily a prerequisite of working in parts of your system that use encryption. That said, it’s always good to know the basics of what the code we write actually does, so the goal of this article is to introduce concepts so you understand what the library code you’re using is doing under the hood.
For example, I wrote code that looks something like this, and wanted to know what all the pieces meant and what they were doing.
from Crypto import Random
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

def encrypt(value, key):
    # A fresh random IV, one block (16 bytes) long, for every message
    iv = Random.new().read(AES.block_size)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    # pad() needs bytes, so convert the string first, then pad to the block size
    ciphertext = cipher.encrypt(pad(bytes(value, encoding='utf8'), AES.block_size))
    # Prepend the IV so the receiver can recover it
    return iv + ciphertext
Let’s dive in!
AES is a block cipher, which just means it operates on blocks of data that are a fixed size. AES always uses a block size of 128 bits, or 16 bytes. The key, on the other hand, can be 128, 192, or 256 bits long (16, 24, or 32 bytes respectively). The secret key is the piece of information shared by both parties: the one encrypting and the one decrypting.
Of course, not every string that we need to encrypt is cleanly divisible into 16-byte blocks. We’ll revisit this idea in a couple of the other pieces discussed below!
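To make the sizes concrete, here is a small standard-library sketch (not part of the original code). The helper names check_key and split_blocks are made up for illustration; the library does this kind of checking for you.

```python
BLOCK_SIZE = 16                 # AES block size in bytes (128 bits)
VALID_KEY_SIZES = (16, 24, 32)  # AES-128, AES-192, AES-256


def check_key(key: bytes) -> int:
    """Hypothetical helper: return the key size in bits, or raise if AES would reject it."""
    if len(key) not in VALID_KEY_SIZES:
        raise ValueError(f"key must be 16, 24, or 32 bytes, got {len(key)}")
    return len(key) * 8


def split_blocks(data: bytes) -> list:
    """Hypothetical helper: split data into 16-byte chunks; the last one may be shorter."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


message = "Encryption is a huge part of modern software".encode("utf8")
blocks = split_blocks(message)
print(len(message), [len(b) for b in blocks])  # prints: 44 [16, 16, 12]
```

The last chunk is only 12 bytes long, which is exactly the problem padding solves later on.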
An initialization vector (or IV) is used to ensure that the same value encrypted multiple times, even with the same secret key, will not always result in the same encrypted value. This is an added security layer. If a given string always produced the same result when encrypted, anyone watching the encrypted values could tell when the same value was sent twice, and could confirm a guess at the original value by spotting a match.
In the example above, an iv is created by generating random bytes of the same length as the block size (so our iv will be 16 bytes, as we mentioned above). This value is then passed in to the initialization of the cipher so that it can be used by the library code when encrypting the value.
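If you are not using the library's own random generator, a rough equivalent can be sketched with Python's standard library. This assumes secrets.token_bytes as a stand-in for Random.new().read, and new_iv is a hypothetical helper name.

```python
import secrets

BLOCK_SIZE = 16  # AES block size in bytes


def new_iv() -> bytes:
    """Hypothetical helper: return a fresh, unpredictable IV exactly one block long."""
    return secrets.token_bytes(BLOCK_SIZE)


iv_a = new_iv()
iv_b = new_iv()
print(len(iv_a), iv_a.hex())
print(iv_a != iv_b)  # two calls should, in practice, never produce the same IV
```

The important part is that a new IV is generated for every message rather than reused.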
Other than the encryption key and the initialization vector, the other thing you’ll notice about the initialization of the cipher is that we’ve passed in a mode. The mode defines how the block cipher is applied when the data spans more than one block, for example whether each block is encrypted on its own or mixed with the result of the previous block. Some provide a higher level of security than others, but the main thing here is that the mode used for encryption must be the same one used for decryption on the other side.
I found this article to be a useful primer in the different encryption modes. Different modes have different requirements for initialization vectors and padding of the data. We’re using CBC in our example, so your code may have some differences based on which mode you’re using.
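To give a feel for what a mode changes, here is a toy sketch of CBC-style chaining next to encrypting each block independently. The function toy_block_encrypt is a made-up, insecure stand-in for the real AES block function (an assumption for illustration only); the point is just the chaining, where each block is XORed with the previous encrypted block before being encrypted.

```python
BLOCK_SIZE = 16


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    """Insecure stand-in for AES: add the key byte by byte, then rotate by one byte."""
    mixed = bytes((b + k) % 256 for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]


def toy_independent(data: bytes, key: bytes) -> bytes:
    """Encrypt every block on its own: equal input blocks give equal output blocks."""
    return b"".join(
        toy_block_encrypt(data[i:i + BLOCK_SIZE], key)
        for i in range(0, len(data), BLOCK_SIZE)
    )


def toy_cbc(data: bytes, key: bytes, iv: bytes) -> bytes:
    """Chain the blocks: XOR each one with the previous output (the IV for the first)."""
    previous = iv
    out = []
    for i in range(0, len(data), BLOCK_SIZE):
        previous = toy_block_encrypt(xor_bytes(data[i:i + BLOCK_SIZE], previous), key)
        out.append(previous)
    return b"".join(out)


key = bytes(range(16))
data = b"A" * 32  # two identical 16-byte blocks

plain = toy_independent(data, key)
chained = toy_cbc(data, key, iv=bytes(16))
print(plain[:16] == plain[16:])      # True: the repetition is visible
print(chained[:16] == chained[16:])  # False: chaining hides it
```

With chaining, changing the IV also changes every block of the output, which is why the IV matters so much in this mode.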
When we actually call encrypt on our cipher (which has been initialized with the encryption key, encryption mode, and initialization vector), you’ll notice we’re also calling pad on the value we’re encrypting first. This goes back to the concept of “block size” that we talked about earlier. Because AES is a block cipher that works on “blocks” of a predefined length, if the length of the value we’re encrypting isn’t cleanly divisible by that length, it won’t work. Calling pad on the value appends bytes to the end until its length is a multiple of the block size. By default the library uses PKCS#7 padding, where each added byte holds the number of bytes that were added, which is how the padding can be removed again after decryption. pad returns a byte string.
The pad function is built into the Python encryption library we’re using. You may need to use a different one or implement your own, depending on what language you’re using. It also takes a byte string as input, which is why we’re calling bytes on our string before passing it in.
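If you do need to implement padding yourself, a minimal sketch of PKCS#7 (the scheme the library's pad uses by default) could look like this. The function names pkcs7_pad and pkcs7_unpad are hypothetical, and this is an illustration rather than a replacement for the library functions.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes, each with value N, so the length is a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)  # always between 1 and block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = 16) -> bytes:
    """Remove PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return padded[:-pad_len]


padded = pkcs7_pad("hello".encode("utf8"))
print(len(padded), padded)  # 16 bytes: b"hello" followed by eleven 0x0b bytes
print(pkcs7_unpad(padded))  # b'hello'
```

Note that a value whose length is already a multiple of 16 still gets a full extra block of padding, so the receiver can always tell where the real data ends.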
You’ll notice the final element of our encryption logic is to return iv + ciphertext. When we talked about initialization vectors at first, there’s an important piece we left out: the party on the other end who’s decrypting our value won’t be able to do that without knowing what the iv is. For this reason, we prepend it to our value. That’s fine to do in the open, because the IV doesn’t need to be secret, only different and unpredictable for each message. Because IVs are always the same length, before our value is decrypted, a chunk of that length can be stripped from the beginning, and we know that’s what was used as the initialization vector.
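The layout of the returned value can be sketched as follows. The helper split_message is hypothetical and uses only slicing; it assumes the message was built as a 16-byte IV followed by padded CBC ciphertext, as in the example above.

```python
BLOCK_SIZE = 16  # AES block size in bytes, which is also the IV length


def split_message(message: bytes):
    """Hypothetical helper: return (iv, ciphertext) from a message laid out as iv + ciphertext."""
    # With padding, the ciphertext is at least one full block, so the
    # whole message is at least two blocks and a multiple of the block size.
    if len(message) < 2 * BLOCK_SIZE or len(message) % BLOCK_SIZE != 0:
        raise ValueError("expected a 16-byte IV followed by whole ciphertext blocks")
    return message[:BLOCK_SIZE], message[BLOCK_SIZE:]


iv, ciphertext = split_message(bytes(range(48)))
print(len(iv), len(ciphertext))  # prints: 16 32
```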
With that in mind, decryption of the value we encrypted above might look like this:
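from Crypto.Util.Padding import unpad

def decrypt(encrypted_value, key):
    # The first block of the message is the IV that encrypt() prepended
    iv = encrypted_value[:AES.block_size]
    cipher = AES.new(key, AES.MODE_CBC, iv)
    padded = cipher.decrypt(encrypted_value[AES.block_size:])
    # Strip the padding added before encryption and turn the bytes back into a string
    return unpad(padded, AES.block_size).decode('utf8')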
Again, there’s still some magic going on in there (the library we’re using has encrypt and decrypt methods that actually do the magic!), but hopefully this clarifies how all the pieces tie together. Happy encrypting!