A faster PBKDF2 for Python

I came across a blog post titled "PBKDF2: performance matters" where the author discusses how most implementations of PBKDF2 are slower than it otherwise could be.

After reading the blog post, I decided to write some Python bindings to see how much of a performance increase I can obtain over the standard library's hashlib.pbkdf2_hmac implementation. My goal is a library with an interface that is compatible with hashlib.pbkdf2_hmac.

The results are surprisingly good. With a basic benchmarking script on CPython 3.4.1, my implementation is about 3 times as fast as the standard library.

$ ./bench.sh
Benchmark hashlib...
100 loops, best of 3: 60.2 msec per loop
Benchmark fastpbkdf2...
100 loops, best of 3: 20.3 msec per loop

With PyPy 2.6.0, the results are even better.

$ ./bench.sh
Benchmark hashlib...
100 loops, best of 3: 242 msec per loop
Benchmark fastpbkdf2...
100 loops, best of 3: 19.2 msec per loop

I have since release my library as a PyPI package and the code is available on GitHub.

Simply install the package with pip,

pip install fastpbkdf2

and import the function

from fastpbkdf2 import pbkdf2_hmac

The interface is exactly the same as hashlib.pbkdf2_hmac and should be a drop-in replacement.

Introducing python-aead

Cryptography libraries often have complicated APIs with many different options to tweak. It is a goal PyCA's cryptography library to provide safe and easy to use APIs for common cryptographic tasks. To that end, the cryptography package has a Fernet recipe for symmetric encryption derived from the original Ruby implementation and specification. However, the Fernet recipe lacks the ability to authentiate (without encrypting) arbitrary data.

To make up for that use case not being covered by Fernet, I have written and released on PyPI a library called aead. It can be installed with pip.

$ pip install aead

The aead library is based on a IETF Internet Draft from David McGrew. It is essentially AES_128_CBC and HMAC_SHA_256 composed with an encrypt-then-mac construction. It relies on the cryptography library for the cryptographic primitives.

It has a simple to use API heavily inspired by the Fernet recipe in the cryptography library.

The module contains a single class that can be imported.

from aead import AEAD

The class takes requires an encryption key to be initialized. The key has to be 32 bytes long and encoded with base64url as specified in RFC 4648. The library provides a classmethod to generate a suitable random key.

cryptor = AEAD(AEAD.generate_key())

After initializing the object, encryption can be done by calling the .encrypt() method. The .encrypt() method takes two paremeters, the first being the data you want to encrypt and the second being associated data that you want to authenticate but not encrypt. The second parameter is optional and can be left out if there isn't any data to authenticate.

ct = cryptor.encrypt(b"Hello, World!", b"Additional Data")

.encrypt() returns base64url encoded cipher text.

Decrypting any data encrypted with aead is similar. Simply call .decrypt() in place of .encrypt(). The .decrypt() method takes two parameters, the first being the cipher text that needs decrypting and the second being the associated data that was authenticated.

If the cipher text is corrupted or the associated data provided during the decryption process does not match the associated data provided during encryption, a ValueError is raised, otherwise the decrypted plain text is returned.

The repository for aead can be found on GitHub and the README.md file in the repository should be treated as the source of truth if any information there differs from this blog post due to changes over time.

Look before you pip

For Python programmers, downloading Python packages from PyPI, the Python Package Index, is second nature. Tools like pip and conventions like the requirements.txt file that most Python projects follow provides a consistent way of specifying project dependencies.

However, installing random packages from PyPI is actually very dangerous, a fact that not many people are aware of. There are a few factors that contribute to this.

  1. Python packages can execute arbitrary Python code during the installation process.

  2. PyPI packages are not moderated. Unlike the package managers used in Linux distros, anyone can register an account and upload Python packages without going through a review process. While this is one factor contributing to PyPI's success as a package repository, you will have to trust the maintainer of the package that the package is safe.

As a proof of concept, I have written a setup.py file that connects to a Metasploit listener and downloads a Meterpreter shell during installation. This demonstrates that it is trivial for someone to execute arbitrary code on a machine through the installation of a Python package. You can obtain the code from GitHub.

Run the Metasploit listener.

msf > use exploit/multi/handler
msf exploit(handler) > set payload python/meterpreter/reverse_tcp
msf exploit(handler) > set LHOST

Finally, run the setup.py file.

python setup.py install

You should obtain a Meterpreter shell with the same privileges that you ran the setup.py script with.

While my example involves connecting to a Metasploit listener on localhost, the same attack can be extended to install malware from remote systems or do almost anything a Python script can do.

The problematic thing about this attack is that there are valid reasons for Python packages to execute code during installation. This ranges from things like OS version checks to compiling C code for packages that rely on C extensions. Restricting setuptools to a subset of Python during installation isn't exactly foolproof as demonstrated by the numerous Python sandbox escape techniques. Moderating PyPI isn't a solution either as that will greatly diminish PyPI's attractiveness as a package repository.

Here are two recommendations to limit the potential of such attacks.

  1. NEVER install Python packages as root. This limits the privileges an attacker has if the attack succeeds. virtualenv is incredibly useful for this.

  2. If you are in an organization with larger resources, audit the third-party packages you depend on. Mirror trusted packages on an internal devpi server instead of installing packages directly from PyPI.

While I am limiting the details in this post to Python packages as that is the ecosystem I am most familiar with, I believe that this issue also extends to other languages and ecosystems such as Ruby and the gems ecosystem. While there has been an increased focus over the years on paying attention to good security practices when writing code, many forget about third-party code. This worries me because third-party code represents such a large attack surface open to exploits. As we all know, security is only as strong as the weakest link.

Using a single password for Authentication and Encryption

A common scenario in web applications involve using a single password as a means of authentication as well as a means to derive a secret for use in encrypting data.

Many strong key derivation functions like pbkdf2, bcrypt or scrypt have properties that make them strong password hashing functions as well. However, the same derived value cannot be used as an encryption key and a password hash. The password hash value has to be stored by the server to compare against the provided password in future authentication attempts. If this same value is used as an encryption key, an attacker that compromises the server will be able to decrypt the data easily.

There is an easy solution for this problem. While I will be using pbkdf2 as an example, pbkdf2 can be substituted for any strong algorithm like bcrypt or scrypt.

  1. Generate a random key k using a cryptographically secure random number generator. This means using CryptGenRandom on Windows and /dev/urandom on *nix operating systems. This random key k will be used for encryption.

  2. Generate two salts s1 and s2 and store them in plaintext.

  3. Compute pbkdf2(password, s1) and store this value. This will be the password hash you use to compare against for future authentication attempts.

  4. Compute pbkdf2(password, s2) xor k and store this value.

  5. When the random key k is required for encrypting or decrypting data, simply xor the value of pbkdf2(password, s2) against the value computed in step 4.

The advantage of this scheme is that the encryption key k is not tied to the password. This means that passwords can be changed without re-encrypting the data with a new key repeating steps 1 - 4. A very useful property to have in the event of a server compromise where passwords have to be reset en masse.