I came across an interesting tweet by Juliano Rizzo.
The correct answer is that the statement is true if several (very unlikely to
happen in the real world) conditions are met. Let us take a look at why it
happens and what conditions have to be met for this to work.
The authentication key K can be of any length up to B, the block length of
the hash function. Applications that use keys longer than B bytes will first
hash the key using H and then use the resultant L byte string as the actual key
This means that for keys that are longer than the block size of the hash used
for the HMAC (>64 bytes in the case of SHA1), HMAC(key) == HMAC(HASH(key)).
>>>importhashlib>>>importhmac>>>key1=b"This is a very long key, a very very long key indeed. This key is absurdly long.">>>key2=hashlib.sha1(key1).digest()>>>hmac.new(key1,b"msg","sha1").digest)b'\xac\x87j&\xc6}\xa3\xc4\xf2$z\x06\x19\x87\\e\x81N\xcei'>>>hmac.new(key2,b"msg","sha1").digest)b'\xac\x87j&\xc6}\xa3\xc4\xf2$z\x06\x19\x87\\e\x81N\xcei'
2. PBKDF2 (and Scrypt)
PBKDF2 essentially boils down to applying HMAC a number of times to a key in a
loop. Thus, the property of HMAC mentioned in the previous section applies to
PBKDF2(as well as Scrypt, which uses PBKDF2 internally).
>>>importhashlib>>>key1=b"This is a very long key, a very very long key indeed. This key is absurdly long.">>>key2=hashlib.sha1(key1).digest()>>>hashlib.pbkdf2_hmac("sha1",key1,b"salt",1)b'\xf1\x18\xa4J]y\xf6\x85J\x8eq\xef\xea\x16>\x826+\x7f\xc8'>>>hashlib.pbkdf2_hmac("sha1",key2,b"salt",1)b'\xf1\x18\xa4J]y\xf6\x85J\x8eq\xef\xea\x16>\x826+\x7f\xc8'
So, how does this result in a vulnerability?
If Dropbox had decided to switch from SHA1 to PBKDF2_HMAC_SHA1 instead of
Bcrypt, any attacker that manage to obtain dumps of the SHA1 hashed password
and the PBKDF2_HMAC_SHA1 hashed passwords can authenticate as any users
that a. have passwords longer than 64 bytes and b. reused the same password in
the switch without cracking the password hash.
So why isn’t this likely to be a problem? There are a number of unlikely
conditions that have to be fulfilled first.
The attacker has to have access to both the SHA1 and the
PBKDF2_HMAC_SHA1 password dumps. This would require Dropbox to keep the old
SHA1 hashed passwords around together with the new PBKDF2_HMAC_SHA1 hashed
passwords or that the attacker managed to dump the database both before and
after the algorithm switch.
Users have to use passwords longer than 64 bytes. This is very uncommon
even for users who use password managers or passphrases.
Users have to reuse the same password before and after the algorithm
switch. This is especially unlikely to happen because users who are
security-consious enough to use passwords longer than 64 bytes most likely
will not reuse passwords.
So there you have it, a very unlikely set of circumstances that if fulfilled
can potentially result in a very interesting vulnerability.
I came across a blog post titled “PBKDF2: performance matters”
where the author discusses how most implementations of PBKDF2 are slower than
it otherwise could be.
After reading the blog post, I decided to write some Python bindings to
see how much of a performance increase I can obtain over the standard library’s
hashlib.pbkdf2_hmac implementation. My goal is a library with an interface
that is compatible with hashlib.pbkdf2_hmac.
The results are surprisingly good. With a basic benchmarking script
on CPython 3.4.1, my implementation is about 3 times as fast as the standard
100 loops, best of 3: 60.2 msec per loop
100 loops, best of 3: 20.3 msec per loop
With PyPy 2.6.0, the results are even better.
100 loops, best of 3: 242 msec per loop
100 loops, best of 3: 19.2 msec per loop
I have since release my library as a PyPI package and the code is available on
Simply install the package with pip,
pip install fastpbkdf2
and import the function
The interface is exactly the same as hashlib.pbkdf2_hmac and should be a
Cryptography libraries often have complicated APIs with many different options
to tweak. It is a goal PyCA’s cryptography library to provide safe and easy
to use APIs for common cryptographic tasks. To that end, the cryptography
package has a Fernet recipe for symmetric encryption derived from the
original Ruby implementation and specification. However, the
Fernet recipe lacks the ability to authentiate (without encrypting) arbitrary
To make up for that use case not being covered by Fernet, I have written and
released on PyPI a library called aead. It can be installed with pip.
$ pip install aead
The aead library is based on a IETF Internet Draft from David
McGrew. It is essentially AES_128_CBC and HMAC_SHA_256 composed with an
encrypt-then-mac construction. It relies on the cryptography library for
the cryptographic primitives.
It has a simple to use API heavily inspired by the Fernet recipe in the
The module contains a single class that can be imported.
The class takes requires an encryption key to be initialized. The key has to be
32 bytes long and encoded with base64url as specified in RFC 4648. The library
provides a classmethod to generate a suitable random key.
After initializing the object, encryption can be done by calling the
.encrypt() method. The .encrypt() method takes two paremeters, the first
being the data you want to encrypt and the second being associated data that
you want to authenticate but not encrypt. The second parameter is optional and
can be left out if there isn’t any data to authenticate.
.encrypt() returns base64url encoded cipher text.
Decrypting any data encrypted with aead is similar. Simply call .decrypt()
in place of .encrypt(). The .decrypt() method takes two parameters, the
first being the cipher text that needs decrypting and the second being the
associated data that was authenticated.
If the cipher text is corrupted or the associated data provided during the
decryption process does not match the associated data provided during
encryption, a ValueError is raised, otherwise the decrypted plain text is
The repository for aead can be found on GitHub and the
README.md file in the repository should be treated as the source of truth
if any information there differs from this blog post due to changes over time.
For Python programmers, downloading Python packages from PyPI, the Python
Package Index, is second nature. Tools like pip and conventions like the
requirements.txt file that most Python projects follow provides a consistent
way of specifying project dependencies.
However, installing random packages from PyPI is actually very dangerous,
a fact that not many people are aware of. There are a few factors that
contribute to this.
Python packages can execute arbitrary Python code during the installation
PyPI packages are not moderated. Unlike the package managers used in Linux
distros, anyone can register an account and upload Python packages without
going through a review process. While this is one factor contributing to
PyPI’s success as a package repository, you will have to trust the maintainer
of the package that the package is safe.
As a proof of concept, I have written a setup.py file that connects to a
Metasploit listener and downloads a Meterpreter shell during installation.
This demonstrates that it is trivial for someone to execute arbitrary code
on a machine through the installation of a Python package. You can obtain the
code from GitHub.
Run the Metasploit listener.
msf > use exploit/multi/handler
msf exploit(handler) > set payload python/meterpreter/reverse_tcp
msf exploit(handler) > set LHOST 127.0.0.1
Finally, run the setup.py file.
python setup.py install
You should obtain a Meterpreter shell with the same privileges that you ran
the setup.py script with.
While my example involves connecting to a Metasploit listener on localhost,
the same attack can be extended to install malware from remote systems or do
almost anything a Python script can do.
The problematic thing about this attack is that there are valid reasons for
Python packages to execute code during installation. This ranges from things
like OS version checks to compiling C code for packages that rely on C
extensions. Restricting setuptools to a subset of Python during installation
isn’t exactly foolproof as demonstrated by the numerous Python sandbox escape
techniques. Moderating PyPI isn’t a solution either as that will greatly
diminish PyPI’s attractiveness as a package repository.
Here are two recommendations to limit the potential of such attacks.
NEVER install Python packages as root. This limits the privileges an
attacker has if the attack succeeds. virtualenv is incredibly useful for
If you are in an organization with larger resources, audit the
third-party packages you depend on. Mirror trusted packages on an internal
devpi server instead of installing packages
directly from PyPI.
While I am limiting the details in this post to Python packages as that is the
ecosystem I am most familiar with, I believe that this issue also extends to
other languages and ecosystems such as Ruby and the gems ecosystem. While there
has been an increased focus over the years on paying attention to good security
practices when writing code, many forget about third-party code. This worries
me because third-party code represents such a large attack surface open to
exploits. As we all know, security is only as strong as the weakest link.
A common scenario in web applications involve using a single password as a
means of authentication as well as a means to derive a secret for use in
Many strong key derivation functions like pbkdf2, bcrypt or scrypt have
properties that make them strong password hashing functions as well. However,
the same derived value cannot be used as an encryption key and a password hash.
The password hash value has to be stored by the server to compare against the
provided password in future authentication attempts. If this same value is
used as an encryption key, an attacker that compromises the server will be
able to decrypt the data easily.
There is an easy solution for this problem. While I will be using pbkdf2 as
an example, pbkdf2 can be substituted for any strong algorithm like bcrypt
Generate a random key k using a cryptographically secure random number
generator. This means using CryptGenRandom on Windows and /dev/urandom on
*nix operating systems. This random key k will be used for encryption.
Generate two salts s1 and s2 and store them in plaintext.
Compute pbkdf2(password, s1) and store this value. This will be the
password hash you use to compare against for future authentication attempts.
Compute pbkdf2(password, s2) xor k and store this value.
When the random key k is required for encrypting or decrypting data,
simply xor the value of pbkdf2(password, s2) against the value computed in
The advantage of this scheme is that the encryption key k is not tied to the
password. This means that passwords can be changed without re-encrypting
the data with a new key repeating steps 1 - 4. A very useful property to have
in the event of a server compromise where passwords have to be reset en masse.