bcrypt_sha256 is vulnerable to breach correlation attacks

Issue #114 resolved
Aaron Toponce created an issue

According to https://passlib.readthedocs.io/en/stable/lib/passlib.hash.bcrypt_sha256.html#algorithm, to address the 72 byte limit of bcrypt, the algorithm executes bcrypt(base64(sha-256(utf-8(password)))). Unfortunately, because the SHA-256 prehash is unsalted, the resulting bcrypt digest is vulnerable to breach correlation attacks.

To demonstrate this, suppose password breach 1 hashes passwords with bcrypt(sha-256(password)), and breaches 2 and 3 just hashed passwords with sha-256(password). All of the hashes from breaches 2 and 3 can be fed into bcrypt from breach 1 as bcrypt(hash). Any successful hits, means the password cracker only need find the password from breaches 2 and 3, which means focusing on SHA-256 rather than spend the energy on bcrypt from breach 1. Basically, bcrypt was just removed from the workload.

This isn’t a theoretical attack. It’s used in practice right now by hobbyist and professional password crackers to make their workload as efficient as possible. A full presentation of breach correlation attacks can be found at BSides Vegas, 2018: https://www.youtube.com/watch?v=5su3_Py8iMQ.

The way to plug this vulnerability is to salt the prehash. This way, no correlated hits from breaches 2 and 3 will be successful in breach 1. This means that to discover the plaintext password in breach 1, the password cracker must focus their energy on cracking bcrypt directly.

The prehash salt doesn’t have to be unique per user, it doesn’t have to be random or unpredictable, and it doesn’t have to be updated per password change. The prehash salt can be a static pepper, such as the word “bcrypt”, your company name, or anything else. It need only change the resulting prehash, such that it won’t be found in other breaches.

One approach could be to bcrypt(base64(sha-256(sha-256(“bcrypt”)||password)), cost, salt), while another would be to use an HMAC such as bcrypt(base64(sha-256-hmac(password, “bcrypt”)), cost, salt).

This was recently brought up in the following Twitter thread (it goes deep, with many subthreads): https://twitter.com/AaronToponce/status/1224341295188983809

Comments (9)

  1. Dmitry Chestnykh

    The second approach, with password as a key to HMAC, is not ideal due to the fact that passwords longer than 64 (for SHA-256) would result in two valid passwords — see https://twitter.com/dchest/status/421595430539894784?s=20 thread. Instead, the domain separation value, “bcrypt”, should be the HMAC key and the password should be the data. This is the approach taken by yescrypt (https://www.openwall.com/yescrypt/) and discussed during the Password Hashing Competition.

  2. Aaron Toponce reporter

    @Dmitry Chestnykh Thanks. I was ambiguous in my arguments. I generally think of cryptographic tuples as (payload, key), but I should have been pedantic about it in this bug report.

    It’s probably worth mentioning also that the domain-specific pepper should probably be as unique as possible, and it’s not a bad idea to use a random meaningless string. The goal is to avoid correlating passwords between breaches, so simple strings like “bcrypt” or “passlib” might run you into trouble if a breach also salted their hashes with your pepper. Getting 16 bytes out of the system RNG will steer you clear from this potential headache.


    $ python3 -c 'import secrets; print(secrets.token_urlsafe(16));'

    This is an operational decision by the admin or developer, and outside of the control of passlib, but might be worth mentioning in the documentation.

  3. Eli Collins repo owner

    Ouch, that’s a nasty little attack. I definitely want to get a fix out for that soon. Great discussion … I like the bcrypt(base64(sha256_hmac(pepper, password)), cost, salt) construction.

    If the pepper doesn’t have to be secret, and would benefit from being varied, are there any downsides to just setting pepper = salt ? It’ll be known at validation time, and will add enough variability to really decrease the odds of a collision with another pepper.

    [Sidenote - I was initially thinking of doing something as simple as bcrypt(base64(sha256(sha256(salt) + password)), cost, salt) ; but there’s a distant chance that might collide with some plaintext… and HMAC is a good standard construction that gets rids of that].

  4. Eli Collins repo owner

    I’ve committed an update to bcrypt_sha256() that now uses bcrypt(hmac_sha256(salt, secret), salt, cost) by default. Will be in the 1.7.3 release, which should come in the next week unless notice anything after reviewing the code.

  5. Log in to comment