Improve names in password hash interface

Issue #21 resolved
Eli Collins
repo owner created an issue

(Imported from Google Code)

Certain parts of the password hash interface could have been named better.

encrypt()

The name of the .encrypt() method has the implication that there might be a .decrypt() that recovers the password. This is of course not the case.

One alternative name, .encode(), suffers a similar problem. Another alternative, .digest(), might be better... that name matches the similarly-purposed method used by the python hash interface (though the call syntax is different... would that cause confusion?)

In any case, this rename could be done, though it would have to have a very slow deprecation cycle.

rounds

The name of standard parameter rounds reflects the unit of measurement, not the purpose: to adjust the time-cost of digesting a password. As well, it not technically correct for hashes which use a log2 cost parameter (eg bcrypt), as the actual number of iterations = 2**rounds.

Renaming it (and related informational attributes) to cost would perhaps clarify things... though before doing so the nature of scrypt's multiple parameters should be studied to make sure a longer name such as time_cost isn't required.

In any case, renaming this setting would be extremely disruptive and burdensome, particularly if the deprecation cycle was long (which it would need to be). So this will probably not be done any time soon (if ever).

Comments (12)

  1. Eli Collins reporter

    (Imported from Google Code)

    This is somewhat of a long-term ticket. It's main purpose is to let people know that I'm aware the naming could be improved... though I should probably spell out why nothing has happened yet.

    ---

    What's holding this issue back isn't so much the coding, as it is remaining doubts I have about the usefulness of the change, when compared to the maintenance and compatibility costs it would incur.

    I'm not normally uptight about API changes - I do it with barely a thought in some of my company's code, and I've been evolving Passlib slowly as well. But changing something this central in Passlib's API means pretty much all of it's userbase has to update their application code, and (for some of those applications) maintain backwards-compatibility with older versions of Passlib.

    To ease the transition, I'd want Passlib to support both old and new APIs for at least min(1.5 years, 2 minor releases). During that time, there would be the added complexity of duplicated code paths and unit tests, new code paths and tests devoted to handling mixed use of old and new APIs, and the potential for unforeseen new bugs.

    Compared to all of that, I'm not entirely sure making a few changes for semantic clarity is necessarily worth it. That said, I'm slowly improving bits of Passlib's API, leading up to a 2.0 release where I plan to strip out any deprecated APIs that are still lying around. If this API change gets made at all, it'll be in the 1.8 and 1.9 releases leading up to that.

  2. Scott Arciszewski

    What's holding this issue back isn't so much the coding, as it is remaining doubts I have about the usefulness of the change, when compared to the maintenance and compatibility costs it would incur.

    The public's confusion over "password encryption" is endemic and frustrating. The current choice of verb muddies the water and prevents them from understanding the difference between hashing and encryption.

    https://www.google.com/search?q=site%3Astackoverflow.com+decrypt+md5+password

  3. Eli Collins reporter

    You're right, the issue is distressingly endemic. Moreso than I thought when I created this issue. The naming CryptContext.encrypt() shows that I wasn't even thinking fully properly about it back when I wrote the API :)

    I've been contemplating some other breaking changes leading up to Passlib 2.0, cleaning up some cruft, and it's probably time this one got on the roadmap.

    After a little thought, I think 'm going to get the new names introduced in Passlib 1.8 (at the latest, I might have time to work it in as part of 1.7). That'll give 1.8 & 1.9 for the legacy names to remain deprecated, before removal in 2.0. That should give 2-3 years for folks to safely migrate to the new API

    I'm currently planing to go with renaming .encrypt() -> .digest(), unless a better name comes along.

    The other part of this issue (renaming rounds) may get punted until after I add scrypt & argon2 support, and have a better set of use-cases to consider.

  4. Scott Arciszewski

    I went with pwhash() in PR#3 for a couple of reasons:

    • Password hash, as a compound noun, is kind of a special case.
    • Libsodium already uses crypto_pwhash(), which is an acceptable abbreviation.

    Outside my usual security consulting, I'm trying to advocate for better diffusion of cryptology terminology in the open source community. https://paragonie.com/blog/2015/08/you-wouldnt-base64-a-password-cryptography-decoded lays out the definitions and how different concepts are related.

  5. Eli Collins reporter

    You're right, .pwhash() is an overall better name. I tried converting a bit of the passlib codebase to use .digest(), and it was just to generic (and confusing when mixed with hashlib code). Also, I think the bcrypt cffi library already uses this name as well. Moving this to definitely in the 1.7 release.

  6. Eli Collins reporter

    This is fixed as of rev 1f7421b35b75; the 1.7 release will uses .hash(), with .encrypt() as a deprecated alias.

    Targeting june/july 2016 for the release.

    (Punting on the second part of this issue, renaming rounds -- it's not nearly as confusing an issue, though may consider adding as a separate issue for the 1.8 release).

  7. Log in to comment