Issue #1 resolved
Joseph Turian
created an issue

The current code uses a base-16 encoding:

{{{
#!python
user.username = sha_constructor(user.email).hexdigest()[:30]
}}}

This has 16^30 = 2^(4 * 30) = 2^120 possible values, since each hex character carries 4 bits.

It is better to use a base-64 encoding, which packs more bits into each character and so yields fewer hash collisions:

{{{
#!python
user.username = string.replace(base64.b64encode(sha_constructor(email).digest(), altchars="_@"), "=", "")[:30]
}}}

A 30-character base-64 string has 64^30 = 2^(6 * 30) = 2^180 possible values (capped at 2^160 here, because SHA-1 produces only 160 bits), a far lower risk of hash collision.

Here, I replace base64 characters + and / with _ and @. I also strip the final "=".

This code will only work in Django >= 1.2, because: "Changed in Django 1.2: Usernames may now contain @, +, . and - characters."

i.e. username character @ is not permitted in Django < 1.2.
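The capacity arithmetic and both encodings can be checked with a small Python 3 sketch. It assumes `sha_constructor` is SHA-1 and substitutes `hashlib.sha1` for it; the function names `bits_of_capacity`, `hex_username` and `base64_username` are hypothetical and are not taken from the project.

```python
import base64
import hashlib
import math


def bits_of_capacity(alphabet_size, length):
    """Bits of information that `length` characters of the alphabet can carry."""
    return math.log2(alphabet_size) * length


def hex_username(email, length=30):
    # Hex encoding: 4 bits per character. hashlib.sha1 is an assumed
    # stand-in for sha_constructor.
    return hashlib.sha1(email.encode("utf-8")).hexdigest()[:length]


def base64_username(email, length=30):
    # altchars swaps "+" and "/" for "_" and "@"; the "=" padding is stripped.
    digest = hashlib.sha1(email.encode("utf-8")).digest()  # 20 bytes
    encoded = base64.b64encode(digest, altchars=b"_@").decode("ascii")
    return encoded.replace("=", "")[:length]


if __name__ == "__main__":
    print(bits_of_capacity(16, 30))  # capacity of 30 hex characters
    print(bits_of_capacity(64, 30))  # capacity of 30 base-64 characters
    print(hex_username("someone@example.com"))
    print(base64_username("someone@example.com"))
```

Note that a 20-byte SHA-1 digest encodes to only 27 base-64 characters once the padding is stripped, so the `[:30]` slice never truncates anything in the base-64 variant.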

Comments (1)

  1. Tino de Bruijn repo owner

    I just committed 333592b2ba83 with your advice, just in a slightly different implementation.

    I see it like this: hex encoding stores 1 byte in 2 chars, base64 stores 3 bytes in 4 chars. In 30 chars we can therefore store 15 and 22.5 bytes of information, respectively. As sha1 only gives us 20 bytes, sha256 is used instead. So the current implementation should store as many bytes as possible, giving the fewest collisions.

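The approach described in the comment (a longer digest, base-64 encoded, then cut to 30 characters) might look roughly like the Python 3 sketch below. This is not the code from commit 333592b2ba83, which is not shown in this thread; the function name `sha256_username` and the reuse of the `_@` alternative characters from the issue are assumptions.

```python
import base64
import hashlib


def sha256_username(email, length=30):
    """Derive a username of at most `length` characters from an email address."""
    # SHA-256 yields 32 bytes, more than the 22.5 bytes that fit in
    # 30 base-64 characters, so the slice below fills all 30 characters.
    digest = hashlib.sha256(email.encode("utf-8")).digest()
    # Assumed alphabet: "+" and "/" replaced by "_" and "@", as in the issue.
    encoded = base64.b64encode(digest, altchars=b"_@").decode("ascii")
    return encoded.replace("=", "")[:length]


if __name__ == "__main__":
    print(sha256_username("someone@example.com"))
```

With SHA-1 the encoded digest runs out after 27 characters, whereas here all 30 characters carry hash output, which is 180 bits in total.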