HtpasswdFile does not like additional fields

Issue #89 new
Mike Boyle created an issue

Apache and nginx actively support 2 or more colon-separated fields in an htpasswd file. But such files break in HtpasswdFile because of this exception. I explained this, and showed evidence from the apache and nginx sorce code, in this bug report on trac: https://trac.edgewall.org/ticket/12744#comment:2. (Sorry, nicely formatted links aren't working with that URL for some reason.)

You mentioned elsewhere that there isn't an official spec for the htpasswd format, which makes it hard to know just what makes an acceptable file. But the source code in both projects makes clear that this is not an accident; additional colon-separated fields are allowed. You could allow this by changing

if len(result) != 2:

to

if len(result) < 2:

I have not looked at the code for digest, but corresponding changes may be needed there, as Ryan J Ollos suggested on trac.

Comments (3)

  1. Eli Collins repo owner

    I'm very open to making Htpasswd & Htdigest more flexible; and I'm curious what you're using the additional fields for (use-cases always help me flesh out corner situations).

    Adding support for more fields is actually going to be a bit more complex than just that one line, since the Htpasswd class has to both read & write those lines, so I'll need to make some further changes so that it preserves existing data, and generates new rows as usefully as possible. Similar for Htdigest. I'll start looking into best approach when I have a chance.


    I want to make sure I move in the right direction though, and what Apache intended doesn't seem as clear cut me. While I agree that httpd's parser will allow user:hash:ignored lines, I'm not sure if I agree it's intentional:

    • mod_authn's parser uses ap_getword() rather than ap_getword_nulls(). Since ap_getword() skips repeated separators, this means lines like user:::hash::: will be read exactly the same as user:hash. This is of course a completely crazy input line, but mod_authn_file will accept it as valid. The htdigest section has a similar issue, compounded by the additional field.

    • the htpasswd tool isn't just a producer of these files, but (via the "-v" flag) is also a consumer. It's frequently used in this mode as an external verifier tool, e.g. by applications such Squid. In turn, the htpasswd -v parser uses a completely different method for parsing the file, one which will ONLY accept the user:hash format. In fact, if you pass it user:hash:etc, it will use hash:etc as the hash string, meaning passwords will never validate against such a file.

    Combining those points together makes me think mod_authn's code was written to be good enough to parse htpasswd's output, but with little concern outside of that. This would mean any behavior outside of that is just a unspecified side-effect, and potentially subject to break if someone rewrites the code.


    Thus why I'd like to know a bit more about your use-case for this behavior. If this change does get made, I'll need to make sure it fits in properly so that passlib doesn't silently accept malformed input files as valid (for those who use it for authentication), nor let passlib output things which could cause mod_authn to mistakenly grant access due to a corrupted file (for those who use it to update htpasswd files).

    Though this does seem doable, I like to take care when stepping outside of the de jure specifications, especially when the de facto specification is itself poorly defined.

  2. Mike Boyle reporter

    My use case is dokuwiki, which uses extra fields to record the user's real name, email, and groups (for authorization in addition to authentication). You can see the file format here: https://www.dokuwiki.org/auth:plain#file_format. Files in that format work with dokuwiki (obviously), apache, and nginx. Now, I don't want the hassle of trying to keep two copies of my user information in sync, so I want to use that same file with other apps — in this case trac, which is apparently considering using passlib.

    Googling around briefly, I find that at least a couple people in one ancient apache dev thread used additional fields. And the perl module for htpasswd explicitly refers to the third element of a split-by-colon as "info" (though it would just drop the 4th and higher). Here's some code from the fetchInfo subroutine, for example:

        while (<FH>) {
            chop;
            my @tmp = split ( /:/, $_, 3 );
            if ( $tmp[0] eq $Id ) {
                $info = $tmp[2];
                last;
            }
        }
        return $info;
    

    This is just meant as evidence that uses for (and users of) additional fields exist in the wild.

    I don't see your point about repeated separators; that's arguably a bug in httpd, but it doesn't affect the intent. For my argument, the only thing that matters is that mod_authn_file.c finds the password with this code:

    file_password = ap_getword(r->pool, &rpw, ':');
    

    On entering this line, &rpw is a pointer to the beginning of the password (more precisely, the first non-colon character following the username). This is explicitly asking for everything from that point to the following colon or the end of line. I don't care how many following colons ap_getword might gobble up, this establishes that at least one colon is explicitly allowed (and following content is ignored). That doesn't look accidental or incidental to me, so I would call that intentional support.

    As for the htpasswd command, I would just say that its behavior is clearly different from httpd's behavior (not to mention nginx). And given a tie, I would suggest breaking in favor of succeeding when several other tools succeed, as opposed to failing when one other tool fails.

  3. Log in to comment