more invalid repository names and proposed detection RE

Issue #148 resolved
Marcel Huber
created an issue

While further investigating and testing issues #142 and #144, I discovered more names that lead to unusable repositories, like abc/

I decided to write a small Python script to build a regular expression that detects malformed paths, in the sense of slash and dot problems. The rule I ended up with is: "(^|/)(.(.(/|$)|/|$)|/|$)". It is kept in very basic RE syntax so that it could be used on the client side (JavaScript) too. See the attached script for the tests I made. By the way, I think your false test of .scm/plugin is not correct, as it represents a valid path.
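For readers without the attachment, here is a minimal sketch of how the rule could be exercised. It is not the attached script. It assumes the dots in the rule as displayed above are meant as literal dots (written `\.` here); the names `MALFORMED` and `is_malformed` are made up for illustration.

```python
import re

# Assumption: both dots in the quoted rule are literal dots, so they are escaped.
MALFORMED = re.compile(r"(^|/)(\.(\.(/|$)|/|$)|/|$)")


def is_malformed(name: str) -> bool:
    """Return True if some path segment is empty, '.' or '..'."""
    return MALFORMED.search(name) is not None


if __name__ == "__main__":
    for candidate in ["abc/", "/abc", "a//b", ".", "..", "a/./b", "a/../b",
                      "abc", "a/b", ".scm/plugin", "a/..b"]:
        verdict = "malformed" if is_malformed(candidate) else "ok"
        print(f"{candidate!r}: {verdict}")
```

Under that assumption, names with an empty, `.` or `..` segment are flagged, while `.scm/plugin` passes, because a segment that merely starts with a dot does not match.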

Comments (8)

  1. DRayX

    Seems like it would be simpler to just disallow \ / : * ? " < > | . % ~ # & { } + $
    Because the repository name is both a part of the file path, and part of the repository URL, characters that cause problems on any filesystem, or don't work well in URIs should probably be avoided. I also put $ on that list to avoid anything that expands to an environment variable such as $HOME.
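A character blacklist like the one proposed above could be checked roughly as follows. This is only a sketch: the set is copied from the list in this comment, and `DISALLOWED` and `find_disallowed` are hypothetical names, not part of scm-manager.

```python
# Characters taken from the list in this comment (an illustrative preset).
DISALLOWED = set('\\/:*?"<>|.%~#&{}+$')


def find_disallowed(name: str) -> list:
    """Return the sorted, de-duplicated disallowed characters found in name."""
    return sorted({ch for ch in name if ch in DISALLOWED})


if __name__ == "__main__":
    for candidate in ["my-repo_1", "$HOME", "foo\\bar"]:
        print(repr(candidate), find_disallowed(candidate))
```

Note that this list contains `/`, so applying it to a whole repository name would also reject nested names such as `group/repo`.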

  2. Marcel Huber reporter

    I think limiting to this set of characters is too rigid.

    My opinion is to not restrict things more than really necessary. If really wanted/needed, special characters could easily be encoded for URLs by scm-manager and presented to the user to copy/paste from the UI. Even so, I would tend to avoid such characters, as they might lead to trouble...

    I can see two different aspects of repository name checking:

    1. mandatory: it should be path compliant in the sense proposed by the regular expression above; no combination of / and . that leads to a reserved filesystem path should be allowed
    2. optional and user definable/customizable: disallowed characters for path-segments/repository-names

    I would let the user (admin) of the system decide what he wants to (dis-)allow. A reasonable preset would prevent abuse but still be flexible to customize.
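The two aspects could be combined along these lines. This is a sketch under assumptions: the mandatory rule is the RE from the initial post with its dots read as literal dots, and `DEFAULT_DISALLOWED` is a hypothetical preset that an admin could override. None of these names come from scm-manager.

```python
import re

# Mandatory part: the rule from the initial post, dots assumed to be literal.
MANDATORY = re.compile(r"(^|/)(\.(\.(/|$)|/|$)|/|$)")

# Optional part: a hypothetical preset the admin could replace.
DEFAULT_DISALLOWED = frozenset('\\:*?"<>|')


def validate(name, disallowed=DEFAULT_DISALLOWED):
    """Return a list of problems; an empty list means the name is accepted."""
    problems = []
    if MANDATORY.search(name):
        problems.append("reserved path")
    bad = sorted(set(name) & set(disallowed))
    if bad:
        problems.append("disallowed characters: " + " ".join(bad))
    return problems


if __name__ == "__main__":
    print(validate("group/repo"))               # accepted
    print(validate("abc/"))                     # fails the mandatory check
    print(validate("foo\\bar"))                 # fails the preset
    print(validate("foo\\bar", disallowed=""))  # admin allows everything
```

Keeping the mandatory check separate means an admin can empty the character set without re-enabling names such as `..` or `abc/`.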

    btw: I did not test the RE in the initial post for Windows-style paths, but it should not be a problem to extend. Here is the modified RE to also catch absolute Windows paths and backslashes (escaping is always a mess...):


    ... and attached the modified python file containing tests for point 1.

  3. DRayX

    Try, as an example, creating a Mercurial repository called "foo\bar". It properly creates the directory (at least on Linux) at foo\bar, but the URL doesn't work at all (even if you use percent encoding). I think any characters which could even potentially cause issues should be avoided. I suppose a blacklist of characters could be used, but if you remove certain characters from that path you could definitely run into problems.

  4. Marcel Huber reporter

    Good point.

    My tests using git showed that the client/server itself supports such paths, and that cloning works using either %5c or \\ as the escaped path:

    ## initialize repo to test with
    $> git init --bare --shared=group win\\doofs
    m1huber@dt-tt-114280:~/tmp$ ls -la
    total 24
    drwxr-xr-x  4 m1huber users  4096 May 18 08:34 .
    drwx------ 39 m1huber users 12288 May 18 08:29 ..
    drwxrwsr-x  7 m1huber users  4096 May 18 08:34 win\doofs
    ## clone using URL escaping
    $> git clone git://dt-tt-114280/win%5cdoofs
    ## or using double backslash to not confuse the shell
    $> git clone git://dt-tt-114280/win\\doofs
    ## or finally using a path to not confuse us
    $> git clone git://dt-tt-114280/win\\doofs win_doofs
    ## directories created
    m1huber@dt-tt-114280:/tmp$ ls -la
    total 8
    drwxrwxrwt 15 root    root   320 May 18 08:43 .
    drwxr-xr-x 23 root    root  4096 May 14 16:25 ..
    drwxr-xr-x  3 m1huber users   60 May 18 08:36 win%5cdoofs
    drwxr-xr-x  3 m1huber users   80 May 18 08:45 win_doofs
    drwxr-xr-x  3 m1huber users   80 May 18 08:41 win\doofs

    Creating such a repository using scm-manager is possible and does not screw up things on the filesystem side:

    scm-manager ~$ ls -la repositories/git/
    total 28
    drwxr-x--- 7 scm-manager scm-manager 4096 May 18 08:28 .
    drwxr-x--- 5 scm-manager scm-manager 4096 May 10 10:28 ..
    drwxr-x--- 3 scm-manager scm-manager 4096 May 11 13:35 admin
    drwxr-xr-x 3 scm-manager scm-manager 4096 May 16 16:31 arch
    drwxr-x--- 7 scm-manager scm-manager 4096 May 16 08:26 Craaap
    drwxr-xr-x 7 scm-manager scm-manager 4096 May 18 08:28 win\doofs
    ## these two fail
    m1huber@dt-tt-114280:/tmp$ GIT_SSL_NO_VERIFY=true git clone https://m1huber@dt-tt-114280/git/win%5cdoofs blubby
    Cloning into 'blubby'...
    fatal: https://m1huber@dt-tt-114280/git/win\doofs/info/refs not found: did you run git update-server-info on the server?
    m1huber@dt-tt-114280:/tmp$ GIT_SSL_NO_VERIFY=true git clone https://m1huber@dt-tt-114280/git/win\\doofs
    Cloning into 'blubby'...
    fatal: https://m1huber@dt-tt-114280/git/win%5cdoofs/info/refs not found: did you run git update-server-info on the server?

    ...but it cannot be checked out again, as shown?!?

    So the problematic component might be scm-manager itself, as it seems to handle these paths inconsistently. A possible solution could be to always translate backslash path separators into forward slashes, since Windows is able to handle these too, or, as @DRayX suggests, to disallow them.
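The translation idea could look roughly like this. It is only a sketch using Python's standard `urllib.parse`; `normalize` is a hypothetical helper and says nothing about how scm-manager actually resolves names.

```python
from urllib.parse import quote, unquote


def normalize(name: str) -> str:
    """Decode percent-escapes, then turn backslashes into forward slashes."""
    return unquote(name).replace("\\", "/")


if __name__ == "__main__":
    # Both spellings used in the clone attempts end up as the same name.
    print(normalize("win%5cdoofs"))
    print(normalize("win\\doofs"))
    # Percent-encoding of the raw name, as it could be shown in a UI.
    print(quote("win\\doofs", safe=""))
```

One consequence of this choice is that `win\doofs` becomes the nested path `win/doofs`, so it would collide with a repository that already has that name. Rejecting the backslash outright avoids that ambiguity.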
