Problems with spaces and hyphens in file paths

Issue #155 resolved
Al Kalter
created an issue

When running in a Linux environment from the command line, with both target and source file paths inside quotation marks, JHOVE2 has major problems with spaces in the path or file name.

If there is a space in the target pathname, the results are written to a file whose name is truncated at the space (e.g., -o "/tmp/my results.xml" puts the results in /tmp/my).

A space by itself in the source file pathname is OK, but if that space is followed by a hyphen, JHOVE2 will not process the file: it interprets the hyphen as a new command-line argument, even though it is inside the quotes. So if the source file path is "c:\temp\MyFiles - Copy\stuff.txt" (the default name that Windows Explorer assigns after a copy-and-paste of a folder called MyFiles), JHOVE2 errors out with the following message: Unknown option - ' - '. It doesn't matter whether a space or another character follows the hyphen; as long as the sequence space-hyphen appears, even inside a fully quoted pathname string, JHOVE2 fails and no results file is created.

Comments (8)

  1. Richard Anderson

    At first it was assumed this behavior might be caused by a bug in the jargs.gnu.CmdLineParser library that is used to parse the command-line arguments. But my experimentation using the Eclipse debugger showed that the arguments passed to this library are already structured into an array of strings at the point where the Java application's main method is called. The jargs parser does not split the option values any further using whitespace delimiters.

    Further research showed that the splitting of command-line arguments at whitespace boundaries is due to incorrect use of the $@ variable in the shell scripts (such as jhove2.sh) used to invoke the JHOVE2 application.

    This can be demonstrated by using a couple of shell scripts where one script calls another script.

    A script named list-args is used to print out the array of arguments that the shell passes to it:

    #!/bin/sh
    # list-args
    for arg; do
      echo "$arg"   # quoted, so each argument is printed unchanged
    done
    
    ./list-args 1 2 "3 4"
    1
    2
    3 4
    

    Another script named wrapper-a is used to wrap the above script and forward the supplied arguments to it:

    #!/bin/sh
    # wrapper-a 
    ./list-args $@
    
    ./wrapper-a 1 2 "3 4"
    1
    2
    3
    4
    

    As you can see, unquoted $@ as used above 'flattens' the argument array: the shell re-splits every argument at whitespace, removing the effect of the quotation marks. Unquoted $* behaves the same way.
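
    The four ways of forwarding arguments can be compared side by side. The following is a minimal sketch, not taken from the JHOVE2 scripts: count_args and the via_* helpers are hypothetical names introduced only for illustration, and the default IFS is assumed.

    ```shell
    #!/bin/sh
    # count_args: print how many arguments were received.
    count_args() {
      echo "$#"
    }

    # Each helper forwards its own arguments using one expansion form.
    via_unquoted_at()   { count_args $@; }
    via_unquoted_star() { count_args $*; }
    via_quoted_star()   { count_args "$*"; }
    via_quoted_at()     { count_args "$@"; }

    via_unquoted_at   1 2 "3 4"   # prints 4: arguments re-split at whitespace
    via_unquoted_star 1 2 "3 4"   # prints 4: same as unquoted $@
    via_quoted_star   1 2 "3 4"   # prints 1: everything joined into one word
    via_quoted_at     1 2 "3 4"   # prints 3: original arguments preserved
    ```

    Only the last form hands the called program the same argument list that the caller received.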

    This behavior can be fixed for Unix/Linux shell scripts by using "$@" instead of $@, as illustrated in a script named wrapper-b:

    #!/bin/sh
    # wrapper-b
    ./list-args "$@"
    
    ./wrapper-b 1 2 "3 4"
    1
    2
    3 4
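
    The same comparison can be run against the kinds of paths given in the original report. This is a sketch under assumptions: launch_unquoted and launch_quoted are hypothetical stand-ins for a launch script, not the actual contents of jhove2.sh, and print_args simply shows what the invoked program would receive.

    ```shell
    #!/bin/sh
    # print_args: print each received argument on its own line, in brackets.
    print_args() {
      for arg in "$@"; do
        printf '[%s]\n' "$arg"
      done
    }

    # Hypothetical launchers that differ only in how they forward arguments.
    launch_unquoted() { print_args $@; }
    launch_quoted()   { print_args "$@"; }

    launch_unquoted -o "/tmp/my results.xml" "/tmp/MyFiles - Copy/stuff.txt"
    launch_quoted   -o "/tmp/my results.xml" "/tmp/MyFiles - Copy/stuff.txt"
    ```

    The first call prints six bracketed words, including [/tmp/my] right after [-o] and a lone [-], which matches both symptoms in the report. The second call prints the three arguments unchanged.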
    

    Here are some links to references that elaborate on the correct use of "$@" in scripts:

    Note that some of the above articles recommend a more arcane syntax, ${1:+"$@"} (with or without the colon). That form works around older shells that expanded "$@" to a single empty argument when no arguments were given; most current operating system variants no longer need it.
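
    Whether a given shell still needs the guarded form can be checked with a short sketch like the one below. The helper names are hypothetical; the expected counts in the comments are what a current POSIX-conforming sh produces.

    ```shell
    #!/bin/sh
    # count_args: print how many arguments were received.
    count_args() {
      echo "$#"
    }

    # Plain form versus the guarded form (shown here without the colon).
    forward_plain()   { count_args "$@"; }
    forward_guarded() { count_args ${1+"$@"}; }

    forward_plain                # prints 0: no empty argument is invented
    forward_guarded              # prints 0
    forward_plain   1 2 "3 4"    # prints 3
    forward_guarded 1 2 "3 4"    # prints 3
    ```

    If the first call ever prints 1, the shell in use has the old behavior and the guarded form is the safer choice there.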

  2. Richard Anderson

    There does not seem to be a need for changes to the Windows command script files, because %* forwards the arguments with their quotation marks intact:

    >type list-args.cmd
    @echo off
    echo %1
    echo %2
    echo %3
     
    >list-args 1 2 "3 4"
    1
    2
    "3 4"
     
    >type wrapper.cmd
    @echo off
    list-args %*
    
    >wrapper 1 2 "3 4"
    1
    2
    "3 4"
    