Issue #6 resolved

Truncation of metadata from a Word document

Wayne Glover
created an issue

Attached is a file that shows the truncation of metadata from a Word document.

My Python programmer and I looked through the hachoir code and could not see a reason for this truncation.

Thoughts are appreciated. Answers are adored :)

Comments (4)

  1. Robert Xiao

    Try the following code:

    for data in sorted(metadata):
        for item in data.values:
            print data.description, item.value

    The code which truncates the displayed text (item.text, instead of item.value) is this snippet in metadata_item.py:

    # Skip empty strings
    if isinstance(value, unicode):
        value = normalizeNewline(value)
        if config.MAX_STR_LENGTH \
        and config.MAX_STR_LENGTH < len(value):
            value = value[:config.MAX_STR_LENGTH] + "(...)"

    So if you want to use item.text but show more characters, you can increase MAX_STR_LENGTH in hachoir_metadata.config. But to get the whole string, use item.value as above.

  2. Wayne Glover reporter

    Hi nneonneo,

    Thanks for your response. For anyone else reading, I'd like to add that the changes nneonneo refers to are in the latest version (1.2.2) of the parser, which is available from Bitbucket. (I was using 1.2.1 and couldn't figure out why I couldn't find the config file. :) )

    After finding the config file, I changed the max amount. I have not been able to test it yet, but my programmer will soon. I understand how this will likely fix the problem for me.

    However, I have a question about the other part of your answer. As I understand it, the new code will give the entire value of the metadata element, regardless of size or the config setting, correct? Also, could you please clarify where the code you proposed above should go? The code I am referring to is listed below.

    for data in sorted(metadata):
        for item in data.values:
            print data.description, item.value

    Thanks for your help

  3. Victor Stinner repo owner

    Using config.MAX_STR_LENGTH=0 or config.MAX_STR_LENGTH=None, hachoir-metadata doesn't truncate item.text anymore.

    The config.MAX_STR_LENGTH option was introduced by hachoir-metadata.

    I think that this issue is not a bug, it's just a question of configuration in your program. You might use the mailing list to ask questions ;-)

  4. Anonymous

    Thanks Victor,

    With this information my programmer was able to get it working.

    Thanks for your time in helping me.
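
For later readers, here is a minimal stand-alone sketch of the behaviour discussed in comments 1 and 3. It does not import hachoir: the function name display_text and the limit of 10 are hypothetical, chosen only for illustration, and the newline normalization from the quoted snippet is left out. It is written for Python 3, whereas the snippets in the thread are Python 2. It shows why a limit of 0 or None leaves the text untouched (both are falsy, so the length check is skipped), and why the full string (item.value in the thread) is never affected.

```python
# Stand-alone sketch; mirrors the truncation check quoted from metadata_item.py.
MAX_STR_LENGTH = 10  # hypothetical limit, not hachoir's actual default


def display_text(value, max_len=MAX_STR_LENGTH):
    """Return the shortened display form of value (like item.text in the thread)."""
    # A falsy limit (0 or None) skips the check, so nothing is truncated.
    if max_len and max_len < len(value):
        return value[:max_len] + "(...)"
    return value


full_value = "A fairly long document title"  # stands in for item.value

print(display_text(full_value))        # truncated to 10 characters plus "(...)"
print(display_text(full_value, 0))     # unchanged
print(display_text(full_value, None))  # unchanged
```

In a real program the equivalent step would be to set MAX_STR_LENGTH in hachoir_metadata.config before reading item.text, as described in comment 3.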

