IOError on reading obsolete "previous msgid" entries

David Planella created an issue

I was trying to load a PO file from GNOME at (attached), and I got the following error:

dpm@el-far:~$ ipython
In [1]: import polib

In [2]: po = polib.pofile('/home/dpm/')
IOError                                   Traceback (most recent call last)

/home/dpm/<ipython console> in <module>()

/home/dpm/ in pofile(pofile, **kwargs)
    100         file (optional, default: ``False``).
    101     """
--> 102     return _pofile_or_mofile(pofile, 'pofile', **kwargs)
    104 # }}}

/home/dpm/ in _pofile_or_mofile(f, type, **kwargs)
     71         check_for_duplicates=kwargs.get('check_for_duplicates', False)
     72     )
---> 73     instance = parser.parse()
     74     instance.wrapwidth = kwargs.get('wrapwidth', 78)
     75     return instance

/home/dpm/ in parse(self)
   1261             else:
   1262                 raise IOError('Syntax error in po file %s (line %s)' % \
-> 1263                               (self.instance.fpath, i))
   1265         if self.current_entry:

IOError: Syntax error in po file /home/dpm/ (line 23110)

It seems polib is crashing on the following entry, in particular at the `#~| msgid ""` line:

#, fuzzy
#~| msgid ""
#~| "Error on %s\n"
#~| "%s"
#~ msgid ""
#~ "Error on %s: %s\n"
#~ "%s"
#~ msgstr ""
#~ "S'ha produït un error en %s:\n"
#~ "%s"
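
As a rough illustration of how the prefixes in the entry above differ, here is a minimal sketch of a line classifier. It is not polib's parser code; the function name `classify_po_line` and the labels are made up for this example.

```python
def classify_po_line(line):
    """Return a rough label for one PO-file line, based only on its prefix.

    Hypothetical helper for illustration; the labels are not polib names.
    """
    stripped = line.strip()
    if not stripped:
        return "blank"
    # Order matters: the longest prefixes are tested first, otherwise
    # '#~|' would be mistaken for a plain '#~' (obsolete) line.
    prefixes = [
        ("#~|", "obsolete-previous"),
        ("#~", "obsolete"),
        ("#|", "previous"),
        ("#,", "flags"),
        ("#", "comment"),
    ]
    for prefix, label in prefixes:
        if stripped.startswith(prefix):
            return label
    return "active"


entry_lines = [
    '#, fuzzy',
    '#~| msgid ""',
    r'#~| "Error on %s\n"',
    '#~ msgid ""',
    '#~ msgstr ""',
]
labels = [classify_po_line(line) for line in entry_lines]
```

The only point of the sketch is that `#~|` has to be checked before `#~`; a parser that only knows `#~` and `#|` separately has no rule for the combined prefix.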

Looking at other files on the GNOME l10n site, I can see more instances of #~|. These seem to be generated automatically by a gettext tool (probably msgmerge) when marking "previous msgid" fuzzy entries as obsolete.

The documentation I looked at says nothing about the format of obsolete entries, so I understand that the docs leave a wee bit too much room for guessing in the implementation of a parser.

In any case, if it's generated by a gettext tool, it would be good if polib either ignored `#~|` lines or treated them as obsolete entries instead of raising an exception.
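
Until the parser handles this, the "ignore" option could be approximated outside polib by filtering the file contents first. This is only a sketch of that workaround, not a fix in polib itself; `drop_obsolete_previous` is a hypothetical helper, and the sample text is taken from the examples in this report.

```python
def drop_obsolete_previous(po_text):
    """Return po_text without the '#~|' (obsolete previous msgid) lines.

    Hypothetical pre-processing helper; it only filters text and does
    not call polib.
    """
    kept = [
        line
        for line in po_text.splitlines(keepends=True)
        if not line.lstrip().startswith("#~|")
    ]
    return "".join(kept)


sample = (
    '#~| msgid "Attendees"\n'
    '#~ msgid "Attendee_s"\n'
    '#~ msgstr "Participante_s"\n'
)
cleaned = drop_obsolete_previous(sample)
```

The cleaned text could then be written to a temporary file and that path passed to `polib.pofile()`, as in the session above. I have not checked how the remaining obsolete entries are parsed after the filtering.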


Comments (7)

  1. David Planella

    On the other hand, it seems that msgmerge, or whatever generates the #~| entries, tends to create mismatched msgids without msgstr. I've observed this in the Catalan (above), Spanish and Simplified Chinese versions of that same PO file:

    Spanish:

    #~| msgid "Attendees"
    #~ msgid "Attendee_s"
    #~ msgstr "Participante_s"

    Simplified Chinese:

    #~| msgid "An error occurred while printing"
    #~ msgid "An error occurred while sending."
    #~ msgstr "发送时出现了一个错误。"

    So I'm wondering whether, to simplify things, obsolete previous msgid entries (i.e. those starting with #~|) should be ignored altogether.

  2. David Jean Louis
    • changed status to open

    Hi David!

    Sorry for the delay...

    I'm not sure either what the best solution is. Ignoring seems the easiest thing to do. Would you mind working on a patch for this? If not, no problem, but it may take some time until I work on it.


    -- David
