Issue #28 resolved

IOError on reading obsolete "previous msgid" entries

David Planella
created an issue

I was trying to load a PO file from GNOME at (attached), and I got the following error:



dpm@el-far:~$ ipython
In [1]: import polib

In [2]: po = polib.pofile('/home/dpm/')

IOError Traceback (most recent call last)

/home/dpm/<ipython console> in <module>()

/home/dpm/ in pofile(pofile, kwargs)
    100     file (optional, default: False).
    101     """
--> 102     return _pofile_or_mofile(pofile, 'pofile', kwargs)
    103
    104 # }}}

/home/dpm/ in _pofile_or_mofile(f, type, **kwargs)
     71         check_for_duplicates=kwargs.get('check_for_duplicates', False)
     72     )
---> 73     instance = parser.parse()
     74     instance.wrapwidth = kwargs.get('wrapwidth', 78)
     75     return instance

/home/dpm/ in parse(self)
   1261             else:
   1262                 raise IOError('Syntax error in po file %s (line %s)' % \
-> 1263                               (self.instance.fpath, i))
   1264
   1265         if self.current_entry:

IOError: Syntax error in po file /home/dpm/ (line 23110)


It seems polib is crashing on the following entry, in particular at the #~| msgid "" line:


#, fuzzy
#~| msgid ""
#~| "Error on %s\n"
#~| "%s"
#~ msgid ""
#~ "Error on %s: %s\n"
#~ "%s"
#~ msgstr ""
#~ "S'ha produït un error en %s:\n"
#~ "%s"


Looking at other files on the GNOME l10n site, I can see more instances of #~|. These seem to be generated automatically by a gettext tool (probably msgmerge) when marking "previous msgid" fuzzy entries as obsolete.

The PO file format documentation says nothing about the format of obsolete entries, so I understand that the docs leave a wee bit too much room for guessing when implementing a parser.

In any case, if these lines are generated by a gettext tool, it would be good if polib either ignored #~| lines or treated them as obsolete entries instead of raising an exception.
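
To illustrate the "treat as obsolete" option, here is a minimal sketch of how a parser could tell the relevant line prefixes apart. The function name classify_po_line and the category labels are hypothetical and are not part of polib; the only point is that the #~| prefix has to be checked before the shorter #~ prefix.

```python
def classify_po_line(line):
    """Return a coarse category for one line of a PO file.

    Hypothetical helper for illustration only; not polib's parser.
    """
    stripped = line.strip()
    # Check the longest prefix first, otherwise "#~|" would match "#~".
    if stripped.startswith("#~|"):
        return "obsolete-previous"
    if stripped.startswith("#~"):
        return "obsolete"
    if stripped.startswith("#|"):
        return "previous"
    if stripped.startswith("#"):
        return "comment"
    return "other"
```

With categories like these, a parser could route "obsolete-previous" lines to the obsolete entry being built (or skip them) rather than falling through to the syntax error branch.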


Comments (7)

  1. David Planella reporter

    On the other hand, it seems that msgmerge (or whatever generates the #~| entries) tends to create mismatched msgids without a msgstr. I've observed this in the Catalan (above), Spanish and Simplified Chinese versions of that same PO file:

    Spanish:

    #~| msgid "Attendees"
    #~ msgid "Attendee_s"
    #~ msgstr "Participante_s"

    Simplified Chinese:

    #~| msgid "An error occurred while printing"
    #~ msgid "An error occurred while sending."
    #~ msgstr "发送时出现了一个错误。"

    So I'm wondering whether, to simplify things, obsolete previous msgid entries (i.e. those starting with #~|) should be ignored altogether.

  2. David Jean Louis repo owner
    • changed status to open

    Hi David!

    Sorry for the delay...

    I'm not sure either what the best solution is. Ignoring them seems the easiest thing to do. Would you mind working on a patch for this? If not, no problem, but it may take some time until I work on it.


    -- David
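
As a rough sketch of the "ignore altogether" option discussed in this thread, the #~| lines could be filtered out of the file contents before they reach the parser. The helper name drop_obsolete_previous is made up for this example, and it works purely on text, so it is a possible stopgap rather than a fix inside polib.

```python
def drop_obsolete_previous(po_text):
    """Return po_text without obsolete "previous msgid" lines (#~|).

    Hypothetical preprocessing helper; it does not touch polib itself.
    """
    kept = [
        line
        for line in po_text.splitlines(keepends=True)
        if not line.lstrip().startswith("#~|")
    ]
    return "".join(kept)


sample = (
    '#~| msgid "Attendees"\n'
    '#~ msgid "Attendee_s"\n'
    '#~ msgstr "Participante_s"\n'
)
cleaned = drop_obsolete_previous(sample)
```

The cleaned text keeps the ordinary obsolete entry (the #~ lines) intact, and could be written to a temporary file and loaded from there.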
