Accents in HTML output are not rendered

Issue #58 resolved
Former user created an issue

[I would post this to a mailing list, if there was one, instead of to the bug tracker.]

When rendering HTML, is there a way to have LaTeX accents, etc. be turned into the proper character? I'm talking about mapping \'{e} to é, and so on.

The attached script has an example of what I'm referring to. Currently the output is

{G}abriel {G}arc\'{\i}a {M}arquez. <em>Cien a\~{n}os de soledad</em>. Marx bros., 2000.

But this should be something along the lines of

Gabriel García Marquez. <em>Cien años de soledad</em>. Marx bros., 2000.
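
To make the requested mapping concrete, here is a minimal standard-library sketch of such a decoder. It is only an illustration, not pybtex's or latexcodec's implementation: the names ACCENTS, ACCENT_RE and decode_accents are hypothetical, only five accent commands are covered, and every remaining brace is simply dropped.

```python
import re
import unicodedata

# Combining characters for a handful of LaTeX accent commands (not exhaustive).
ACCENTS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis
}

_ACCENT_CHARS = "".join(re.escape(char) for char in ACCENTS)

# Matches forms such as \'{e}, \'e, \'{\i} and \'\i.
ACCENT_RE = re.compile(
    r"\\([" + _ACCENT_CHARS + r"])\s*(?:\{(\\?[A-Za-z])\}|(\\?[A-Za-z]))"
)


def _replace(match):
    accent = match.group(1)
    base = match.group(2) or match.group(3)
    if base == "\\i":
        # The dotless i takes the accent like a plain i.
        base = "i"
    elif base.startswith("\\"):
        # Unknown command: leave the original text untouched.
        return match.group(0)
    # Compose base letter and combining accent into a single character.
    return unicodedata.normalize("NFC", base + ACCENTS[accent])


def decode_accents(text):
    """Turn simple LaTeX accents into Unicode and drop remaining braces."""
    decoded = ACCENT_RE.sub(_replace, text)
    return decoded.replace("{", "").replace("}", "")


raw = r"{G}abriel {G}arc\'{\i}a {M}arquez. <em>Cien a\~{n}os de soledad</em>. Marx bros., 2000."
print(decode_accents(raw))
```

A real decoder has to handle far more commands and nested groups, which is why a dedicated library is the more realistic option.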

Comments (11)

  1. Andrey Golovizin

    Yes, please go ahead with the PR. There have been plans to use latexcodec for a long time.

    Some thoughts:

    • latexcodec should not be used to decode the whole .bib file, because not the whole file is LaTeX. Only field values should be decoded.
    • latexcodec should not be used with the BibTeX engine, only with pythonic styles. The BibTeX engine is LaTeX-only anyway, so it is better to output LaTeX markup as it is.

    Given that, the best place to decode LaTeX is probably in pybtex.style.template.field (it is used by pythonic styles to access entry fields).
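
The two constraints above (decode field values only, and only for pythonic styles) can be sketched as follows. This is a hedged illustration with plain dictionaries, not pybtex code: get_field, toy_decode, the engine argument and the entry key garcia2000 are all hypothetical, and toy_decode stands in for a real decoder such as latexcodec.

```python
def toy_decode(value):
    """Stand-in for a real LaTeX decoder (assumption: three accents only)."""
    replacements = (
        (r"\'{e}", "\u00e9"),   # e with acute
        (r"\~{n}", "\u00f1"),   # n with tilde
        (r"\'{\i}", "\u00ed"),  # i with acute
    )
    for latex, char in replacements:
        value = value.replace(latex, char)
    return value


def get_field(entry, name, engine="pythonic", decode=toy_decode):
    """Return one field value, decoded only when a pythonic style asks for it."""
    raw_value = entry["fields"][name]
    if engine == "bibtex":
        # The BibTeX engine emits LaTeX anyway, so the markup passes through.
        return raw_value
    return decode(raw_value)


# The entry key and type are never decoded; only field values are.
entry = {
    "key": "garcia2000",
    "type": "book",
    "fields": {
        "title": r"Cien a\~{n}os de soledad",
        "publisher": "Marx bros.",
    },
}

print(get_field(entry, "title"))
print(get_field(entry, "title", engine="bibtex"))
```

Decoding at the point of field access keeps the parsed data untouched, so the same entry can still feed the BibTeX engine unchanged.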

  2. Hong

    Maybe modifying format_str for each backend would be better? Then the internals always stay LaTeX-encoded, and only the output is "decoded" according to the backend. The reasoning is as follows. If the backend is latex, nothing changes (no decoding): the output is legal LaTeX, which the TeX program decodes when it produces a human-readable form. If the backend is not latex, the process is similar: the LaTeX output is produced first, and then format_str acts as the translation program that turns it into Unicode characters. What do you think?

  3. Andrey Golovizin

    Even the LaTeX backend would benefit from decoding LaTeX to Unicode. For example, the name \'Evariste Galois should be abbreviated as \'E. Galois, but pythonic styles are currently unable to do that, because they are markup-agnostic by design and do not know how to process LaTeX commands, like \'. But the decoded name, Évariste Galois, can be abbreviated correctly, because É is just a normal Unicode character.
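
The abbreviation problem can be reproduced with a toy, markup-agnostic function. This sketch is not how pybtex abbreviates names; abbreviate is a hypothetical helper that takes the first character and appends a period.

```python
def abbreviate(name):
    """Markup-agnostic abbreviation: first character plus a period."""
    return name[0] + "."


latex_name = r"\'Evariste"       # raw LaTeX markup
unicode_name = "\u00c9variste"   # the same name decoded to Unicode

# On raw LaTeX the first "character" is the backslash of the accent command,
# so the result is broken markup rather than an abbreviated name.
print(abbreviate(latex_name))

# On the decoded name the accented capital is a single character.
print(abbreviate(unicode_name))
```

Producing \'E. from the raw form would require the style to parse LaTeX commands, which is exactly what decoding up front avoids.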

    The idea is that pythonic styles work only with Unicode and rich text, without having to deal with the markup. The markup is converted to Unicode and rich text before being passed to the style, then the rich text returned by the style is converted to markup again. Letting markup into the styles would just complicate things too much.

  4. Hong

    Besides the field function, we also need to alter the names function. However, changing persons' names by altering persons[*].{last,first,middle}_names in the names function does not have any effect. Maybe it's because person.text has already been determined before entering this function... Do you know how I can change the names? Thanks!

  5. Andrey Golovizin

    Yes, you are right about names. Altering person.*_names won't work because person.text is assigned by pybtex.style.formatting.BaseStyle.format_entries() before the style code is called. This is confusing and feels just wrong. I'll think about how to rewrite it in a cleaner way.

  6. Andrey Golovizin

    OK, here comes Plan B. I've finally merged the latex-braces branch, and we have the Text.from_latex() method now. It is used by the rest of the code to convert both fields and person names to rich text, so you can just plug in the latexcodec, and it should work.
