Accents in HTML output are not rendered
[I would post this to a mailing list, if there was one, instead of to the bug tracker.]
When rendering HTML, is there a way to have LaTeX accents, etc. be turned into the proper character? I'm talking about mapping \'{e} to é, and so on.
The attached script has an example of what I'm referring to. Currently the output is
{G}abriel {G}arc'{\i}a {M}arquez. <em>Cien a\~{n}os de soledad</em>. Marx bros., 2000.
But this should be something along the lines of
Gabriel García Marquez. <em>Cien años de soledad</em>. Marx bros., 2000.
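To make the requested mapping concrete, here is a minimal standard-library sketch of turning simple accent commands into Unicode. It is not how pybtex or latexcodec do it: the `COMBINING` table, `ACCENT_RE` and `decode_accents` are hypothetical names invented for this illustration, and only a handful of accents are covered.

```python
import re
import unicodedata

# Hypothetical table for this sketch: LaTeX accent command -> combining character.
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis
}

# Matches \'e, \'{e} and \'{\i} for the accents listed above.
ACCENT_RE = re.compile(r"""\\(['`^~"])(?:\{(\\i|[A-Za-z])\}|([A-Za-z]))""")


def decode_accents(text):
    """Replace simple LaTeX accent commands with precomposed Unicode characters."""
    def replace(match):
        accent = match.group(1)
        base = match.group(2) or match.group(3)
        if base == r"\i":  # the dotless i takes the accent like a plain i
            base = "i"
        # Compose base letter + combining accent into a single code point.
        return unicodedata.normalize("NFC", base + COMBINING[accent])

    decoded = ACCENT_RE.sub(replace, text)
    # Drop the case-protecting braces BibTeX uses, e.g. {G}abriel -> Gabriel.
    return re.sub(r"\{([^{}\\]*)\}", r"\1", decoded)


print(decode_accents(r"{G}abriel {G}arc\'{\i}a {M}arquez. Cien a\~{n}os de soledad"))
# Gabriel García Marquez. Cien años de soledad
```

A real decoder has to deal with many more commands, nested braces and math mode, which is why a dedicated package is the more realistic route.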
Comments (11)
-
If you agree with using the latexcodec package, I can make the PR.
-
Issue #79 was marked as a duplicate of this issue.
-
Yes, please go ahead with the PR. There have been plans to use latexcodec for a long time.
Some thoughts:
- latexcodec should not be used to decode the whole .bib file, because not the whole file is LaTeX. Only field values should be decoded.
- latexcodec should not be used with the BibTeX engine, only with pythonic styles. The BibTeX engine is LaTeX-only anyway, so it is better to output LaTeX markup as it is.
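The two points above can be sketched as follows. This is only an illustration under stated assumptions: `prepare_fields`, its `pythonic_style` flag and the stand-in `toy_decode` are hypothetical and not part of pybtex; a real decoder would come from latexcodec.

```python
def toy_decode(value):
    """Stand-in decoder for this sketch; it handles a single accent only."""
    return value.replace(r"\~{n}", "\u00f1")  # \~{n} -> ñ


def prepare_fields(fields, decode, pythonic_style=True):
    """Return a copy of `fields` with `decode` applied to the values only.

    Field names are never touched, and nothing is decoded when the
    LaTeX-only BibTeX engine is in use (pythonic_style=False).
    """
    if not pythonic_style:
        return dict(fields)  # pass the LaTeX markup through unchanged
    return {name: decode(value) for name, value in fields.items()}


fields = {"title": r"Cien a\~{n}os de soledad", "year": "2000"}
print(prepare_fields(fields, toy_decode))
print(prepare_fields(fields, toy_decode, pythonic_style=False))
```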
Given that, the best place to decode LaTeX is probably in pybtex.style.template.field (it is used by pythonic styles to access entry fields).
-
Maybe modifying format_str for each backend is better? Then the internals are always maintained in LaTeX codecs and only the output is "decoded" according to the backend. The reason is as follows. If the backend is latex, nothing will change (no decoding). In this case, the output is legal LaTeX, which is supposed to be decoded when translated to human readable forms by the TeX program. If the backend is not latex, the process is similar to the latex case: the latex output is produced first, then a translation program (latex to a human readable form), which is the format_str function, will translate it to Unicode characters. What do you think?
-
Even the LaTeX backend would benefit from decoding LaTeX to Unicode. For example, the name \'Evariste Galois should be abbreviated as \'E. Galois, but pythonic styles are currently unable to do that, because they are markup-agnostic by design and do not know how to process LaTeX commands, like \'. But the decoded name, Évariste Galois, can be abbreviated correctly, because É is just a normal Unicode character.
The idea is that pythonic styles work only with Unicode and rich text, without having to deal with the markup. The markup is converted to Unicode and rich text before being passed to the style, then the rich text returned by the style is converted to markup again. Letting markup into the styles would just complicate things too much.
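The Évariste Galois case is easy to reproduce with a markup-agnostic abbreviation function. The sketch below is purely illustrative: `abbreviate` is a hypothetical helper, not the code pybtex styles use.

```python
def abbreviate(name):
    """Markup-agnostic abbreviation: the first character plus a period."""
    return name[0] + "." if name else name


latex_name = r"\'Evariste"       # still LaTeX markup
unicode_name = "\u00c9variste"   # Évariste, already decoded

print(abbreviate(latex_name))    # \.
print(abbreviate(unicode_name))  # É.
```

On the LaTeX form the function keeps only the backslash, because it cannot know that \' and E belong together. On the decoded form the first character is already the complete letter.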
-
Besides the field function, we also need to alter the names function. However, changing persons names, by altering persons[*].{last,first,middle}_names in the names function does not have any effect. Maybe it's because person.text has already been determined before entering this function... Do you know how I can change the names? Thanks!
-
Yes, you are right about names. Altering person.*_names won't work because person.text is assigned by pybtex.style.formatting.BaseStyle.format_entries() before the style code is called. This is confusing and feels just wrong. I'll think about how to rewrite it in a cleaner way.
-
Any updates? :)
-
OK, here comes Plan B. I've finally merged the latex-braces branch, and we have the Text.from_latex() method now. It is used by the rest of the code to convert both fields and person names to rich text, so you can just plug in the latexcodec, and it should work.
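For reference, the latexcodec half of that plug-in might look roughly like the sketch below. It assumes the third-party latexcodec package and its "ulatex" (str to str) codec. The wrapper `latex_to_unicode` and its fallback are hypothetical, and how Text.from_latex() would call into it is not shown in this thread.

```python
import codecs

try:
    import latexcodec  # noqa: F401  (importing it registers the "ulatex" codec)
except ImportError:  # third-party package; it may not be installed
    latexcodec = None


def latex_to_unicode(text):
    """Decode LaTeX markup in `text` to Unicode if latexcodec is available."""
    if latexcodec is None:
        return text  # fallback for this sketch only: leave the input as it is
    return codecs.decode(text, "ulatex")


print(latex_to_unicode(r"Cien a\~{n}os de soledad"))
```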
-
- changed status to resolved
I found this solution. Maybe pybtex should incorporate it?