Unable to get the HTML file after conversion

Issue #2 open
Harminder singh created an issue

Hi, I am not able to get the HTML file after converting the MediaWiki XML file. Only index.html is generated.

Please find attached the source file and the index.html file.

Comments (10)

  1. Pascal Lehner repo owner

    Hi, can you provide the source file as well? I need to analyse what happened because your target file GBIF is completely empty.

  2. Pascal Lehner repo owner

    Hi, I just ran a conversion job myself. The result isn't very nice, but at least I get output. Can you tell me where this file comes from? Which MediaWiki version is this?

  3. Pascal Lehner repo owner

    Hi, can I get some more information on your source file? I just tested my tool with the current MediaWiki version 1.22.6 and this works fine as well.

  4. Harminder singh reporter

    I took the sample text from http://biowikifarm.net/meta/Mediawiki_XML_page_importing

    and changed the value in the <text></text> tag:

    <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.4/ http://www.mediawiki.org/xml/export-0.4.xsd" version="0.4"
      xml:lang="en">
      <siteinfo>
       <!-- … the XML header from an arbitrary wiki page export -->
      </siteinfo>
    <page>
      <title>GBIF:cultivar</title>
      <revision>
        <contributor><username>User name</username><id>123</id></contributor>
        <text xml:space="preserve">{|
    |Orange
    |Apple
    |-
    |Bread
    |Pie
    |-
    |Butter
    |Ice cream 
    |}</text>
      </revision>
    </page>
    </mediawiki>
    
  5. Pascal Lehner repo owner

    Ah well, I think I see the issue: the converter basically just looks at everything within the <page> element and renders this information. Due to a special syntax my example database uses, anything to the right of a pipe symbol is ignored. Thus you get empty output in the file that is created. My wiki uses * to mark <li> items. Does this work?
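
To see what text the converter is handed for each page, the export can be inspected directly. The following is a minimal Python sketch, not the converter's own code: it assumes the export-0.4 namespace from the sample in this thread, and `list_pages` and `table_markup_lines` are hypothetical helper names. It lists each page title and counts the lines that begin with a pipe or `{|`, the kind of content described above as being ignored.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sample export in this thread (export-0.4);
# exports from other MediaWiki versions use a different version suffix.
NS = {"mw": "http://www.mediawiki.org/xml/export-0.4/"}

# Shortened copy of the sample from comment 4 (assumption: siteinfo omitted).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/" version="0.4" xml:lang="en">
  <page>
    <title>GBIF:cultivar</title>
    <revision>
      <text xml:space="preserve">{|
|Orange
|Apple
|-
|Bread
|Pie
|}</text>
    </revision>
  </page>
</mediawiki>"""


def list_pages(xml_source):
    """Return a list of (title, text) pairs, one per <page> element."""
    root = ET.fromstring(xml_source)
    pages = []
    for page in root.findall("mw:page", NS):
        title = page.findtext("mw:title", default="", namespaces=NS)
        text = page.findtext("mw:revision/mw:text", default="", namespaces=NS)
        pages.append((title, text))
    return pages


def table_markup_lines(text):
    """Return the lines that start with '|' or '{|' (wiki table syntax)."""
    return [
        line for line in text.splitlines()
        if line.lstrip().startswith(("|", "{|"))
    ]


for title, text in list_pages(SAMPLE):
    flagged = table_markup_lines(text)
    total = len(text.splitlines())
    print(f"{title}: {len(flagged)} of {total} lines start with table markup")
```

If every line of a page is flagged, an empty output file for that page would be consistent with the behaviour described in this comment.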

  6. Harminder singh reporter

    I just tried the same input, replacing | with *. I am still not getting any output.

    <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.4/ http://www.mediawiki.org/xml/export-0.4.xsd" version="0.4"
      xml:lang="en">
      <siteinfo>
       <!-- … the XML header from an arbitrary wiki page export -->
      </siteinfo>
    <page>
      <title>GBIF:cultivar</title>
      <revision>
        <contributor><username>User name</username><id>123</id></contributor>
        <text xml:space="preserve">
    *collection = GBIF
    *short URI = cultivar
    *full URI = http://vocabularies.gbif.org/services/gbif/taxon_rank/cultivar
    *label = cultivar
    *code = cultivar
    *see also = http://rs.gbif.org/vocabulary/gbif/rank.xml</text>
      </revision>
    </page>
    </mediawiki>
    

    Am I doing anything wrong?

  7. Pascal Lehner repo owner

    Hi, I am afraid the tool is quite picky because it has to handle quite a lot of special markup for the case I built it for. Could you try adding a whitespace after the *?

    <text xml:space="preserve">
    * collection = GBIF
    * short URI = cultivar
    * full URI = http://vocabularies.gbif.org/services/gbif/taxon_rank/cultivar
    * label = cultivar
    * code = cultivar
    * see also = http://rs.gbif.org/vocabulary/gbif/rank.xml</text>
    

    Over the last few days I wrote a version two that enhances the capability for my specific case. I will merge these changes into this branch ASAP as well.
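
For larger exports, the whitespace workaround could be applied in bulk before running the converter. This is a minimal Python sketch under the assumption that the only change needed is a single space between the leading `*` markers and the item text, as in the sample in this comment; `space_after_bullets` is a hypothetical helper name and is not part of the converter.

```python
import re

# One or more '*' at the start of a line, directly followed by a character
# that is neither whitespace nor another '*'.
_BULLET = re.compile(r"^(\*+)(?=[^\s*])", re.MULTILINE)


def space_after_bullets(wikitext):
    """Insert one space after leading '*' list markers that lack it."""
    return _BULLET.sub(r"\1 ", wikitext)


before = "*collection = GBIF\n*label = cultivar\n* code = cultivar"
print(space_after_bullets(before))
```

Lines that already have a space after the marker are left unchanged, so the function can be run repeatedly on the same text.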
