AMI / Home


AMI is a software tool to extract and analyze the world's scientific facts in scholarly publications, theses and reports.

AMI Tutorial Oxford Launch

AMI will be used to power the Content Mine: see this video. Everyone is welcome, either to help develop and document the software or to extract and analyse content.

AMI consists of about eight projects, with production and development versions (*-dev). The first (XHTML2STM) is where users should start.

  • XHTML2STM. Top-level package for converting PDF or HTML to Science Technical Medical (STM) content. It uses domain-specific visitors (e.g. chemistry, phylogenetics, metabolism).
  • SVG2XML. Converts SVG (usually from PDF2SVG) to XHTML and SVG/PNG.
  • SVG. SVG library (based on XOM)
  • SVGBuilder. Tools for building higher-level graphics primitives (squares, circles, arrows, etc.)
  • PDF2SVG. Parser/converter from PDF to SVG, including normalization to Unicode where possible.

The actual order of execution is, however, normally:

  • PDF2SVG. Uses the SVG library.
  • SVGBuilder, which creates higher-level primitives as far as possible, and SVG2XML, which creates text.
  • XHTML2STM. Uses discipline-specific plugins to create science (should be renamed "AMI").
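
The execution order described above can be pictured as a chain of stages, each consuming the format produced by the one before it. The sketch below is purely illustrative and is not AMI code: the `Document` class, `make_stage`, `run_pipeline`, and the format labels are all hypothetical names invented here, and the stages only record that they ran rather than performing any conversion.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical container: the current format and the stages applied so far."""
    fmt: str
    history: List[str] = field(default_factory=list)


def make_stage(name: str, consumes: str, produces: str) -> Callable[[Document], Document]:
    """Build a placeholder stage that checks its input format and records its name."""
    def stage(doc: Document) -> Document:
        if doc.fmt != consumes:
            raise ValueError(f"{name} expects {consumes}, got {doc.fmt}")
        return Document(fmt=produces, history=doc.history + [name])
    return stage


# Stage order as described in the text; format labels are assumptions for illustration.
PIPELINE = [
    make_stage("PDF2SVG", "PDF", "SVG"),       # PDF to SVG
    make_stage("SVGBuilder", "SVG", "SVG"),    # builds higher-level primitives
    make_stage("SVG2XML", "SVG", "XHTML"),     # creates text
    make_stage("XHTML2STM", "XHTML", "STM"),   # discipline-specific plugins
]


def run_pipeline(doc: Document) -> Document:
    """Apply every stage in order, passing each output to the next stage."""
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


result = run_pipeline(Document("PDF"))
print(result.fmt, result.history)
```

Modelling each stage as a function with a declared input and output format makes the ordering constraint explicit: a stage given the wrong format fails immediately instead of silently producing nothing.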


  • CRAWLERREPO. Crawler for Open scientific publications. Repository using CKAN

Helper libraries:

  • HTML. HTML library (based on XOM) covering the commonest elements in XHTML required for representing static text and images.
  • EUCLID. Numeric and geometrical libraries, and also generic XML (STM primitives)
  • CMLXOM. Library for Chemical Markup Language (CML).
  • JUMBO. Tools for chemistry using CML.