AMI Tutorial

  • AMI does not yet work with URLs, but will soon. Examples are provided; if you want to work with additional files, download them and put them in a directory (remember where it is, because you will need to tell AMI).
  • All exercises are run from the command line, by default from AMI's bin/ directory.

  • We suggest you work in pairs. The examples are biological, but don't worry. It may help slightly if one person is a bioscientist, so they can find examples.

  • Find out whether you are running Windows or Mac/Unix. Also find out whether you have a Java runtime. If not, find a partner who has. Windows users will use the .zip download, the others the .tar.gz.
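
The platform and Java checks in the last bullet can be done from a terminal. This is a minimal sketch for Mac/Unix shells (it should also work in Git Bash or Cygwin on Windows); the variable names `archive_ext` and `java_status` are illustrative assumptions, not part of AMI. In a plain Windows Command Prompt, typing `java -version` on its own serves the same purpose.

```shell
#!/bin/sh
# Decide which archive type to download, based on the operating system.
case "$(uname -s 2>/dev/null)" in
  Linux*|Darwin*|*BSD*) archive_ext=".tar.gz" ;;   # Mac/Unix
  CYGWIN*|MINGW*|MSYS*) archive_ext=".zip" ;;      # Unix-like shells on Windows
  *)                    archive_ext="unknown" ;;   # could not tell; ask your partner
esac
echo "Archive type to download: $archive_ext"

# Check whether a Java runtime is on the PATH.
if command -v java >/dev/null 2>&1; then
  java_status="found"
  java -version 2>&1 | head -n 1    # java prints its version on stderr
else
  java_status="missing"
  echo "No Java runtime found on PATH; find a partner who has one."
fi
```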



Go to

$ https://bitbucket.org/petermr/xhtml2stm-dev/downloads

and download your file.

 $ cd /some/where/convenient/that/doesnt/matter

e.g.:

 $ cd C:\Users\pm286\amitest

You'll probably want to delete this directory after the exercise.

Unzip/untar the file and navigate to

$ cd C:\Users\pm286\amitest\xhtml2stm-0.1.2-SNAPSHOT-bin\xhtml2stm-0.1-SNAPSHOT


$ ls            (or dir) should show bin/, exampleData/ and repo/ (ignore repo/)
$ cd exampleData
$ ls            (or dir) should show html/ and pdf/, and maybe more
$ cd pdf
$ ls            (or dir) should show:
$    multiple-1471-2148-11-312.pdf
$    tree-1471-2148-11-313.pdf
$ cd ..
$ cd html
$ ls            (or dir) should show pb1.html and multiple.312.html
$ cd ..
$ cd ..
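
The directory walk above can also be checked in one go. The following is a minimal sketch for Mac/Unix shells; `check_layout` is a hypothetical helper written for this tutorial, not something shipped with AMI, and it only tests for the directories named above.

```shell
# check_layout is a hypothetical helper, not part of AMI.
# Usage: check_layout DIR   where DIR is the unpacked distribution directory.
check_layout() {
  root="$1"
  missing=0
  for d in bin exampleData exampleData/html exampleData/pdf; do
    if [ -d "$root/$d" ]; then
      echo "ok       $d/"
    else
      echo "MISSING  $d/"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing directories (0 means all present).
  return "$missing"
}
```

After the last `cd ..` you are back in the unpacked directory, so `check_layout .` should report four `ok` lines if the archive unpacked as expected.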

$ cd bin

If you now list the files (dir or ls) you should see:

$ chem
$ chem.bat
$ ...
$ species
$ species.bat

We are going to use species and sequence for most of the exercises.


We suggest you run from within the bin/ directory. If you have extra files, put them in html/ or pdf/ (under exampleData/).


Type:

$ species

and you'll see the help. Now type:

$ species -i ../exampleData/html/multiple.312.html -o output312.xml

This will create a new file in bin/:

$ output312.xml

(If you are confident with filesystems we suggest you put it somewhere else to be tidy.)

UPDATE: The current version will create a directory:

$ ../exampleData/html/multiple.312/

with the results inside it.
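
If you do want the output somewhere tidier than bin/, something like the following may help. It is a sketch for Mac/Unix shells only: `run_species` is a hypothetical wrapper, the `.species.xml` output name is an assumption, and only the `-i` and `-o` options shown above are used. Given the UPDATE above, the current version may place its results in a directory next to the input regardless of `-o`, so treat this as illustrative. On Windows the script is `species.bat`, and `./species` is written here on the assumption that bin/ is not on your PATH.

```shell
# run_species is a hypothetical wrapper, not part of AMI.
# Usage: run_species INPUT_FILE OUTPUT_DIR   (call it from AMI's bin/ directory)
run_species() {
  input="$1"
  out_dir="$2"
  if [ ! -f "$input" ]; then
    echo "input not found: $input" >&2
    return 1
  fi
  mkdir -p "$out_dir" || return 1
  # Name the output after the input: multiple.312.html -> multiple.312.species.xml
  base=$(basename "$input")
  out_file="$out_dir/${base%.*}.species.xml"
  ./species -i "$input" -o "$out_file"
}
```

For example, `run_species ../exampleData/html/multiple.312.html ../results` would ask species to write `../results/multiple.312.species.xml`.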

Now you can vary the options in AMI:


$ species finds binomial names in italics (genera coming soon)
$ sequence works on DNA-containing text (it will soon be extended to RNA and proteins)

Don't use these:

$ tree works on SVGs with phylogenetic trees (you will have one example)
$ chem doesn't work today, but will soon extract molecules from diagrams
$ regex is a general framework into which you can put your own regular expressions. By default we provide a phylogenetic set


$ -i foo.pdf use a named file with an explicit extension
$ -i bar/ look in directory bar/. Requires the -e flag to give the file extensions


$ -o results312.xml write output to this file. If the file exists it will be overwritten, so choose your names carefully

None of the remaining options are needed.

file extensions

$ -e html, pdf tells AMI which file extensions to use with the -i option


$ -x is an expath expression (not yet used)
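
If you would rather stay with the single-file form of `-i` while processing a whole directory, a shell loop gives a similar effect. This is a sketch for Mac/Unix shells; `run_on_dir` is a hypothetical helper, not part of AMI, and it uses only the `-i` and `-o` options described above. It is not a replacement for AMI's own directory handling with `-e`.

```shell
# run_on_dir is a hypothetical helper, not part of AMI.
# Usage: run_on_dir TOOL DIR EXT   e.g. run_on_dir ./species ../exampleData/html html
run_on_dir() {
  tool="$1"; dir="$2"; ext="$3"
  count=0
  for f in "$dir"/*."$ext"; do
    [ -f "$f" ] || continue            # skip when the pattern matches nothing
    name=$(basename "$f" ".$ext")      # strip the directory and the extension
    "$tool" -i "$f" -o "$name.xml"     # one output file per input, in the current directory
    count=$((count + 1))
  done
  echo "processed $count file(s)"
}
```

Because each output is named after its input, existing files with the same names in the current directory would be overwritten, as noted for `-o` above.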

PDF files

PDF files have to be converted before use. They are split into one *.html file (currently TEXT.0.html) and several PNG or SVG files, depending on the types of graphic in the file. By default AMI will just use the html output (as if the file was explicitly present). Note that conversion can take time (perhaps 2 secs per page, so perhaps 20 secs for a 10-page paper) and may also generate files (e.g. in target/).

Currently PDF seems to have a bug in the distro ... If you have an interesting problem, talk to PMR, who can run the PDF for the class.

SVG files

SVG files liberated from PDFs can be searched for some of the current types, especially tree, species and sequence. The problem is knowing what an SVG contains; we are working on that and expect to have it soon.