petermr  committed 51ba346


  • Parent commits f015fdc

Files changed (2)

+CrystaleyeTestdata LICENSE
+All data are licensed as Open Data under the PDDL license
+CrystaleyeTestdata README
+This is a subset of publications automatically downloaded from Acta Crystallographica E
+Open Subset. It currently contains:
+* download
+    ... 254 folders (uuid names generated by Nick Day's downloader)
+        ... entry.xml metadata on each publication
+        ... aadddd.html supplementary data for each pub
+* html
+    ... all folders will be generated by crystaleye-processor. Some will be downloaded.
+    the directory structure will be of the form /x/x/x/uuid where x/x/x are the first 3 
+        letters of the uuid. In each directory are:
+        annotated.xml		probably obsolete
+	chemicalTagger.xml	result of chemicalTagger
+	chemicalTreeBank.xml	result of adding semantics with ChemicalTreeBank
+	cifmetadata.xml		bibliographic metadata from CIF file
+	data.cif		CIF file downloaded from Acta site
+	data.cif.cml		CIF processed to CML stage 1
+	data.cif.xml		CIF processed to CML stage 0
+	data.complete.cml	CIF processed to CML stage 2
+	data.morganized.cml	possibly obsolete
+	data.png		probable 2D chemical structure from 3D coordinates
+	experiment.xml		experimental paragraph (to be parsed by chemical tagger)
+	full.html		full-text of paper (downloaded)
+	image.png		chemical structure of compound as image (roundtripping)
+	imageMorgan.cml		converted to chemical identifier
+	imageStructure.cml	converted to chemical structure
+	metadata.xml		bibliographic metadata from Acta site
+	morganized.xml		experiment with resolved chemical names
+	opsin.png		parsing of title as compound (image)
+	opsin.xml		parsing of title into chemistry
+	opsinCoords.xml		coordinates of opsin structure
+	opsinMorgan.xml		identifier for opsin structure
+	resolved.xml		obsolete?
+	scheme.gif		image of chemical structure (from Acta site)
+	summary.html		summary page on crystaleye
+	summaryPageUrl.xml	summary page URL
+	suppText.html		supplemental data (tidied)
+        title.xml		title of compound (chemical name)
+        Some of these files may be absent. Parsing failures produce zero-length files.
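
The README's /x/x/x/uuid layout and its zero-length-file convention can be illustrated with a minimal Python sketch. It is not part of the commit or of crystaleye-processor. The function names `shard_path` and `empty_files` are hypothetical, the uuid in the usage note is made up, and the sketch assumes that "the first 3 letters" means the first three characters of the uuid, digits included.

```python
from pathlib import Path


def shard_path(root, uuid):
    """Build root/x/x/x/uuid, where x/x/x are the first three characters of uuid.

    Assumption: "letters" in the README means characters, so digits count too.
    """
    if len(uuid) < 3:
        raise ValueError("uuid must have at least three characters")
    return Path(root, uuid[0], uuid[1], uuid[2], uuid)


def empty_files(entry_dir):
    """Return the sorted names of zero-length files directly inside entry_dir.

    Per the README, a zero-length file indicates that parsing failed.
    """
    return sorted(
        p.name
        for p in Path(entry_dir).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )
```

For a made-up uuid such as `0a1b2c3d`, `shard_path("html", "0a1b2c3d")` gives the path `html/0/a/1/0a1b2c3d`, and calling `empty_files` on that directory would list the outputs that failed to parse.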