generateDS.py -- Generate Python Data Structures

What is it

generateDS.py generates Python data structures from an Xschema document. It generates a file containing: (1) a Python class for each element definition and (2) parsers (which use the Python minidom module) for XML documents that satisfy the Xschema document. The class definitions contain:

  • A constructor with initializers for member variables.
  • Get and set methods for member variables.
  • A 'build' method used during parsing to populate an instance.
  • An 'export' method that will re-create (write out) the XML element in an XML document.
  • An 'exportLiteral' method that will write out a text (literal) Python data structure that represents the content of the XML document.
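
To give a feel for the shape of such a class, here is a small hand-written sketch. It is not actual generateDS.py output: the "person" element, the Person class and its single "name" member are hypothetical, and real generated code carries more machinery. It only illustrates the five pieces listed above, using minidom as the parser.

```python
import sys
from xml.dom import minidom
from xml.sax.saxutils import escape


class Person(object):
    """Hand-written stand-in for a generated class (hypothetical element)."""

    def __init__(self, name=None):
        # Constructor with an initializer for each member variable.
        self.name = name

    def get_name(self):
        return self.name

    def set_name(self, name):
        self.name = name

    def build(self, node):
        # Populate this instance from a minidom element node.
        for child in node.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "name":
                self.name = "".join(
                    t.data for t in child.childNodes
                    if t.nodeType == t.TEXT_NODE)

    def export(self, outfile, level=0):
        # Write the element back out as XML.
        indent = "    " * level
        outfile.write("%s<person>\n" % indent)
        outfile.write("%s    <name>%s</name>\n" % (indent, escape(self.name)))
        outfile.write("%s</person>\n" % indent)

    def exportLiteral(self, outfile, level=0):
        # Write a Python literal that represents the same content.
        outfile.write("%sPerson(name=%r)\n" % ("    " * level, self.name))


doc = minidom.parseString("<person><name>Alice</name></person>")
person = Person()
person.build(doc.documentElement)
person.export(sys.stdout)
person.exportLiteral(sys.stdout)
```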

How to build and install it

De-compress the generateDS distribution file. Use something like the following:

tar xzvf generateDS-1.5a.tar.gz

Then, the regular Distutils commands should work:

python setup.py build
python setup.py install

How to use it

See generateDS.html for documentation.

Produce class definitions and sub-class definitions with something like the following:

$ python generateDS.py -o people.py -s people_subs.py people.xsd

Here is a test using the enclosed sample Xschema file:

$ python generateDS.py -o people.py people.xsd
$ python people.py people.xml

More information

More information on generateDS.py is in generateDS.html in the distribution, or online: generateDS -- Generate Data Structures from XML Schema -- http://www.davekuhlman.org/generateDS.html.

There is also a tutorial. See tutorial/tutorial.html in the distribution or generateDS -- Introduction and Tutorial -- http://www.davekuhlman.org/generateds_tutorial.html.


XML Schema limitations -- There are lots of things in Xschema that are not supported. You will have to use a restricted sub-set of Xschema to define your data structures. See the documentation (generateDS.html) for supported features. See people.xsd and people.xml for examples.

Mixed content -- generateDS.py generates a parser and data structures that do not handle or represent mixed content. Here is an example of mixed content:

<note>This is a <bold>nice</bold> comment.</note>

My only, and somewhat feeble, excuse for this is that generateDS.py is intended for structured data rather than marked-up text. However, whether my excuse is a good one or a feeble one, you should be warned: if you anticipate needing mixed content, do not use generateDS.py.
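
To see why mixed content does not map onto member variables, it may help to look at what minidom produces for the note example above. The following is only an illustration using the standard library. It is not code from generateDS.py.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<note>This is a <bold>nice</bold> comment.</note>")
note = doc.documentElement

# Walk the direct children of <note>: text nodes and element nodes
# are interleaved, and their relative order carries meaning.
children = []
for child in note.childNodes:
    if child.nodeType == child.TEXT_NODE:
        children.append(("text", child.data))
    elif child.nodeType == child.ELEMENT_NODE:
        children.append(("element", child.tagName))

for kind, value in children:
    print(kind, repr(value))
```

A class with one member variable per child element has no natural place to keep the two pieces of text that surround the bold element.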

Large documents -- The parser generated by generateDS.py uses minidom. This means that the entire XML document must be read and a DOM tree constructed in memory. In addition, the data structures generated by generateDS.py must occupy memory. This means that generateDS.py is not well-suited for applications that read large XML documents, although what "large" means depends on your hardware. Notice that the parsing functions (parse() and parseString()) over-write the variable doc so as to enable Python to reclaim the space occupied by the DOM tree, which may help alleviate the memory problem to some extent.
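
The idea of over-writing doc can be sketched as follows. This is a simplified illustration, not the generated parse functions: the function name parse_names and the "name" element are hypothetical, and a plain list of strings stands in for the generated data structures.

```python
from xml.dom import minidom


def parse_names(xml_text):
    # The whole document is read and a DOM tree is built in memory.
    doc = minidom.parseString(xml_text)
    # Copy what we need out of the DOM tree into our own structures.
    names = [node.firstChild.data
             for node in doc.getElementsByTagName("name")]
    # Drop our reference to the DOM tree so that Python is free to
    # reclaim it; only the extracted data stays alive.
    doc = None
    return names


print(parse_names("<people><name>Alice</name><name>Bob</name></people>"))
```

Note that, for a while, both the DOM tree and the extracted data are in memory at the same time, which is why this only helps to some extent.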


Copyright (c) 2002 Dave Kuhlman

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.


[MIT License -- http://www.opensource.org/licenses/mit-license.php]

Change history

Version 2.24b (01/02/2017)

  • Added several fixes to generateDS.py and process_includes.py that are needed for the support for Python 3. Thank you Ian Glover for catching this and for contributing the fixes.
  • Fixed bug in generation of regular expression for validating pattern in a restriction on a simpleType. In the pattern, we needed to replace "|" with "$|^", unless the vertical bar was escaped with a backslash. This was necessary so that each regular expression separated by a vertical bar would be anchored at the left and right. Thanks to Clint Pitzak for catching and reporting this.
  • Modified the Django support (in ./django/) so that it will run under Python 3. Thanks to Shane Rigby for reporting this problem.
  • Fixed an error in encoding unicode valueOf_ for <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string">. Thanks to Andrii Iudin for catching this.
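
The pattern fix described above (replacing "|" with "$|^" unless the bar is escaped) can be sketched roughly like this. The function name anchor_alternatives is hypothetical and this is not the code in generateDS.py. It is a minimal sketch that assumes the vertical bars of interest are at the top level of the pattern, and it does not try to handle a bar preceded by an escaped backslash.

```python
import re


def anchor_alternatives(pattern):
    # Replace each "|" that is not preceded by a backslash with "$|^",
    # then anchor the whole pattern at the left and right.
    anchored = re.sub(r'(?<!\\)\|', '$|^', pattern)
    return '^' + anchored + '$'


print(anchor_alternatives("abc|de"))
print(anchor_alternatives(r"a\|b|c"))

# Without anchoring, "abcx" would match the first alternative.
print(re.match("abc|de", "abcx") is not None)
print(re.match(anchor_alternatives("abc|de"), "abcx") is not None)
```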

Version 2.24a (11/16/2016)

  • Added entry_points to setup.py so that distutils will generate executable scripts for executable .py files (for example, generateDS.py and process_includes.py). Thanks to Michael Jenny for suggesting this and for showing the way to do it.
  • Fixed function call signature mismatch in MixedContainer call to export method. Thanks to Lev Israel for catching this and providing the solution.
  • Added "remove duplicate elements" fix to catch duplicate definitions of child elements with the same name inside a single parent element. This fix does the following: (1) removes duplicate child; (2) makes the remaining child a Python list (effectively maxOccurs="unbounded"); (3) prints a warning message when it finds and removes a duplicate. Thanks to Pietro Saccardi for catching and reporting this.
  • More fixes for "remove duplicate elements".
  • Removed command line option for "remove duplicate elements". This behavior will now always be performed.
  • Added unit test for "remove duplicate elements".
  • Added command line option "--no-warnings" to turn off warning messages. I needed it for the unit test for "remove duplicate elements".
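
The three steps of the "remove duplicate elements" fix can be pictured with a sketch like the one below. It is only an illustration under simplifying assumptions: children are represented as plain dicts with hypothetical "name" and "maxOccurs" keys, and the function name and the wording of the warning are made up here, not taken from generateDS.py.

```python
import sys


def remove_duplicate_children(children, no_warnings=False):
    seen = {}
    result = []
    for child in children:
        name = child["name"]
        if name in seen:
            # (1) drop the duplicate; (2) make the remaining child a list.
            seen[name]["maxOccurs"] = "unbounded"
            # (3) warn, unless warnings have been turned off.
            if not no_warnings:
                sys.stderr.write(
                    "warning: removed duplicate child: %s\n" % name)
        else:
            kept = dict(child)
            seen[name] = kept
            result.append(kept)
    return result


children = [
    {"name": "address", "maxOccurs": "1"},
    {"name": "phone", "maxOccurs": "1"},
    {"name": "address", "maxOccurs": "1"},
]
print(remove_duplicate_children(children))
```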

Version 2.23b (09/26/2016)

  • Added missing unit test files to build (MANIFEST.in).
  • Fixed exception that occurs when character content is empty for an element defined as type xs:token. Thanks to Andrii Iudin for reporting and checking this.

Version 2.23a (09/14/2016)

  • Integrated Clayton Daley's fixes to the unit tests. Thanks much, Clayton.
  • Clayton's fixes to the unit tests uncovered several errors that had been masked and hidden. Fixed those errors, for example: (1) eliminated generation of erroneous call to validation method; (2) added catalog file.

Version 2.22c (04/26/2016)

  • Fixes to generation of validation methods for xs:date, xs:time, and xs:dateTime simpleType. Thanks to Andrii Iudin for reporting this and for suggesting a solution.
  • Added additional unit tests for validations of xs:date, xs:time, and xs:dateTime simpleType.

Version 2.22b (04/20/2016)

  • Fixed endless recursion that occurred while attempting to replace attribute group names. Thanks to Bing Wang for reporting this and for identifying and providing the XML schema that reproduced it.
  • Fixed failure to clean up names containing special characters in function generateBuildStandard_1. This error was uncovered when generating code from Bing Wang's schema. Thanks again Bing.

Version 2.22a (04/11/2016)

  • Added support for additional command line options to generateds_gui.py. Added analogous support to generateDS.py for use of session files produced by generateds_gui.py.
  • There is now a bit of documentation with a few usage notes on generateds_gui.py. See generateds_gui_notes.txt and generateds_gui_notes.html.

Version 2.21a (04/01/2016)

  • The GUI (graphical) front end to generateDS.py has been resuscitated and is now working again thanks to Aleksandr Dragunkin. The GUI front end must be run under Python 3, and you must install Python support for Gtk. Aleksandr has also provided a Russian translation of the labels etc in the user interface. You can run that with:

    $ cd /path/to/generateds/gui
    $ python3 generateds_gui.py --impl-gui=generateds_gui_ru.glade

    Note that the GUI interface still lacks support for a few of the command line options that were added most recently to generateDS.py. If you need one or more of those missing options but would still like to use the GUI front end, you can consider using the "Capture CL" under the Tools menu, and then copy and paste the result into a shell script, add any needed options to that script, and run the script from the command line.

Version 2.20b (03/28/2016)

  • Fixes to handling of simpleType with and without restrictions on another defined simpleType. These were not being handled correctly when the name of the simpleType contained a dash. Thanks to Ryku for identifying this problem and for a very helpful description of what was wrong and for providing schemas to reproduce the problem.

Version 2.20a (02/25/2016)

  • Another patch for Python 2 and 3. We needed to protect against performing an encoding that caused an exception in generateDS.py and process_includes.py. Thanks to Marcus Schäfer for catching this and for providing a fix.

Version 2.19b (02/16/2016)

  • Modified generated code so that it will run under both Python 2 and Python 3. There is no longer any need to generate different code for Python 2 and Python 3. In fact, the "--py3" command line option has been removed.

Version 2.19a (02/08/2016)

  • Added the ability to generate code that can run under Python 3. Use the "--py3" command line option. Note that if you generate code for Python 2 (the default), then you must run that generated code under Python 2. And, if you generate code for Python 3, then you must run that generated code under Python 3. There is currently no way to generate code that will run under both Python 2 and Python 3.

  • Modifications so that generateDS.py itself can be run with either Python 2 or Python 3.

  • Fixed the template (TEMPLATE_HEADER) so that it uses the format function and keyword arguments.

  • Added info on --py3 command line option to doc (generateDS.txt).

  • Added new script (fix_subclass_refs.py) that can be used to fix-up (change) which subclass file (of two or more that were generated with the -s command line option) is used by the superclass file when parsing an XML instance document. This will enable you to use the -s option to generate multiple subclass files, add different code to each of them, and then parse documents and create instances of classes from one then another during the same run. But also, see next item.

  • Added generation of code to lookup the subclass of a generated class using a global variable containing the subclass module. This provides an alternative and more convenient way to do the above (i.e., use fix_subclass_refs.py to select from multiple subclass files generated with the -s command line option). However, there may be tasks that can be performed with that script or a modified version of it that cannot be done with this approach using a global variable. Here is a sample script that uses this option:

    import tmp01suba
    import tmp01subb

    def test():
        # Select the first subclass module, then parse with it.
        tmp01suba.supermod.CurrentSubclassModule_ = tmp01suba
        roota = tmp01suba.parse('test01.xml', silence=True)
        # Switch to the second subclass module and parse again.
        tmp01subb.supermod.CurrentSubclassModule_ = tmp01subb
        rootb = tmp01subb.parse('test01.xml', silence=True)
        roota.show()
        print('-' * 50)
        rootb.show()


Version 2.18a (12/16/2015)

  • Fixed quoting of simpleContent so that, e.g., "&amp;" is exported as "&amp;" and not as "&". Thanks to Ardan Patwardhan for reporting this and contributing a fix.
  • Fix to generation of exportAttributes so that the test for already generated is properly quoted. Thanks to Naresh Shenoy for reporting this and for contributing a fix.
  • Another fix related to the unquoted constant in exportAttributes. A simple fix had a bad conflict. Thanks to Christian Rasmussen for focusing my attention on this one.
  • Fix for xs:simpleContent that extends type xs:float (or xs:integer or other numeric types). When set to numeric zero (for example, after parsing the instance doc), the value was not being exported. Thanks to Ardan Patwardhan for diagnosing this and for contributing the fix.

Version 2.17a (08/17/2015)

  • Modified setup.py so that process_includes.py is installed where it can be imported.
  • Changed default settings for export -- Default is now to generate only the normal export methods, instead of both normal and literal. See command line option --export.
  • Fix to regex pattern used to capture "<![CDATA[ ... ]]>". The old pattern was dropping ending characters when the content contained HTML/XML markup. Thanks to Adrian Cook for this fix.
  • Merged use of replacement patterns in cleanupName. With this fix users can specify patterns to look for and replacements strings to be used to clean up special characters and other patterns in names. There are some notes in the document; search for "cleanup-name" in generateDS.html. Thanks to Fedor Tyurin for suggesting and implementing this enhancement.
  • Added unit test for enhanced cleanupName. Added documentation to generateDS.txt.

Version 2.16a (05/28/2015)

  • Added new command line option ("--preserve-cdata-tags") that causes generation of code that contains a special parser to retain CDATA tags. Thanks to Adrian Cook for reporting this, for providing test data and test cases, and for help with testing and feed-back.
  • Added ability for user to specify the names of classes generated from anonymous, nested xs:complexType definitions, rather than accept the names created in process_includes.py.
  • Added a unit test for the anonymous, nested definition capability.
  • Fix to error caused by check (in generated code) for whether lxml or ElementTree is being used. We no longer support use of ElementTree. Thanks to Emil Nordling for catching and reporting this.

Version 2.15b (04/07/2015)

  • Fix to generation of simpleType validation code for list (unbounded) elements. Thanks to wobanator for this fix.
  • Fix to code for --one-file-per-xsd. Added check to avoid an infinite loop on schemas not suitable for --one-file-per-xsd. Thanks to Michael Vezie for catching this and for identifying the relevant location in the code. And, thanks to George David for providing a better fix than mine.
  • Enhancement so that child elements defined with a default value will not export when the current value and the default value are the same. Also added equivalent changes for attributes. Thanks to Jan Biel for finding and reporting this.
  • Added unit tests for the above default value enhancement.

Version 2.15a (02/18/2015)

  • Modifications so that we generate code that can be used by Python 3. Thanks much to Richard Gerkin for this work.
  • Removed possible use of ElementTree. Lxml is now a requirement for both running generateDS.py itself and for running the generated code.
  • Fixed exporting of text content so that, when it contains CDATA sections, the mark-up characters inside the CDATA sections are not escaped. Thanks to George David for reporting this and for helping with a fix.

Version 2.14a (11/26/2014)

  • Fixed export of simpleType lists (added "' '.join(data)"). Thanks to Per Rosengren for catching this.
  • Added new style validation of simpleType data. Validation requirements are captured from the XML schema definition of the simpleType, e.g. 'restriction base="..."' etc. Thanks to azer gh for implementing much of this extended capability.
  • Added unit test for simpleType validation, including test for proper detection of bad (invalid) data.
  • Did some code cleanup with the help of the flake8 code checker.
  • Added a fix so that attribute declarations that use ref= rather than type= will also be generated with the specific type. Thanks to Florian Wilmshoever for catching and reporting this and for providing an XML schema as a test case.
  • Added unit test for reference to simpleType.
  • Fix to generation of names of substitutionGroup. The namespace prefix was not being stripped in some cases.

Version 2.13a (09/09/2014)

  • Minor fix to function generateToEtreeChildren. Must generate call to get_valueOf only when defined (i.e. when element is simpleContent or isMixed).
  • Fix to generation of class name prefixes added with the "-p" command line option. This fix was added by Christian Ascheberg. Thank you Christian.
  • Added unit test for class name prefixes command line option.

Version 2.12f (08/12/2014)

  • Fix for substitutionGroup conflict with keyword name mapping. Thanks to Leonid Minchin for finding and helping with this problem.
  • An exception occurred when an element had a documentation string that was short (possibly 1 character). Fixed. Thanks to Matthias Zaake for finding this and for providing a patch.

Version 2.12e (06/16/2014)

  • Fix for formatting error. Thanks to Nikolay Lavrov for catching this and for providing a fix.
  • Fix to gds_parse_datetime(). The Python datetime module's datetime object uses microseconds, but xs:dateTime uses fractions of a second (e.g. 0.123). Converted from decimal fraction to microseconds. Thanks to Mikki Weesenaar for catching this.
  • Modified behavior and names for the generated insert_xxx methods (which are generated when, e.g., maxOccurs="unbounded"), so that now we generate insert_xxx_at and replace_xxx_at. Thanks to Bart Wagenaar for pointing out this deviation from Pythonic style.
  • Function transitiveClosure in generateDS.py was susceptible to infinite looping. This seemed to occur when a substitutionGroup contains a member with the same name as the head of the substitutionGroup (but in a different namespace?). Added a test to stop the recursion when this occurs. Thanks to Stuart Chalk for finding and reporting this.
  • Added explanation to the documentation explaining how the source distribution (generateDS-x.xxy.tar.gz or Bitbucket) is needed for use of the Django model generation capability.
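
The conversion from a decimal fraction of a second to microseconds, mentioned above for gds_parse_datetime(), can be sketched as follows. The function parse_datetime here is a hypothetical stand-in, not the generated method. It assumes input of the form YYYY-MM-DDTHH:MM:SS with an optional fraction, and it ignores time zones.

```python
from datetime import datetime


def parse_datetime(text):
    if "." in text:
        main, fraction = text.split(".", 1)
        # Pad or truncate the fraction to six digits:
        # ".123" seconds becomes 123000 microseconds.
        microseconds = int(fraction[:6].ljust(6, "0"))
    else:
        main, microseconds = text, 0
    value = datetime.strptime(main, "%Y-%m-%dT%H:%M:%S")
    return value.replace(microsecond=microseconds)


print(parse_datetime("2014-06-16T10:30:00.123"))
print(parse_datetime("2014-06-16T10:30:00"))
```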

Version 2.12d (04/02/2014)

  • Fix for an infinite loop caused by inconsistent use of mapped/clean names with list AlreadyGenerated. Thanks to Jerome Allasia for catching this and for suggesting a fix.
  • Added a unit test for the use of mapped/clean names, in particular when one xs:complexType is an xs:extension of another.
  • Changed several lists to sets for faster look-up, for example AlreadyGenerated, AlreadyGenerated_subclass, DelayedElements, etc.
  • Cleaned up the use of functions mapName() and cleanupName() to avoid duplicate transformations.

Version 2.12c (03/28/2014)

  • Fix for "one module per XSD file" to handle an include or import element that refers to a remote schema on the Net (i.e. the location is "http:..." or "ftp:...") rather than a file on the local file system. Added ability to access include/import file across the Net. Thanks to Jinquan Liu for reporting this.
  • Added schema to unit test for "one module per XSD file" that is read from remote site (http://www.davekuhlman.org).
  • Fix to process_includes.py -- When run directly from the command line (as opposed to imported and called from another python module), the fixtypenames option was not being initialized.
  • Fix for error in order of generation of classes that have superclasses. When an anonymous simpleType occurred, the name of the enclosing complexType was used, which caused generateDS.py to believe that the superclass had already been generated. Thanks again to Jinquan Liu for reporting this issue.
  • Fix for handling of xs:substitutionGroup -- Namespace prefix was causing gDS to fail to match on substitutionGroup name.
  • Added code so that an instance of a generated class can remember the tag from which it was built. This is needed for instances of a class that represents an element type that is a member of a xs:substitutionGroup. But, in fact, generated code now uses this feature to remember and export the tag name of all complex elements.
  • Enhanced command line option --root-element so that both the root tag and the root class can be specified (separated by a vertical bar).
  • Added support for the ability of an element definition to inherit minOccurs and maxOccurs from the xs:sequence that contains it.
  • The command line options and command line arguments used to generate modules are now included as comments near the top of the generated modules. Also included in these generated comments is the command line used to generate the module. This will help users later to determine which XML schema and what options were used to generate each module, and to re-generate the module, if needed. Thanks to Mikki Weesenaar for suggesting this enhancement.

Version 2.12b (02/10/2014)

  • Fix to the aliasing capability. You should now be able to alias one element to another, and by doing so, only generate the targeted alias. See notes on generateds_config.py in the documentation for more on this. Thanks to Mikki Weesenaar for bringing up the use case that needed this.
  • Additional fixes for the "one module per XSD file". Also, creation of a unit test for this capability. See section "One Per -- generating separate files from imported/included schemas" in the documentation for more information. Thanks again to George David for all his work on this.
  • Fixes to process_includes.py -- Some uses of namespace prefix xs: were hard-coded, whereas some XML schemas use xsd: instead of xs:.
  • Various fixes to unit tests so that all unit tests pass when using either the cloned Mercurial repository at Bitbucket (https://bitbucket.org/dkuhlman/generateds) or the tar archive.

Version 2.12a (10/29/2013)

  • A name conflict issue caused by naming anonymous types. An anonymous type is a complexType that does not have a name attribute and that is nested inside an element that does not have a type attribute. Strengthened the code that generates new, unique names. And, also fixed a problem or two in the surrounding code. Thanks to Shahaf Abileah for reporting this and for providing test files to reproduce the problem behavior.
  • Created unit test for anonymous types.
  • Added command line option --fix-type-names. This may be useful if there are name conflicts in your XML schema, for example, because the schema refers to two types with the same name but in different namespaces.
  • Ability added to generate one Python module for each XML Schema (.xsd file) imported/included. Added command line options --one-file-per-xsd, --output-directory=, and --module-suffix= in support of this. Thanks much to George David for implementing this new feature.
  • This change provided by Logan Owen. -- Return self from build function of generated classes, to allow easy chaining. The main use case for this change is if you have a list of xml documents, and you want to change them into generateDS class instances. Thank you Logan.
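
Here is a small sketch of the chaining that returning self from build makes possible. The Item class is a hypothetical, hand-written stand-in for a generated class, and the standard library's ElementTree is used here in place of lxml only to keep the sketch self-contained.

```python
import xml.etree.ElementTree as etree


class Item(object):
    """Hypothetical stand-in for a generated class."""

    def __init__(self, name=None):
        self.name = name

    def build(self, node):
        self.name = node.findtext("name")
        # Returning self lets callers create and build in one expression.
        return self


documents = [
    "<item><name>first</name></item>",
    "<item><name>second</name></item>",
]

# Turn a list of XML documents into a list of instances.
items = [Item().build(etree.fromstring(text)) for text in documents]
print([item.name for item in items])
```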

Version 2.11a (08/16/2013)

  • Added ability to use XML catalog to find included/imported schemas. The -c command line option has been added to support this. Thanks to George David for implementing this enhancement.
  • Added unit test for the catalog capability.
  • Added ability to pick up the target namespace and its prefix, then use them in calling the export functions from the parse functions. Thanks to George David for suggesting this.
  • Several fixes to formatting date and floats during export. Thanks to Domenico Mangieri for catching and fixing these.
  • Added generation of an extra, optional "silence" argument to the parse functions so that export can be turned on or off at runtime. Domenico is the motivator on this one, too.
  • The information about minOccurs and maxOccurs in the generateDS document (generateDS.txt) was misleading or wrong. Edited it. Thanks to Rinat Yangurazov for catching this.

Version 2.10b (07/22/2013)

  • Changed flag for generating getters and setters. Removed flag --use-old-getter-setter. Replaced it with new flag --use-getter-setter, which can have the following values:

    "old" - Name getters/setters getVar()/setVar().
    "new" - Name getters/setters get_var()/set_var().
    "none" - Do not generate getter/setter methods.

    The default is "new". See the help (use --help option) or see the doc (generateDS.txt/generateDS.html) for more on this. Thanks to Mike Vella for suggesting this.

  • Changed suffix used to prevent name conflicts with Python keywords from "xx" to "_".

Version 2.10a (05/29/2013)

  • Added ability to produce mapping (a dict) during call to to_etree() that maps gDS instances to their associated etree elements. Also added convenience method gds_reverse_node_mapping to reverse the order of keys and values in a mapping/dict. See function parseEtree in the generated code for hints about how to produce these mappings. There is also a note on generating the Lxml Element tree in the docs (generateDS.txt/generateDS.html).
  • Python datetime.date objects don't have tzinfo, so trying to access it in gds_format_date was throwing an error. According to http://stackoverflow.com/a/610923, the best way to avoid that type of error is to use a try/except for AttributeError. Thanks to Logan Owen for this fix.
  • Fixed bug so that gDS will now handle a simpleType nested inside a restriction nested inside a simpleType. Thanks to Christian Kaiser for finding this, for focusing my attention on it, and for providing the sample files to test it with.
  • Fixed bug where gDS was failing to resolve a defined simpleType correctly. It was failing to add the XSchema namespace (usually xs:). Thanks again to Christian Kaiser for focusing me on this one.
  • Fixes to handling of xs:dateTime when the XML schema specifies a default value and the XML instance document omits the value. Also, fixed formatting because datetime.strftime does not handle dates outside of range (e.g. earlier than 1900). Attempts to use a consistent internal representation across xs:dateTime, xs:date, and xs:time, specifically instances of datetime.datetime, datetime.date, and datetime.time from the Python standard library. Thanks to Shahaf Abileah for reporting this and for providing an example of the schema. Caution: Because this changes the internal representation of dates and times used by the generated code, this fix may break some existing applications.
  • Various fixes to generation of method exportLiteral in generated classes.
  • More code clean-up in generateDS.py to eliminate coding style warnings and errors reported by flake8. Ditto for process_includes.py. Also, made a few changes to reduce the warnings and errors reported by flake8 when run on code generated by gDS.
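
The tzinfo problem noted above for gds_format_date can be illustrated with a sketch like the following. The function format_date is a hypothetical stand-in, not the generated method. It shows the approach of catching AttributeError, so that the same code accepts both datetime.date objects (which have no tzinfo) and datetime.datetime objects (which do), and it formats the date by hand rather than with strftime.

```python
import datetime


def format_date(value):
    text = "%04d-%02d-%02d" % (value.year, value.month, value.day)
    try:
        tzinfo = value.tzinfo
    except AttributeError:
        # A plain datetime.date has no tzinfo attribute.
        return text
    if tzinfo is None:
        return text
    offset = tzinfo.utcoffset(value)
    if offset is None:
        return text
    total = int(offset.total_seconds())
    if total == 0:
        return text + "Z"
    sign = "-" if total < 0 else "+"
    hours, minutes = divmod(abs(total) // 60, 60)
    return "%s%s%02d:%02d" % (text, sign, hours, minutes)


print(format_date(datetime.date(2013, 5, 29)))
print(format_date(
    datetime.datetime(2013, 5, 29, tzinfo=datetime.timezone.utc)))
```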

Version 2.9a (02/21/2013)

  • Added support for exporting to an Lxml element tree. The element tree can then be serialized to XML, e.g. using Lxml etree.tostring(). This innovation is by Logan Owen, who also did most of the work on it (but I helped some, too). Note that this work is not yet complete; it's still "work in progress"; but it looks very promising.
  • Added --export command line option. This enables the user to selectively generate export methods for any or all of normal export, export to etree (lxml element tree), or export to literal python code. This will enable users to reduce bulk in their generated files when any or all of these are not needed. The default is "write literal", i.e. the normal export methods that we are used to. Use the --help command line option or read the doc for a description of this option.
  • Fixed a bug that occurs when a schema has an attributeGroup referenced with a name that includes a namespace prefix but the attributeGroup is defined with a name that does not have the namespace prefix. Thanks to Mike Detecca for reporting this and for nudging me in the right direction when I, initially, made the wrong fix.
  • Added unit test for export to etree.
  • Various fixes to the to_etree (export to Lxml element tree) capability: (1) fix to preserve names that contain special characters (e.g. "-" and "."); (2) fix to preserve the type attribute (xsi:type) for abstract types whose type is set explicitly. Round trip (XML --> gDS object tree --> lxml element tree --> gDS --> lxml ...) now seems to work reasonably well, although I'm guessing that there are still bits missing (in particular, support for xs:anyAttribute).

Version 2.8c (provisional) (01/30/2013)

  • Changed generated check for attributes that are already_processed to use a set object rather than a list. Since sets are hashed, I believe that lookup is faster.
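
A rough sketch of that kind of check is shown below. The function export_attributes and its arguments are hypothetical and much simpler than the generated exportAttributes method. The point is only that membership tests and additions on a set do not need to scan the names that were already processed, as they would with a list.

```python
def export_attributes(attributes, already_processed=None):
    # already_processed is a set, so "name not in already_processed"
    # is a hash lookup rather than a scan over a list.
    if already_processed is None:
        already_processed = set()
    parts = []
    for name, value in attributes:
        if name not in already_processed:
            already_processed.add(name)
            parts.append('%s="%s"' % (name, value))
    return " ".join(parts)


print(export_attributes([("id", "1"), ("id", "2"), ("kind", "a")]))
```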

Version 2.8b (01/30/2013)

  • Fixed missing underscore in reference to member names in generateExportLiteralFn_2. Thanks to Sergii Chernysh for reporting this.
  • Fixed use of NameTable for mapping names when an element has an attribute and a child with the same name. Needed to use correct name (original name or mapped name) when doing (1) fix_dup_names, (2) exportAttributes, and (3) buildAttributes. Thanks to Mike Vella for reporting this.
  • Fixed gds_parse_datetime so that it will handle fractional seconds. Thanks to Matt Hughes for providing this fix. Now, xs:dateTime values that include microseconds are successfully parsed and exported.
  • Created a Mercurial repository for generateDS at Bitbucket: https://bitbucket.org/dkuhlman/generateds

Version 2.8a (01/10/2013)

  • Fix to process_includes.py so as to remove the limitation on the number of unique names it can generate when raising anonymous types to the top level. Thanks to Daniel Browne for help with this.
  • Added support for multiple level attributeGroup, i.e. for attribute groups that themselves contain references to other attribute groups. Thanks to Harley Green for pointing out the need for this. Also added a unit test for attribute groups.
  • Added support for more date and time simple types, specifically gYear, gYearMonth, gMonth, gMonthDay, and gDay. Thanks to Nicholas Krasney for catching this. Added tests in the unit tests for these types.
  • Quite a bit of code clean-up with the help of the flake8 Python code checker. This is predominantly code cleanup that does not affect behavior, most commonly splitting lines that are longer than 80 characters across multiple lines for readability. (See: http://pypi.python.org/pypi/flake8 for info about the flake8 Python code checker. I use it with the Syntastic plugin for the Vim text editor.)
  • Added generation of a dictionary that maps element definition names to generated class names. Thanks to Elena Dolinin for the original work on this one.
  • Added support for xs:date and xs:dateTime. These are now captured as instances of class datetime.datetime from the Python standard library. They are parsed and exported with the help of that class and using methods gds_format_date, gds_format_datetime, gds_parse_date, and gds_parse_datetime in class GeneratedsSuper (which is part of the generated module). Logan Owen did the work on this. Thanks much to Logan for implementing this and contributing this patch.
  • Turned logging off. I did not realize that generateDS.py had been creating a log file (generateDS.log). Logging can be turned back on by modifying the logging calls near the top of generateDS.py.
  • Fixed exception that is thrown when the XML schema file (.xsd) only contains a simple type. Now, the output is generated, but it contains no data representation classes. Thanks to Daniel Browne for catching this.

Version 2.7c (08/06/2012)

  • Added xs:hexBinary to the list of string types in generateDS.py and django/generatedssuper.py. Effectively, we are generating the same code for types xs:base64Binary and xs:hexBinary. That leaves it up to the user to add code that converts into and out of these formats. Thanks to Peter Kreinhöfer for finding this.
  • Added support for compressed export, that is, export without ignorable white space (indentation and new lines). Normally the generated export methods produce pretty-printed (indented) XML output. With this change, we generate modules which enable you to export in a way that omits ignorable whitespace. It is anticipated that this feature will be useful to those who need to export XML documents that are machine readable but not human readable. Thanks much to Logan Owen for doing the work on this one. Compressed (non-pretty-print) output is produced by passing the keyword argument pretty_print=False to the export method. There is a note in the document (generateDS.html) about it; see section titled "Exporting compact XML documents".

Version 2.7b (12/10/2011)

  • Fix for xs:any in buildChildren in an element defined with no other children so that we do not generate "else:" clause without an "if ...:". Thanks to Keith Robertson for help with this.
  • Change for xs:any in buildChildren (when maxOccurs > 1) so that the gds_build_any() method always, consistently takes a single child node as input and returns a single built object. Thanks Marcin Tustin for guidance with this.
  • Fix for element definition containing an anonymous xs:simpleType.
  • Added xs:time to list of handled simple (date, time) types.

Version 2.7a (11/04/2011)

  • Fix for case where a child is defined with a reference (ref="") to a complexType (rather than a simpleType) and the complexType is abstract.
  • Added minimal support for xs:any. See section "Support for xs:any" in the documentation.
  • Added unit test for xs:any.

Version 2.6b (10/13/2011)

  • Fix for case where a child element is declared with a type that is a simpleType whose restriction base is another simpleType that is referred to with a namespace prefix. With this fix we ignore the prefix, so that at least it will work when there are not two different simpleTypes whose qualified names have the same local name (qualified name minus the namespace prefix). Thanks to Thomas Nichols for finding and reporting this one.
  • Added a unit test for the above restriction base with namespace prefix.
  • Added a blank character when needed at the beginning and end of doc strings inserted in generated classes to protect against the case where the doc string begins or ends with a double quote character.
  • Fixes to various files in the tutorial/Code/ directory and to the text files in the tutorial/ directory in order to make them more consistent and less confusing. Added the individual sample code files to the distribution so that users will not have to find and unzip a zipped archive.
  • Fixes to files in tests/ and to the distribution config (MANIFEST.in) so that the distributed version would pass unit tests. (Please let me know if it does not.)
  • Removed file generatedssuper.py from the distribution. Added notes to the documentation on how to create this module by copying from a generated module for those who want to customize those methods in the common superclass.
  • Fix to django/generatedssuper.py -- Regularized and fixed the names generated in models and forms files.
  • Fix to the code that generates the member_data_item_/MemberSpec_ list/dict. If the type of a child element is defined by a reference (ref="") to an element rather than, e.g. a complexType, it was using the child's name, not its type.
  • Added xs:base64Binary and xs:language to the list of string types in generateDS.py and django/generatedssuper.py. Also, xs:anyURI and xs:duration.
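
The prefix handling in the restriction-base fix above amounts to comparing local names only. A minimal sketch, using a hypothetical helper name (local_name) rather than anything taken from generateDS.py:

```python
def local_name(qname):
    """Return the part of a qualified name after the namespace prefix."""
    # 'abc:colorType' -> 'colorType'; a name without a prefix is unchanged.
    return qname.split(':', 1)[-1]
```

Note that 'abc:colorType' and 'xyz:colorType' both reduce to 'colorType', which is exactly the collision the note warns about.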

Version 2.6a (07/28/2011)

  • Fix to capture xs:/xsd:/etc namespace prefix from schema. Was not setting global variable XsdNameSpace. Thanks to Frank Liauw for focusing my attention on this one.
  • Fix for substitutionGroup -- Was not setting the correct instance variable during generation of build method when child is a member of substitutionGroup. Thanks to Serge Dikic for finding this one and bringing it to my attention.
  • One more attempt to fix whether to call the exportChildren method when the complexType is an extension and not a restriction and not defined with simple content that extends a simpleType. Thanks to Jaime Cepas for alerting me to this.
  • process_includes.py has a fix for the problem where there is more than one anonymous complexType defining elements with the same name. The issue is that generateDS.py must generate a Python class for each complexType and cannot do so in this case. One solution, which is now implemented in process_includes.py, is to raise each complexType to top level in the schema DOM tree and to give it a name. process_includes.py does this by appending "Type" to the name, and when there are duplicate names, appending "1" or "2" or ... to that. So far this change passes my tests, but if it does not work for you, then comment out the call to raise_anon_complextypes(). Thanks to Amal Khailtash for finding a schema that exhibits this problem and bringing it to my attention.
  • Fix for generation of export method that exports xs:anyAttribute when there is an xsi:type attribute.
  • Fix for use of valueOf_ -- Should only be used when element is defined either with (1) mixed content or (2) simpleContent.
  • Question: The xsi:type attribute is being exported for any derived type. Perhaps it's harmless, but it seems excessive. When should the export method have that code to export the xsi:type attribute? Only for types derived from (an extension of) an abstract base type? Only for the abstract base type itself? Only when a derived type is substituted for a base type using the base type's tag and the xsi:type attribute to specify the derived type? Need to investigate.
  • I've reworked the xsi:type attribute stuff. It now operates on the following assumptions: (1) an instance of any complexType that has been extended can have an xsi:type attribute (which specifies one of the extending types) and (2) the generated code should export the xsi:type attribute only and always when (if and only if) the element in the input instance document has that attribute.
  • A patch to convert floats and ints etc to str during export. Thanks Jaime Cepas.
  • Fixes to ctor/initializers when there is a default value for a child element defined as a complexType containing simpleContent.
  • librarytemplate -- (1) Renamed documentation files to librarytemplate_howto.html and librarytemplate_howto.txt for consistency with the name of the librarytemplate distribution file (currently librarytemplate-1.0a.zip). (2) Added the documentation and distribution files for librarytemplate to the main generateDS distribution file.
  • Added xs:byte to the list of integer types.
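
The naming scheme used when anonymous complexTypes are raised to top level (append "Type", then "1", "2", ... for duplicates) might look roughly like the following. This is a sketch under stated assumptions, not the code in process_includes.py; unique_type_name and used_names are hypothetical names.

```python
def unique_type_name(element_name, used_names):
    """Return element_name + 'Type', adding 1, 2, ... on a name clash.

    used_names is a set of names already handed out; it is updated in place.
    """
    candidate = element_name + 'Type'
    count = 0
    while candidate in used_names:
        count += 1
        candidate = '%sType%d' % (element_name, count)
    used_names.add(candidate)
    return candidate
```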

Version 2.5a (06/06/2011)

  • Fix for generation of default value in parameters for the constructors.
  • Fix for lookup of attribute value in generated buildAttributes methods -- Formerly, attribute names having a namespace prefix were not being found.
  • Added some support for xs:group -- Named model groups (model group definitions) are now treated as definitions of blocks of elements to be copied/inserted where referred to. This replacement has been added to the preprocessing done in process_includes.py. So, this <xs:group ref="some_def"/> is replaced by the contents of <xs:group name="some_def"> ...
  • Fix to generation of calls to validator methods for child elements. Before the fix, the validators were called in buildAttributes methods but not in buildChildren. Also, generation of the validator method (stubs) was also missing in some cases. Thanks to Béres Botond for alerting me to this.
  • Fixes to generateds_gui.py -- Now it can load a session again. Also a fix to the check for and warnings about the changes to current session on exit.
  • process_includes.py -- Fix for yet another problem with including the same file multiple times when included from different directories.
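
The xs:group replacement described above can be sketched with ElementTree. This is an illustration only, not the preprocessing code in process_includes.py: the function name expand_group_refs is hypothetical, and the sketch assumes that group definitions sit at the top level of the schema, that ref values carry no namespace prefix, and that groups do not refer to other groups.

```python
import copy
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'

def expand_group_refs(root):
    """Replace each <xs:group ref="..."/> with a copy of the named group's contents."""
    # Collect the named model groups defined at the top level of the schema.
    groups = dict(
        (g.get('name'), g) for g in root.findall(XS + 'group') if g.get('name'))
    for parent in list(root.iter()):
        new_children = []
        for child in list(parent):
            ref = child.get('ref') if child.tag == XS + 'group' else None
            if ref in groups:
                # Insert copies so the definition itself is left untouched.
                new_children.extend(copy.deepcopy(list(groups[ref])))
            else:
                new_children.append(child)
        parent[:] = new_children
    return root
```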

Version 2.4c (03/21/2011)

  • Added minimal support for unsignedLong, unsignedInt, and unsignedByte.
  • Made the retrieval of the parent (superclass) name and parent object for an element more consistent. Fixed some cases where this was not handled correctly, in particular, the generation of arguments and parameters for ctors (__init__) was inconsistent and caused errors.
  • Regularized the handling of fromsubclass_ and added this handling to the exportChildren methods. This is used to tell a superclass, during build and export, that the subclass has already performed certain operations.
  • Fix to process_includes.py so as to prevent it from loading schemas multiple times. The check for already_processed was formerly incorrect.
  • Fix related to restrictions on complexType -- Do not generate a call to buildChildren in the superclass for restrictions (as opposed to extensions) of a complexType. Ditto for exportChildren. Note that restrictions must repeat (and restrict the value of) each sub-element that is to be included in the content of the restriction. See: http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/#DerivByRestrict

Version 2.4b (02/21/2011)

  • Fix to generation of the superclass in the class statement header line. Formerly we did not correctly pick up the superclass name (from extension base=""). Thanks to Timo Sugliani for finding this bug.

Version 2.4a (02/16/2011)

  • A few fixes to format of some error messages.
  • Clean-up names in the exportableClassList (__all__).
  • Modify reading session object/doc to use lxml instead of minidom.
  • Fix to process_includes.py to protect against crash when an import element is missing a schemaLocation attribute.
  • Fix to parsing and exporting simpleTypes declared as lists (<xs:list>).
  • Added new methods to class GeneratedsSuper to validate (during build) and format (during export) for simpleTypes declared as lists (<xs:list>).
  • Fix for incorrect detection of type during generation of build method.
  • Added first cut at generating Django models and forms. Thanks to Derek Corbett for this suggestion.
  • Added "meta-app" that generates Django database models and Django forms. See doc and files in subdirectory django/.
  • Fix to generation of __all__ list: converted non-word characters to "_" etc
  • Fix to process_includes.py so that it uses the entire path to a file when trying to determine whether it duplicates a previous import. Perhaps this will avoid skipping an import when attempting to import two files with the same name from different directories. Thanks to Mihai Ibanescu for pointing out this fix.

Version 2.3b (12/28/2010)

  • Fix for simpleTypes defined as a restriction whose (ultimate) base type is a pre-defined simple type which were not generating correct (type-specific) code in build method. Thanks to Noel Merket for finding this problem.
  • Fix for simpleTypes defined as a xs:list with "itemType" attribute where the type was not being recognized.
  • Fix so that we recognize some other simple types as xs:string type (e.g. xs:NMTOKEN, xs:ID, xs:Name).
  • To do -- If a simpleType is a restriction on another simpleType and the base simpleType definition is declared as a list, we are not recognizing that it is a list.

Version 2.3a (12/02/2010)

  • Added generation of code to handle attributes inherited by a restriction from its base type and the types that the base extends (i.e. from a restriction base class and its superclasses). Thanks for help from Jaime Cepas.

  • Fix to code that generates the references to the superclass from a type that is an extension: special characters (e.g. dash) were not being cleaned/mapped. Reported by Koen Smets; thanks.

  • To do:

    • In a restriction, inherited attributes can be "prohibited". It would be nice if gDS would do something to block their use.

    • When:

      AbstractElement mixed=false and Element1 mixed=true base=AbstractElement and Element2 mixed=FALSE base=AbstractElement

      Incorrect parse code is generated for Element2. Reported by Jaime Cepas.

    • It might be desirable if the getter functions could be asked to return values encoded to utf-8 for xs:string types.

    • Code that is generated to export to python code needs updating, in particular we need to update encoding of exported strings. Thanks to Kristoffer Kobosko for reporting this.

    • Update to the code that generates code that exports Python literals (exportLiteral ...). In particular: fix encoding of Python code and of string literals (unicode, utf-8).

Version 2.2b (11/10/2010)

  • Added generation of __all__ global variable containing a list of generated class names. This enables you to do a reasonably safe "from mymodule import *". It's sorted, so it also gives you something in the way of an alphabetical table of contents of the generated classes. Thanks to Jaime Cepas for this.
  • Added another fix so that the generated code for mixed content elements will not generate empty blank lines on export. Thanks again to Jaime for this fix.
  • Added patch to sort mixed content in their class containers. Jaime contributed this one too. Thanks again.
  • Added check for endless recursion while collecting the list of parent type element names. When detected, raises an exception that identifies the elements. Thanks to Maximilian Holtzberg for finding this one. One case that can cause this problem is when an element type definition extends a type definition of the same name in a different namespace. Since generateDS.py ignores the namespace, this looks like a type that is extending itself.
  • Modified code generated to process token lists in order to prevent breakage processing some strings.
  • Updated the tutorial so that the examples use the new parsers (ElementTree or lxml).
  • The "Clear" buttons in generateds_gui.py are broken when run with GTK2. generateds_gui.py is still usable, but, if you need to erase the contents of a text field, you will have to do so manually until I can figure out a fix.
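
The endless-recursion check mentioned above (for a type that appears to extend itself once namespaces are ignored) can be illustrated as follows. This is a sketch only; collect_parent_names and the parents dictionary are hypothetical and do not reflect the data structures used inside generateDS.py.

```python
def collect_parent_names(name, parents):
    """Return the chain of parent type names for name, nearest parent first.

    parents maps a type name to the name of the type it extends.
    Raises ValueError if the chain loops back on itself.
    """
    seen = [name]
    while name in parents:
        name = parents[name]
        if name in seen:
            # Identify the elements involved in the loop.
            raise ValueError(
                'circular extension: %s' % ' -> '.join(seen + [name]))
        seen.append(name)
    return seen[1:]
```

With namespaces ignored, a type "a" in one namespace that extends a type "a" in another becomes the mapping {'a': 'a'}, which this check reports instead of looping forever.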

Version 2.2a (9/14/2010)

  • Changes for coding consistency -- Used wrt() pervasively instead of outfile.write().

  • Re-write of process_includes.py -- It now handles xs:include/xs:import elements that include a file from another directory that includes a file relative to that directory that includes a file across HTTP, and so on.

  • The command line option --search-path is no longer supported. I don't think that behavior was standard for XML schema anyway. Removed support for search_path from generateDS.py, process_includes.py, and generateds_gui.py.

  • Added support for specifying additional name mappings in a config file: generateds_config.py. That file, if it exists, must be located where it can be imported by generateDS.py and should contain a dictionary named NameTable. For example, the following maps the name "range" to "rangeType" so that if the schema defines a complexType "range", generateDS.py will generate a class named "rangeType":

    NameTable = {
        'range': 'rangeType',
    }

    See the doc for more on this.

  • Instead of using the lower() function from the string module, added a function to the GeneratedsSuper class and used the string method. This prepares for Python 3.0.

  • Added "gds_" prefix to all methods in class GeneratedsSuper to make possible name clashes less likely.

  • Fixes to exporting elements with mixed="true" -- Reduced extra whitespace.

  • Fixes to building (capturing) attribute values for elements with anyAttribute="..." -- Eliminated capture of duplicate attribute values.

Version 2.1d (8/23/2010)

  • Fix to indentation of generated code in the build method for type checking of NonNegativeIntegerType.
  • Fix to generation of parameters in the call to the superclass constructor. Count of children was incorrect, triggering generation of valueOf_.
  • Known issue -- If type B extends type A, and type A declares anyAttribute, then duplicate attributes with the same name may be produced during export.
  • Known issue -- Some namespaces ("{URI}") are not converted to namespace prefix during export. The needed information is not available during export.

Version 2.1c (8/8/2010)

  • Fix to functions parse, parseString, and parseLiteral so that they start the build with the correct root class. I believe that there is yet another case that this does not handle, specifically when the element name is different from the class/type name and the element definition is not the first definition in the schema.

  • Fix to generation of build method for derived elements (i.e. elements with "extension base="). These were being treated as if they were abstract, i.e. 'abstract="true"'.

  • Fix to generation of the call to the superclass constructor in the generated subclass module. Prevented the generation of duplicate arguments.

  • Added a comment to the generated superclass module at the top that specifies the utf-8 source code encoding:

    # -*- coding: utf-8 -*-

Version 2.1b (8/2/2010)

  • Fix to generation of export functions. If no children, must generate "pass" statement.
  • Changed generated get_all_text function so that it uses an "if" statement instead of a conditional (if) expression. The conditional expression does not work with older versions of Python.

Version 2.1a (7/8/2010)

  • Added ability to capture annotation/documentation element text as doc-strings in the generated classes. Thanks to Roy Williams for suggesting this and for guidance.

Version 2.0b (6/24/2010)

  • Fix to generation of export method so that valueOf_ is exported when childCount == 0 and not isMixed.

Version 2.0a (6/21/2010)

  • Switched to use of lxml/ElementTree in generated files. Thanks to Biswanath Patel and Jaime Huerta Cepas for encouraging me to implement the switch to lxml/ElementTree.
  • Modified the generation of functions parse(), parseString(), and parseLiteral() so that they automatically recognize the root element of an instance XML document and call the build method of the appropriate class.
  • Fix to hasContent_ method so that in elements defined with extension-base, the superclass is checked also.
  • For classes that must call an overridden method m in the superclass, switched to use "super(superclassname, self).m(...)" instead of "m.(self, ...)".
  • Known issues -- (1) generateDS.py loops and crashes with "RuntimeError: maximum recursion depth exceeded" on some schemas (for example collada_schema_1_4.xsd). (2) Failure in process_includes.py with import of remote file and nested imports (for example collada_schema_1_5.xsd).

Version 1.20g (5/21/2010)

  • Update to documentation -- Added a section on suggested ways to handle/recognize different top level (root) elements.

Version 1.20f (5/3/2010)

  • Fix to generation of export so that anyAttribute does not cause duplicate attributes to be exported.
  • Fix so that we do a better job of determining whether a reference to a type is a simple, builtin type in generation of constructor.
  • Fix to generation of constructors so that (1) valueOf_ is initialized in subclass modules and (2) valueOf_ is initialized to None (rather than '').
  • To do: Extend the --root-element flag so that we can specify both the tag name and the element/type name. Sometimes they are different.

Version 1.20e (2/8/2010)

  • Fixed error that caused incorrect tag name to be exported when the tag name contains special characters and the tag name is different from the type name.
  • Fixed links so that latest versions are included in the install distribution file.

Version 1.20d (2/3/2010)

  • Updated version number/info in generateds_gui.py.
  • Fix to process_includes.py -- Handle include elements and import elements in the same way. In particular, allow both to reference schema files on either the local file system or remotely across the Net (via ftp or http).
  • Fix to generation of properties -- When the name of a member is mapped (e.g. a Python language keyword), wrong name for getter and setter was used.
  • Fix to generation of export methods: missing encoding.
  • Fix to selection of type for exportLiteral.
  • Added missing files in the tests/ directory to the distribution.

Version 1.20c (1/1/2010)

  • Replaced symbolic links in the distribution with hard links. Symbolic links do not work on MS Windows.
  • Fix to the use of the subprocess module in generateds_gui.py, which had caused a problem on MS Windows.
  • Cosmetic fix in generateds_gui.py: labeled "Save" (session) button.
  • Fix so that File/Open action in generateds_gui.py will check for and warn user if the session data has been modified.
  • Fix to generation of code for simpleContent with restriction: now treats the restriction element as a superclass. Thanks to Franis Sirkovic for catching this and for providing the patch. Also added a unit test for this case.

Version 1.20b (12/14/2009)

  • Fix to process_includes.py so that it handles relative paths in include/import elements in the schema.
  • Various fixes and additions to the GUI front-end, e.g. added "Clear" buttons to erase some fields.
  • Fixed bug -- self.inRestrictionType was not initialized.
  • Added --session command line option to generateDS.py -- It can now use session files generated by the GUI front-end.
  • Fixes to the generation of the exportLiteral methods. We can now export Python literal representation of an instance doc that can be read/imported by Python.
  • Added unit test for generation of Python literal representation.
  • With the help of Erica Tolbert, generateDS.py can now generate bindings for gcdml (Genomic Contextual Data Markup Language. See http://gensc.org). Thank you, Erica.
  • generateDS.py can now generate bindings for the following (rather large) schemas:

Version 1.20a (12/01/2009)

  • Added first version of the GUI front-end. See the generateDS doc (generateDS.html).

Version 1.19a (10/21/2009)

  • Enhancement to the table of information generated in each class when the --member-specs=list|dict command line option is used. For a complexType defined as a simpleType, we now generate a list of the simpleType and the simpleTypes it is based on using name "valueOf_". Thanks to Ryan Leslie for much help and guidance with these changes. Example:

    'valueOf_': MemberSpec_('valueOf_', [u'RelationType',
        u'RelationType2', u'xs:string'], 0),

    Note the following incompatible changes:

    • _MemberSpec changed to MemberSpec_ -- We want to avoid possible name conflicts, not make it "weakly hidden". See the Python style guide for more on this.
    • _member_data_items changed to member_data_items_ -- Same reason.
    • Method MemberSpec_.get_data_type() now returns the last item if the type is a list and the single type if not a list.
    • Method MemberSpec_.get_data_type_chain() is a new method that returns the entire list of data types.

    The new tutorial (see tutorial/tutorial.html in the distribution) has an example of the use of the MemberSpec feature.

  • Fix to DecimalType -- In some cases treated as an integer. Should be a float. Thanks Ryan Leslie for catching this.

  • Removed last bits of the generation of a SAX parser. It no longer worked and is not needed.

  • Several fixes to determination and handling of types.

  • Added unit test for extensions to simple types and for MemberSpec.

  • There is now a preliminary version of a tutorial.
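
The behavior of get_data_type() and get_data_type_chain() described in the MemberSpec notes above can be sketched as below. This is a simplified reconstruction for illustration, not the generated class itself: the constructor signature is inferred from the 'valueOf_' example, and the sketch assumes that a list of types is never empty.

```python
class MemberSpec_(object):
    """Simplified, hypothetical version of the generated MemberSpec_ class."""

    def __init__(self, name='', data_type='', container=0):
        self.name = name
        self.data_type = data_type   # a single type name or a list of them
        self.container = container

    def get_data_type_chain(self):
        # The entire list of data types (or the single type, unchanged).
        return self.data_type

    def get_data_type(self):
        # The last item if the type is a list, else the single type.
        if isinstance(self.data_type, list):
            return self.data_type[-1]
        return self.data_type
```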

Version 1.18f (9/14/2009)

  • Fixes to process_includes.py from Mihai Ibanescu. These fixes address namespace and namespace prefix problems for XML tree that is copied into a document. Thanks Mihai.
  • Added xs:anySimpleType to the list of OtherSimpleTypes. This prevents anySimpleType from being used as a base type.
  • Change so that sub-classes are generated for types that do not have children or attributes.
  • Fixed crash that occurred when a simple type is nested in a simple type and the memberTypes attribute is used.
  • Fix to GeneratedsSuper -- Inherit from "object".
  • Added command line option --no-versions, which, when used, tells generateDS.py not to insert the version in generated files. This is useful when you want to be able to compare generated files and not detect version differences.
  • Patch to eliminate extra space after element tag. Thank you Ryan Leslie.

Version 1.18e (9/1/2009)

  • Added patch from Mihai Ibanescu which handles and expands groups. Also added Mihai's unit test for groups. Thank you, Mihai.
  • Added patch, also from Mihai, that passes the node's text to the super-class constructor.
  • Added patch that implements a --no-dates command line flag which, when used, tells generateDS.py not to insert the time-stamp in generated files. This is useful when you want to be able to compare generated files and not detect date/time differences. Thanks again to Mihai.

Version 1.18d (8/26/2009)

  • Automatic detection of the namespace prefix used in the schema document. Thanks to Mihai Ibanescu for this enhancement.
  • Fix to deal with conflicts with generateDS's internal function names, for example "build". Thanks again to Mihai.
  • Upgrade to the unit test harness. Replace popen (which is deprecated) with use of the subprocess module. Thank you Mihai.
  • Fix in the class constructors (__init__) to cast XML primitive types (xs:integer, xs:float, etc) to Python built-in types (int, float, etc). Thanks once more to Mihai.
  • Fix to add enumeration value resolution when the possible values are not declared in an explicit definition but in a "top level" type. Also fix a bug with enumeration value population for elements where the unwound element stack contains more than one element. Thanks to Chris Allan for this fix.

Version 1.18c (8/11/2009)

  • Small changes related to check for mixed content.
  • Enhancement to generation of hasContent_() method to check for items with maxOccurs > 1.
  • Fix for generation of test for valueOf_ in hasContent_() method.
  • Fix for generation of initializers in ctor -- children were being skipped when the element is mixed.

Version 1.18b (7/29/2009)

  • Fix for exception with simpleType that is an extension of another simpleType.
  • Change to mixed extension chain -- Will now generate class.
  • Fix to generation of constructors -- Will now initialize to default value for simpleTypes.
  • Fixed generations of validator methods, validator bodies, and call to validator bodies for attributes.
  • Command line option "--validator-bodies" now triggers a check that the option value is an existing directory.
  • Various cleanup, deleting commented-out debug code, etc.
  • Now writing help messages, error messages to stderr instead of to stdout.

Version 1.18a (7/14/2009)

  • Added command line flag --member-specs to generate the member specifications as described in "User Methods" section of the doc. The member specs can be a list or a dictionary.
  • Fix to export indentation. Thanks Tim Marchelli.
  • Added a utility script: generate_coverage.py which generates a dictionary of class names and classes from a (superclass) module generated by generateDS.py.

Version 1.17d (7/2/2009)

  • Fix for generation of recursively defined simpleTypes, e.g. a simpleType defined as a restriction of another simpleType. (see fix_simpletype comment in generateDS.py)
  • Added version number to generated class files.
  • Fixes to/for process_includes.py -- DirPath/DIRPATH now initialized correctly and fixed failure to initialize a local variable.

Version 1.17c (6/24/2009)

  • Fix for error generating code for exporting related to simpleType.
  • Fix for syntax error in export of boolean types.
  • Fix for export of elements with type of attribute defined in-line.
  • Fix to generation of export function when the --silence command line option is used.

Version 1.17b (6/10/2009)

  • Fix so that generateDS.py will still work with Python 2.4. Thanks to Dave Sugar for that.

Version 1.17a (5/20/2009)

  • Modified export of children of type xs:string so that (1) if None, not exported and (2) if not None but an empty string, exported (example "<aa></aa>").
  • Generated calls to format_string(), format_integer(), etc in the generated export methods. Enables the user to override these methods to customize format of exported values. See the "Overridable methods" section in the doc (generateDS.html) for more info and for an explanation of how to override these methods. Currently used to give the user control over formatting of values during export.
  • Fixes to generated build and export methods so that elements defined as xs:simpleType are handled as the specific simpleType xs:restriction base, for example xs:string, xs:integer, etc.
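
The distinction made above between a None value and an empty string for xs:string children can be shown with a small function. It is a sketch only: export_string_child is a hypothetical name, and XML escaping is left out for brevity.

```python
def export_string_child(tag, value):
    """Return the XML text for an xs:string child, or '' if value is None."""
    if value is None:
        # Not set: nothing is exported.
        return ''
    # Set, possibly to an empty string: the element is exported.
    return '<%s>%s</%s>' % (tag, value, tag)
```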

Version 1.16e (4/28/2009)

  • Eliminated generation of SAX parser. I'm sure it no longer worked, anyway.
  • Fix to export of CDATA characters, provided by Kerim Mansour. Thanks.
  • Added support for command line option --external-encoding. Exported character data is now encoded to sys.getdefaultencoding() or to the encoding specified by command line option --external-encoding.
  • Added attributes (in addition to children) to the list of data type specifications in _MemberSpec/_member_data_items. This fix was provided by Ryan.
  • Several fixes suggested by Kerim Mansour including one related to export of CDATA. Thank you Kerim.
  • Removed generation of SAX parser. It did not work any more anyway.

Version 1.16d (3/25/2009)

  • Fixes to generation of the exportLiteral functions. We can now do exportLiteral, then import the resulting file in Python. See generated parseLiteral() for an example.

  • Added an additional parameter to the export() methods. Now, you can call export() as follows:

    rootObj.export(outfile, 0,


    which will insert the namespace prefix definition in the exported root element.

  • Added new command line option --namespacedef= to specify the namespacedef_ to be passed in by the generated parse() and parseString() functions. Example use:

    generateDS.py --namespacedef='xmlns:abc="http://www.abc.com/"' -o out.py myschema.xsd

Version 1.16c (3/13/2009)

  • One more fix for abstract types -- When the implementation element/class for an abstract class exports itself, it adds the xsi:type="class_name" attribute.
  • A minor fix to handling namespace prefix and the -a command line option.
  • Additional fixes so that in constructors (__init__), all instance variables are initialized to None.
  • Some fixes to quoting and escaping quotes when exporting attribute values. Thanks to Kerim Mansour for help with this.

Version 1.16b (3/9/2009)

  • Added support for restriction/list, i.e. a list of words separated by whitespace.
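
A list of words separated by whitespace maps naturally onto a Python list. The two helpers below are hypothetical and only sketch the idea; they are not the code that generateDS.py generates.

```python
def parse_token_list(text):
    """Split a whitespace-separated list of words into a Python list."""
    return text.split()

def format_token_list(items):
    """Join a list of words back into a single space-separated string."""
    return ' '.join(items)
```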

Version 1.16a (2/16/2009)

  • Generated export methods now check for empty content and write out <xx ... /> rather than <xx ...></xx> if empty.
  • All generated constructors (__init__()) now initialize instance variables to None.
  • Generated export methods now check for None before attempting to write out attributes and children.
  • More consistent use of direct access to instance variables rather than calling getter methods with a class, that is use of self.xxx rather than self.get_xxx().
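
Taken together, the changes above (initializing to None, checking for None before writing, and writing an empty element as <xx ... />) suggest export code along these lines. The Element class is a hypothetical stand-in for a generated class, and escaping of attribute and text values is omitted.

```python
class Element(object):
    """Hypothetical stand-in for a generated class."""

    def __init__(self, tag, name=None, note=None):
        self.tag = tag
        self.name = name   # attribute value; None means "not set"
        self.note = note   # child element text; None means "not set"

    def export(self):
        attrs = ''
        # Direct access to instance variables, with a check for None.
        if self.name is not None:
            attrs = ' name="%s"' % self.name
        if self.note is None:
            # No content: write the short, self-closing form.
            return '<%s%s/>' % (self.tag, attrs)
        return '<%s%s><note>%s</note></%s>' % (
            self.tag, attrs, self.note, self.tag)
```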

Version 1.15d (1/22/2009)

  • Fix to setup.py so that it also installs process_includes.py.
  • Enhancements to process_includes.py so that it can also retrieve included files via ftp and http.
  • Fixes for default values for attributes.
  • The above changes are all from Arne Grimstrup. Thank you Arne.

Version 1.15c (11/26/2008)

  • Added switch (--silence) to cause generateDS.py to generate parsing functions that do not write output to stdout. This fix contributed by Al Niessner.

Version 1.15b (11/24/2008)

  • Added Amnon Janiv's fixes for attribute groups and for logging.

Version 1.15a (11/20/2008)

Version 1.14g (10/17/2008)

  • Fix in generation of exportChildren (omitted "_" in "namespace".
Version 1.14f (10/06/2008)
  • Minor fix related to simple types in generateBuildStandard_1().
Version 1.14e (09/25/2008)
  • Minor fix for generation of syntax error (missing parenthesis).
  • Eliminated generation of specification of superclass (superclass =) for undefined types.
  • Fixed error setting value in SimpleElementDict.
  • Fixed error when getting type for building attributes.
  • Fixed and regularized exception reporting when building float and integer values.
  • Fixed error referring to simple types in build function.
Version 1.14d (08/28/2008)
  • Several fixes related to simple types.
Version 1.14c (08/16/2008)
  • One more namespace patch from Andre Adrian.
  • A fix to generated export methods for valueOf from Oscar (Oeg Bizz).
  • First attempt to fix the name_type problem, specifically an incorrect generation of the element name where it should generate the type name and vice versa.
Version 1.14b (06/17/2008):
  • More namespace patches from Andre Adrian.
  • Changed "lower()" to "str_lower()" in generated code so that we have a less common name in generated code.
Version 1.14a (06/03/2008)
  • In generateBuildFn, the generated code formerly would skip the children of a base class in an extension class if the extension class has children of its own. This patch fixes that problem. (The buildChildren call for the base class is inside a "if hasChildren == 0" block.)
  • The export functions formerly would output the attributes and children of the derived classes before those of the base class, whereas the XML Schema spec specifies that the base class elements come before derived elements in a sequence. This patch corrects the generation order.
  • This patch adds proper xs:boolean reading and writing to generateDS. "true" and "false" values in the XML will become True and False in Python, and will be written back out as "true" and "false", respectively.
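
The xs:boolean conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code that generateDS.py actually emits: the function names are hypothetical, and accepting "1" and "0" is an assumption based on the XML Schema lexical forms for xs:boolean rather than on anything stated in these notes.

```python
def parse_xs_boolean(text):
    """Convert the text of an xs:boolean value to a Python bool."""
    value = text.strip()
    # "1" and "0" are accepted as an assumption (XML Schema allows them).
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError("not a valid xs:boolean: %r" % text)


def format_xs_boolean(value):
    """Write a Python bool back out as "true" or "false"."""
    return "true" if value else "false"
```

Reading "true" and writing the result back out yields "true" again, which is the round trip the patch is meant to guarantee.
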
Version 1.13a (05/26/2008)
  • Added support for generating namespace prefix during export if the XML Schema specifies the targetNamespace. Thanks to Andre Adrian for implementing this feature.
Version 1.12b (05/20/2008)
  • Patches to escape special XML characters (entities) in valueOf and attributes. Thanks to Darius Powell for this fix.
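
As a rough illustration of the kind of escaping involved, the sketch below uses the standard library helpers xml.sax.saxutils.escape and quoteattr. The function export_element is hypothetical, and the generated export methods may well do their escaping differently.

```python
from xml.sax.saxutils import escape, quoteattr


def export_element(name, text, attrs):
    """Serialize one element, escaping its text content and attributes."""
    # quoteattr() escapes the value and adds the surrounding quotes.
    attr_text = "".join(
        " %s=%s" % (key, quoteattr(value))
        for key, value in sorted(attrs.items())
    )
    # escape() replaces &, < and > in character data.
    return "<%s%s>%s</%s>" % (name, attr_text, escape(text), name)


print(export_element("note", "a < b & c", {"title": "x & y"}))
```

Without the escaping, the "<" and "&" characters in this example would produce a document that is not well-formed.
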
Version 1.12a (05/06/2008)
  • Fix to name used to generate validation method.
  • Embedded process_includes.py functionality into generateDS.py.
Version 1.11d (04/28/2008)
  • Added support for specifying default values in attributes and elements that are primitive types.
Version 1.11c (03/31/2008)
  • A fix in enumeration building code.
Version 1.11b (11/19/2007)
  • Fixed bug that caused an infinite loop when a class has a simple type as a base/super class.
  • Added additional simple types to the list of recognized simple types. For a list of simple types, see: http://www.w3.org/TR/xmlschema-0/#SimpleTypeFacets
  • Added additional Python keywords to list of transformed names. See global variable NameTable.
Version 1.11a (10/11/2007)
Version 1.10a (08/21/2007, again)
  • Added xs:int basic type. Handle same as xs:integer.
  • Generate tests so that, for elements declared with minOccurs="0" and maxOccurs="1" that have an empty value, export does not generate output.
Version 1.10a (05/11/2007)
  • Added support for user methods. See section "User Methods" in the documentation.
Version 1.9a (03/21/2007, again)
  • Added process_includes.py which can be used as a pre-processor to process include elements and create an XML Schema document containing all included content.
  • Modified generateDS.py so that it will read its input from a pipe when given the command line argument "-" (dash).
Version 1.9a (02/13/2007, again)
  • Changed naming of getter and setter methods. Default is to use get_var() and set_var() instead of getVar() and setVar(). The old behavior is available using the flag --use-old-getter-setter.
Version 1.9a (01/30/2007, again)
  • Fix so that validator methods for simpleType are also generated when the <xs:simpleType> occurs within an <xs:element>.
Version 1.9a (12/04/2006, again)
  • Fixed errors (occurring on import of superclass module) when an element is defined as an extension of an element that is defined as a simpleType restriction on an xs:string.
Version 1.9a (11/27/2006, again)
  • Fix for elements that have attributes and no nested children. Eliminated writing out new line chars in export methods.
Version 1.9a (10/22/2006, again)
  • Fix to capture text content of nodes defined with attributes but with no nested elements into member variable valueOf_.
Version 1.9a (10/10/2006)
  • Added minimal support for simpleType.
  • Generate stubs for and calls to validator methods for simpleType.
  • Retrieve bodies for validator methods for simpleTypes from files in a directory specified with the --validator-bodies command line flag.
Version 1.8d (10/4/2006, again)
  • Fixed several errors related to anyAttribute. It was generating bad code if an element was defined with anyAttribute but had no other attributes. And, in the same situation, it was not generating export code properly.
Version 1.8d (7/26/2006, again)
  • Allowed dot/period as special character in element tags/names.
  • Fixed several errors in generation of export and exportLiteral functions. Special names (e.g. 'type', 'class') were not being mapped to special spellings (e.g. 'ttype', 'klass').
  • Fixed error in determining ExplicitDefine, which was preventing export of some objects.
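
The name mapping mentioned in these notes might look something like the sketch below. It is an assumption-laden illustration: NAME_TABLE is a hypothetical stand-in for the NameTable global (only the 'type' and 'class' entries come from the notes), and replacing a dot with an underscore is a guess, since the notes only say that dashes become underscores.

```python
# Hypothetical stand-in for the NameTable global variable.
# Only these two entries are taken from the release notes.
NAME_TABLE = {"type": "ttype", "class": "klass"}


def cleanup_name(name):
    """Map an XML element or attribute name to a usable Python name."""
    mapped = NAME_TABLE.get(name)
    if mapped is not None:
        return mapped
    # Dashes become underscores; treating dots the same way is an assumption.
    return name.replace("-", "_").replace(".", "_")
```

The point of the fix is that the same mapping has to be applied everywhere a name is generated, including in the export and exportLiteral functions.
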
Version 1.8d (7/19/2006, again)
  • Added support for empty elements, i.e. elements that have no children and no attributes. Formerly, they were ignored due to a quirk in logic.
Version 1.8d (4/13/2006)
  • Added support for the following simple types: duration, anyURI and unsignedShort. They are coerced to (and treated the same as) xs:string, xs:string, and xs:integer, respectively.
Version 1.8c (12/22/2005, again)
  • Fixed use of mapped names in generateExportLiteralFn().
Version 1.8c (12/20/2005, again)
  • Fix to generation of getters and setters for attributes. Formerly generating accessors that handled lists of attribute values.
Version 1.8c (12/15/2005, again)
  • Fix generated code so that it uses documentElement instead of childNodes[0] to get the root element.
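
A small minidom example shows why documentElement is the safer choice. The helper describe_root is hypothetical and exists only for this illustration; the input is a document that starts with a comment.

```python
from xml.dom import minidom


def describe_root(xml_text):
    """Return the node names of childNodes[0] and of documentElement."""
    doc = minidom.parseString(xml_text)
    return doc.childNodes[0].nodeName, doc.documentElement.nodeName


# The comment comes before the root element, so childNodes[0] is the comment.
sample = '<?xml version="1.0"?><!-- header --><people><person/></people>'
print(describe_root(sample))
```

When a comment (or a processing instruction) precedes the root element, childNodes[0] is that node rather than the root, whereas documentElement always refers to the root element.
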
Version 1.8c (5/10/2005, again)
  • Patch for <xs:attribute ref="xxxx"/> -- Use the value of ref as the name of the attribute. I'm not sure whether this is correct in all situations.
  • Fix for generation of ctor for mixed type elements. Before this fix, generateDS.py was failing to generate the initializers in the __init__ method signature.
  • Fix for generation of "class" declaration for extension classes whose base class name is qualified with a namespace (e.g., <xs:extension base="iodef:TextAbstractType">). Removed the namespace. This fix also corrected the order of generation of classes so that the base class is now correctly generated before the subclass.
Version 1.8c (4/26/2005)
  • Added support for several simple types: xs:token, xs:short, xs:long, xs:positiveInteger, xs:negativeInteger, xs:nonPositiveInteger, xs:nonNegativeInteger, xs:date.
  • Fixed error produced when an element definition inherits from a simple type.
Version 1.8b (2/25/2005)
  • Added support for anyAttribute.
Version 1.8a (2/23/05, again)
  • Fixed incorrect generation of name and type for export functions for root element.
  • Fixed reference to root element type when root element name and type are different.
Version 1.8a (1/13/05, again)
  • Fixed incorrect handling of extension of in-line element definition.
  • Code cleanup in support of the above.
Version 1.8a (12/22/04)
  • Added support for attributeGroup. Enables an XML Schema to define attribute groups and then include them in element/complexType definition.
  • Added support for substitutionGroup. Enables the use of any of a set of element types as alternatives to another element type. Limitation: Does not work with simple element types.
Version 1.7b (11/15/04)
  • From an XML Schema, it is not possible to determine the outer-most element in instance documents. generateDS.py now generates a parser (parseSelect) that first uses a small SAX parser to determine the outer-most element in the input document, then invokes the normal parser with that element as the root.
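
The idea of using a small SAX parser to find the outer-most element can be sketched as below. This is not the generated parseSelect code: the names RootFinder, find_root_element, and _RootFound are hypothetical, and stopping the parse by raising an exception is just one possible approach.

```python
import xml.sax


class _RootFound(Exception):
    """Raised to stop parsing once the outer-most element is known."""


class RootFinder(xml.sax.ContentHandler):
    """SAX handler that records the name of the first element it sees."""

    def __init__(self):
        super().__init__()
        self.root_name = None

    def startElement(self, name, attrs):
        self.root_name = name
        raise _RootFound()


def find_root_element(xml_bytes):
    """Return the tag name of the outer-most element in xml_bytes."""
    handler = RootFinder()
    try:
        xml.sax.parseString(xml_bytes, handler)
    except _RootFound:
        pass
    return handler.root_name


print(find_root_element(b"<people><person/></people>"))
```

A caller could then use the returned name to choose which class to build the document with, which is the role the notes assign to the normal parser.
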

Version 1.7a (10/28/04)

  • Thanks very much to Lloyd Kvam for help with these fixes and improvements. His ideas, suggestions, and work have been extremely valuable.
  • Implemented partial support for <xsd:extension base="">. Limitation: extension elements cannot override members defined in a base.
  • Refactored generated methods export and build, so that they can be called by subclasses.
  • The generated method exportLiteral has been left behind during recent work. Brought it up-to-date.
  • For Python, a super-class must be defined before the sub-classes that reference it. Implemented a delaying mechanism that enforces this ordering of generation of classes.
  • Fixed a bug that occurred when an element is defined with maxOccurs given a value other than "1" or "unbounded".
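
One way to picture the delaying mechanism for class ordering is the sketch below. It is an illustration under assumptions, not the implementation in generateDS.py: the function order_classes and its dictionary input are hypothetical, and bases that are not defined in the schema are simply treated as already available.

```python
def order_classes(bases):
    """Order class names so that each base class precedes its sub-classes.

    bases maps a class name to its base class name, or to None.
    """
    emitted = []
    done = set()
    pending = list(bases)
    while pending:
        delayed = []
        for name in pending:
            base = bases[name]
            if base is None or base in done or base not in bases:
                emitted.append(name)
                done.add(name)
            else:
                # The base has not been generated yet, so try again later.
                delayed.append(name)
        if len(delayed) == len(pending):
            raise ValueError("circular inheritance: %s" % ", ".join(delayed))
        pending = delayed
    return emitted


print(order_classes({"programmer": "person", "person": None}))
```

A class whose base has not yet been generated is set aside and retried on the next pass, so the output order is always valid for Python class definitions.
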

Version 1.6d (10/1/04)

  • Several bug fixes.
  • Added command-line flag --subclass-suffix="X". Changes the suffix appended to the class name in subclass files. Default if omitted is "Sub".
  • Added an underscore to certain local variables to avoid name conflicts.
  • Thanks to Lloyd Kvam for help with this release. Lloyd found and fixed a number of these problems.
  • Added command-line flag "--root-element", which makes a specified element name the assumed root of instance documents.
  • In some schemas, attributes on a nested <complexType> pertain to the containing <element>. Added code to copy the attributes from the <complexType> to the <element>, if it is nested.

Version 1.6c (9/15/04)

  • generateDS.py was not walking lower levels of the tree data structure collected by the SAX parser that describes the classes to be generated. Now, function generate() calls function generateFromTree() to recursively walk lower levels of this tree structure.
  • Fixed various errors that were introduced or uncovered by the above change.
  • Strengthened handling of mixed content. When an element definition (<element> or <complexType>) contains the attribute "mixed=" with a true value, then we generate the code for text content, e.g. getValue(), setValue(), capture value in build(), etc.

Version 1.6b (9/10/04, yet again)

  • Still fixing bug related to generating all the sub-class stubs. All sub-classes were not being generated when no superclasses were generated (-o flag omitted), because there are data structures that are created when superclasses are generated and which are needed during sub-class generation. Now we always write out super-classes, but write them to a temp file if they are not requested.

Version 1.6b (8/26/04, again)

  • Fixed bug -- complexTypes defined in-line were omitted from the sub-class file. Now these sub-classes are being generated.

Version 1.6b (8/18/04)

  • Added ability to access the text content of elements that are defined but have no nested elements. The member variable is "valueOf_" (note underscore which will hopefully avoid name conflicts) and the getter and setter methods are "getValueOf_" and "setValueOf_".
  • Fixes to generation of exportLiteral methods. Formerly, export of attributes was omitted.
  • Removed un-used function that contained "yield" statement, which caused problems with older versions of Python.
Version 1.6a (7/23/04, again)
  • Added optional generation of new style classes with properties. This is experimental and, admittedly, not very useful, as the property functions are simple getters and setters. Maybe someday ... Use the "-m" flag to see the resulting code.
Version 1.6a (7/9/04, again)
  • Minor fixes. Replaced dashes in names used as attributes (see cleanupName()).
Version 1.6a (7/6/04, again)
  • For XMLBehaviors, implemented ability to retrieve implementation bodies for behaviors and for ancillaries (pre-conditions and post-conditions) from a Web address (URL).
Version 1.6a (6/30/04)
  • Added generation of behaviors. An XML document can be used to specify behaviors (methods) to be added to generated sub-class files, including DBC (design by contract) pre- and post-condition tests. See generateDS.html for more information on XMLBehaviors.
Version 1.5b (6/20/04, again)
  • Fixed handling namespace prefix in the XMLSchema file itself. generateDS.py now attempts to pick-up the namespace prefix (alias) from the "xmlns:" attribute on the "schema" element.
Version 1.5b (5/7/04)
  • Fixed several minor problems related to XML namespaces. The namespace prefix is ignored when creating Python names (e.g. of classes), and the namespace prefix is ignored during parsing. That's about the best I know to do right now.
  • Fixed problems in generating code for names containing dashes. Now using underscore in place of dashes for Python names.
Version 1.5a (3/24/04)
  • Added keyword arguments to the generated factory functions.
  • Added generation of method "exportLiteral" and related support to export elements/instances to Python data structure literals.
Version 1.4c (3/10/04)
  • Element <complexType> in XSchema file not handled correctly. Fixed this so that when <complexType> is at top level, it will be handled the same way that an <element> is handled. Note: We still have problems with <complexType> elements that are more deeply nested.
Version 1.4c (3/8/04)
  • Added ability to pass namespace abbreviation from the command line. For example, the "-a" option enables you to replace "xs:" with "xsd:".
Version 1.4b (9/30/03, again)
  • Removed dependence on PyXML. Will now import XML support from PyXML, if it is available, but if not, imports from the Python standard library.
Version 1.4b (9/30/03)
  • Fixed name conflict in factory function (added underscore).
  • Added generation of saxParseString function (parse string, not file/URL).
  • Fixed error -- some constructors not using factory.
Version 1.4a (9/17/03)
  • Added generation of a SAX parser.
Version 1.3c (9/11/03)
  • Fixed problem caused by shared content model, i.e. when a field (content) is declared with a complex type and the name and the type are different. The fix enabled the field name and the type of the object in that field to be different.
Version 1.3b (9/9/03)
  • Fixed error when a separate xs:element declaration is used for elements declared with a simple type.
Version 1.3a (8/18/03)
  • Removed YAML support.
  • Fixed error in name generation in generateBuildFn().
  • Various fixes and cleanup in tests/ and Demo/.
Version 1.2a (again, 5/16/03)
  • Fixed error in code generation for boolean attributes.
  • Fixed error in code generation for float values.
  • Added very simple unit tests in tests directory. Can be run with:

    cd tests
    python test.py

Version 1.2a (3/14/03)
  • Added support for XML Schema xs:double and xs:boolean types.
Version 1.1a (8/13/02)
  • Added ability to generate subclass stubs for user method implementation.
  • A bit of clean-up to the command line options.
Version 1.0a (3/15/02)
  • Initial release

To do

The following enhancements and fixes remain to be done:

  • The <sequence> element can have "minOccurs" and "maxOccurs" attributes. I'm guessing, but am not sure, that this specifies repeated groups. For example, the following:

    <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:element name="description" type="xs:string"/>
        <xs:element name="size" type="xs:integer"/>
    </xs:sequence>

    specifies that we can have any number of pairs of elements "description" and "size". A future enhancement to generateDS.py would enable us to specify and enforce this restriction.

  • And so many more complexities in the XSchema specifications.

Dave Kuhlman
dkuhlman@davekuhlman.org
http://www.davekuhlman.org