#include '../template.wml'

<latemp_subject "Perl for XML Processing" />

<h2 id="technologies">Technologies of Interest</h2>

<h3 id="XML-LibXML"><cpan_self_dist d="XML-LibXML" /></h3>

<p>
XML-LibXML is the de facto standard for XML processing in Perl. It is a
comprehensive CPAN distribution based on the
<a href="http://xmlsoft.org/">libxml2</a> library that provides DOM (Document
Object Model), SAX (an event-driven streaming API), a pull parser, XPath, and
<cpan_dist d="XML-LibXSLT">XSLT</cpan_dist>
support. XML-LibXML has good documentation and is actively maintained.
</p>
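
<p>
As a rough illustration of the DOM and XPath support mentioned above, the
following minimal sketch parses a string and walks the result. The sample
document and its element names (<code>library</code>, <code>book</code>,
<code>title</code>) are invented for this example.
</p>

<pre>
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document, invented for this example.
my $xml = &lt;&lt;'EOF';
&lt;library&gt;
  &lt;book id="b1"&gt;&lt;title&gt;Programming Perl&lt;/title&gt;&lt;/book&gt;
  &lt;book id="b2"&gt;&lt;title&gt;Learning Perl&lt;/title&gt;&lt;/book&gt;
&lt;/library&gt;
EOF

# Parse the string into a DOM tree.
my $doc = XML::LibXML->load_xml(string => $xml);

# Select every book element with XPath, then read an attribute
# and the text of its title child.
for my $book ($doc->findnodes('/library/book')) {
    my $id    = $book->getAttribute('id');
    my $title = $book->findvalue('title');
    print "$id: $title\n";
}
</pre>

<p>
The same <code>load_xml</code> call also accepts <code>location</code>
instead of <code>string</code> to read from a file or URL.
</p>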

<p>
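Before using this library, make sure you understand XML namespaces and how
they interact with the DOM and the XML-LibXML API. In particular, an
unprefixed name in an XPath expression never matches an element that is in a
namespace, including a default namespace.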
</p>
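
<p>
The namespace caveat can be sketched as follows. The Atom namespace URI is
real; the sample feed and the <code>atom</code> prefix are arbitrary choices
made for this example.
</p>

<pre>
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical feed that puts all of its elements in a default namespace.
my $xml = &lt;&lt;'EOF';
&lt;feed xmlns="http://www.w3.org/2005/Atom"&gt;
  &lt;entry&gt;&lt;title&gt;First post&lt;/title&gt;&lt;/entry&gt;
&lt;/feed&gt;
EOF

my $doc = XML::LibXML->load_xml(string => $xml);

# An unprefixed XPath name only matches elements in no namespace,
# so this query finds nothing.
my @plain = $doc->findnodes('//title');

# Bind a prefix of our own choosing to the namespace URI and query with it.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(atom => 'http://www.w3.org/2005/Atom');
my @qualified = $xpc->findnodes('//atom:title');

printf "without prefix: %d, with prefix: %d\n",
    scalar(@plain), scalar(@qualified);
</pre>

<p>
The prefix used in the XPath expression does not have to match any prefix in
the document; only the namespace URI matters.
</p>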

<h2 id="web-pages">Web Pages about Perl and XML</h2>

<h3 id="perl-xml-project"><a href="http://perl-xml.sourceforge.net/">The Perl XML Project
Home Page</a></h3>

<h4 id="perl-xml-faq"><a href="http://perl-xml.sourceforge.net/faq/">Their Frequently Asked
Questions List (FAQ)</a></h4>

<h2 id="what-to-avoid">What to Avoid</h2>

<h3 id="xml-simple">XML-Simple</h3>

<p>
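Despite its name, XML-Simple is hard to use correctly: the Perl data
structure it returns changes shape depending on the input (for example, one
child element versus several) unless options such as <code>ForceArray</code>
are set. Please avoid using it. XML-LibXML is an easy and fast alternative.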
</p>

<h3 id="parsing-xml-using-regexes">Parsing XML Using Regular Expressions</h3>

<p>
You should also avoid parsing XML using regular expressions, because XML's
grammar is not regular: nested elements, comments, CDATA sections and
attribute quoting all defeat simple patterns. Use a parser instead. For more
information see:
</p>

<ol>

<li>
<p>
<a href="http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html">“Parsing HTML the Cthulhu Way”</a>.
</p>
</li>

<li>
<p>
<a href="http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454">Answer on Stack Overflow</a> (funny).
</p>
</li>

</ol>
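
<p>
One way a regular expression can go wrong is shown in the sketch below, which
compares a naive pattern with XML-LibXML on the same input. The sample
document is invented for this example, and a commented-out element is only
one of many cases that trip up a regex.
</p>

<pre>
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document containing a commented-out title.
my $xml = &lt;&lt;'EOF';
&lt;doc&gt;
  &lt;!-- &lt;title&gt;Old title&lt;/title&gt; --&gt;
  &lt;title&gt;Current title&lt;/title&gt;
&lt;/doc&gt;
EOF

# The naive pattern knows nothing about comments, so its first match
# is the text inside the commented-out element.
my ($by_regex) = $xml =~ m{&lt;title&gt;(.*?)&lt;/title&gt;}s;

# The parser skips the comment and returns the real element's text.
my $doc       = XML::LibXML->load_xml(string => $xml);
my $by_parser = $doc->findvalue('/doc/title');

print "regex:  $by_regex\n";
print "parser: $by_parser\n";
</pre>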


<h2 id="xml-grammars">Modules for Dealing with Specific Grammars</h2>

<p>
In addition to generic XML parsers and manipulators, there are many
specialised modules on the CPAN for dealing with specific XML grammars. Many
of them reside under the XML:: namespace. Some prominent examples include:
</p>

<ul>

<li>
<cpan_b_self_dist d="XML-RSS" /> - manipulate
RSS (Really Simple Syndication) 0.9, 0.91, 1.0 and 2.0.
</li>

<li>
<cpan_b_self_dist d="XML-Atom" /> -
manipulate Atom feeds (Atom is an alternative syndication format to RSS).
</li>

<li>
<cpan_b_self_dist d="XML-Feed" /> -
generate, parse, mix and match web feeds (Atom or RSS).
</li>

<li>
<cpan_b_self_dist d="OpenOffice-OODoc" /> -
manipulate ODF (OpenDocument Format) files, as used by OpenOffice.org and
similar office suites.
</li>

</ul>
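
<p>
To give a feel for what such a grammar-specific module offers, here is a
minimal sketch using XML-RSS to generate a feed and read it back. The feed
title, item titles and <code>example.com</code> URLs are placeholders invented
for this example.
</p>

<pre>
use strict;
use warnings;
use XML::RSS;

# Build a small RSS 2.0 feed (all values are placeholders).
my $rss = XML::RSS->new(version => '2.0');
$rss->channel(
    title       => 'Example feed',
    link        => 'http://www.example.com/',
    description => 'A feed invented for this example',
);
$rss->add_item(
    title => 'First post',
    link  => 'http://www.example.com/first',
);
$rss->add_item(
    title => 'Second post',
    link  => 'http://www.example.com/second',
);

# Serialise the feed to an XML string.
my $feed_xml = $rss->as_string;

# Parse the string back and list the item titles.
my $parsed = XML::RSS->new;
$parsed->parse($feed_xml);
for my $item (@{ $parsed->{items} }) {
    print "$item->{title} ($item->{link})\n";
}
</pre>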