Toby Inkster committed c2daa85

Release XRD::Parser 0.01; other mini-changes.

Files changed (5)

 requires            'LWP::UserAgent' => 0;
 requires            'XML::LibXML'    => 0;
 requires            'RDF::Trine'     => 0;
+requires            'LWP::Simple'    => 0;
+requires            'URI::Escape'    => 0;
+requires            'URI'            => 0;
 
 auto_install;
 
-RDF-XRD-Parser version 0.01
-===========================
+NAME
+    XRD::Parser - Parse XRD files into RDF::Trine models
 
-The README is used to introduce the module and provide instructions on
-how to install the module, any machine dependencies it may have (for
-example C compilers and installed libraries) and any other information
-that should be provided before the module is installed.
+VERSION
+    0.01
 
-A README file is required for CPAN modules since CPAN extracts the
-README file from a module distribution so that people browsing the
-archive can use it get an idea of the modules uses. It is usually a
-good idea to provide version information here so that people can
-decide whether fixes for the module are worth downloading.
+SYNOPSIS
+      use RDF::Query;
+      use XRD::Parser;
+  
+      my $parser = XRD::Parser->new(undef, "http://example.com/foo.xrd");
+      $parser->consume;
+  
+      my $results = RDF::Query->new(
+        "SELECT * WHERE {?who <http://spec.example.net/auth/1.0> ?auth.}")
+        ->execute($parser->graph);
+        
+      while (my $result = $results->next)
+      {
+        print $result->{'auth'}->uri . "\n";
+      }
 
-INSTALLATION
+DESCRIPTION
+    While XRD has a rather different history, it turns out it can mostly be
+    thought of as a serialisation format for a limited subset of RDF.
 
-To install this module type the following:
+    This package ignores the order of <Link> elements, as RDF is a graph
+    format with no concept of statements coming in an "order". The XRD spec
+    makes honouring the order of <Link> elements only a SHOULD. If the order
+    of <Link> elements matters to you, the callback routines supported by
+    this package may help.
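The order-independence described above can be illustrated with a small, self-contained sketch. It is not XRD::Parser code: the helpers `links_to_triples` and `canonical` are hypothetical, links are given as plain hashrefs rather than parsed from XML, and using the document URI as the subject is an assumption. The mapping of rel to predicate and href to object follows the SYNOPSIS query.

```perl
use strict;
use warnings;

# Hypothetical sketch, not XRD::Parser code: each <Link rel="R" href="H"/>
# becomes the triple (subject, R, H). Using the document URI as the
# subject is an assumption made for illustration.
sub links_to_triples {
    my ($subject, @links) = @_;
    return map { [ $subject, $_->{rel}, $_->{href} ] } @links;
}

# Order-independent form of a set of triples: sorted N-Triples-style lines.
sub canonical {
    my @triples = @_;
    my @lines   = map { "<$_->[0]> <$_->[1]> <$_->[2]> ." } @triples;
    return join "\n", sort @lines;
}

my $auth  = { rel  => 'http://spec.example.net/auth/1.0',
              href => 'http://services.example.com/auth' };
my $photo = { rel  => 'http://spec.example.net/photo/1.0',
              href => 'http://photos.example.com/gpburdell.jpg' };

# The same two links in opposite document order...
my $doc1 = canonical(links_to_triples('http://example.org/', $auth, $photo));
my $doc2 = canonical(links_to_triples('http://example.org/', $photo, $auth));

# ...describe the same graph.
print $doc1 eq $doc2 ? "same graph\n" : "different graphs\n";
```

Both orderings produce the same sorted set of triples, so the script prints `same graph`; any information carried by the order itself is lost, which is why the callbacks are the place to capture it.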
 
-   perl Makefile.PL
-   make
-   make test
-   make install
+    This package aims to be roughly compatible with RDF::RDFa::Parser's
+    interface.
 
-DEPENDENCIES
+    $p = XRD::Parser->new($content, $uri, \%options, $store);
+            This method creates a new XRD::Parser object and returns it.
 
-This module requires these other modules and libraries:
+            The $content variable may contain an XML string, or an
+            XML::LibXML::Document. If a string, the document is parsed using
+            XML::LibXML::Parser, which may throw an exception. XRD::Parser
+            does not catch the exception.
 
-  blah blah blah
+            $uri is the supposed URI of the content; it is used to resolve any
+            relative URIs found in the XRD document. Also, if $content is
+            empty, then XRD::Parser will attempt to retrieve $uri using
+            LWP::Simple.
 
-COPYRIGHT AND LICENCE
+            Options [default in brackets]:
 
-Put the correct copyright and licence information here.
+              * tdb_service     - thing-described-by.org when possible. [0]
 
-Copyright (C) 2009 by Toby Inkster
+            $store is an RDF::Trine::Storage object. If undef, then a new
+            temporary store is created.
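The constructor rules above (option defaults, and fetching $uri when $content is empty) can be sketched as a standalone helper. `prepare_new_args` is hypothetical and not part of XRD::Parser's API; the injectable fetcher exists only so the sketch can be exercised offline, and by default it calls `LWP::Simple::get`, which returns the body or undef on failure.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XRD::Parser): applies the documented
# option defaults and falls back to fetching $uri when $content is empty.
sub prepare_new_args {
    my ($content, $uri, $options, $fetch) = @_;

    # Documented default: tdb_service => 0; caller-supplied keys win.
    my %opts = (tdb_service => 0, %{ $options || {} });

    # Injectable fetcher (an assumption for testability); by default it
    # loads LWP::Simple lazily and returns its get() result (undef on failure).
    $fetch ||= sub { require LWP::Simple; return LWP::Simple::get($_[0]) };

    # Empty or undef content: retrieve the document from $uri instead.
    if (!defined $content || !length $content) {
        $content = $fetch->($uri);
    }
    return ($content, $uri, \%opts);
}

# Non-empty content, no options: nothing is fetched, defaults apply.
my (undef, undef, $opts) = prepare_new_args('<XRD/>', 'http://example.com/foo.xrd');
print "tdb_service = $opts->{tdb_service}\n";   # prints: tdb_service = 0
```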
 
-This library is free software; you can redistribute it and/or modify
-it under the same terms as Perl itself, either Perl version 5.10.1 or,
-at your option, any later version of Perl 5 you may have available.
+    $p->uri Returns the base URI of the document being parsed. This will
+            usually be the same as the base URI provided to the constructor.
 
+            Optionally it may be passed a parameter - an absolute or relative
+            URI - in which case it returns the same URI which it was passed as
+            a parameter, but as an absolute URI, resolved relative to the
+            document's base URI.
 
+            This may look like two unrelated functions, but consider passing
+            a relative URI that is a zero-length string: resolved against the
+            base URI, it yields the base URI itself, which is exactly what the
+            no-argument form returns.
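Both behaviours of `uri` can be reproduced with the URI module, which is among the distribution's declared dependencies. This is a sketch only: `resolve_uri` is a hypothetical stand-in, and whether XRD::Parser uses `URI->new_abs` internally is an assumption.

```perl
use strict;
use warnings;
use URI;

# Hypothetical stand-in for $p->uri: resolve an optional (possibly
# relative) URI against the document's base URI.
sub resolve_uri {
    my ($base, $rel) = @_;
    $rel = '' unless defined $rel;   # no argument: resolve the empty string
    return URI->new_abs($rel, $base)->as_string;
}

my $base = 'http://example.com/dir/foo.xrd';
print resolve_uri($base), "\n";             # http://example.com/dir/foo.xrd
print resolve_uri($base, 'bar.xrd'), "\n";  # http://example.com/dir/bar.xrd
print resolve_uri($base, '#me'), "\n";      # http://example.com/dir/foo.xrd#me
```

The first line shows why the two functions are really one: resolving the empty string against the base gives back the base.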
+
+    $p->dom Returns the parsed XML::LibXML::Document.
+
+    $p->set_callbacks(\&func1, \&func2)
+            Set callbacks for handling RDF triples extracted from the
+            document. The first function is called when a triple is generated
+            taking the form of (*resource*, *resource*, *resource*). The
+            second function is called when a triple is generated taking the
+            form of (*resource*, *resource*, *literal*).
+
+            The parameters passed to the first callback function are:
+
+            *   A reference to the "XRD::Parser" object
+
+            *   A reference to the "XML::LibXML::Element" being parsed
+
+            *   Subject URI or bnode
+
+            *   Predicate URI
+
+            *   Object URI or bnode
+
+            The parameters passed to the second callback function are:
+
+            *   A reference to the "XRD::Parser" object
+
+            *   A reference to the "XML::LibXML::Element" being parsed
+
+            *   Subject URI or bnode
+
+            *   Predicate URI
+
+            *   Object literal
+
+            *   Datatype URI (possibly undef or '')
+
+            *   Language (possibly undef or '')
+
+            In place of either or both functions you can use the string
+            'print', which sets the callback to a built-in function that
+            prints the triples to STDOUT as Turtle. Either or both can be set
+            to undef, in which case no callback is called when a triple is
+            found.
+
+            Beware that for literal callbacks, sometimes both a datatype *and*
+            a language will be passed. (This goes beyond the normal RDF data
+            model.)
+
+            "set_callbacks" (if used) must be used *before* "consume".
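A pair of callbacks matching the argument lists above might look like the following sketch, which records every triple together with a sequence number and, when an element is supplied, its XML::LibXML `nodePath`. That is one way to keep the <Link> ordering the graph discards, assuming callbacks fire in document order (not stated above). The names `on_resource`, `on_literal` and `@seen` are hypothetical; returning undef simply mirrors the built-in 'print' callbacks, since any special meaning of the return value is not documented here.

```perl
use strict;
use warnings;

# Triples recorded in the order the callbacks are invoked.
my @seen;

# Resource callback: ($parser, $element, $subject, $predicate, $object).
sub on_resource {
    my ($parser, $element, $subject, $pred, $object) = @_;
    push @seen, {
        seq  => scalar @seen,
        path => (defined $element ? $element->nodePath : undef),
        s => $subject, p => $pred, o => $object,
    };
    return undef;   # mirrors the built-in 'print' callbacks
}

# Literal callback: datatype and language may both be set, or be undef/''.
sub on_literal {
    my ($parser, $element, $subject, $pred, $literal, $dt, $lang) = @_;
    push @seen, {
        seq  => scalar @seen,
        path => (defined $element ? $element->nodePath : undef),
        s => $subject, p => $pred, o => $literal,
        dt => $dt, lang => $lang,
    };
    return undef;
}

# With a real parser (requires XRD::Parser):
#   $parser->set_callbacks(\&on_resource, \&on_literal);   # before consume
#   $parser->consume;
```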
+
+    $p->consume;
+            This method processes the input DOM and sends the resulting
+            triples to the callback functions (if any).
+
+    $p->graph()
+            This method will return an RDF::Trine::Model object with all
+            statements of the full graph.
+
+            Call "consume" before calling "graph"; otherwise you will just get
+            an empty graph.
+
+SEE ALSO
+    RDF::Trine, RDF::Query, RDF::RDFa::Parser.
+
+    <http://www.perlrdf.org/>.
+
+AUTHOR
+    Toby Inkster, <tobyink@cpan.org>
+
+COPYRIGHT AND LICENSE
+    Copyright (C) 2009 by Toby Inkster
+
+    This library is free software; you can redistribute it and/or modify it
+    under the same terms as Perl itself, either Perl version 5.8.1 or, at your
+    option, any later version of Perl 5 you may have available.
+

XRD-Parser-0.01.tar.gz

Binary file added.
+use lib "lib";
+use XRD::Parser;
+
+my $xrd = <<XRD;
+<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0" xml:id="foo">
+  <Expires>1970-01-01T00:00:00Z</Expires>
+  <Property type="http://spec.example.net/type/person" />
+  <Link rel="http://spec.example.net/auth/1.0"
+    href="http://services.example.com/auth" />
+  <Link rel="http://spec.example.net/photo/1.0" type="image/jpeg"
+    href="http://photos.example.com/gpburdell.jpg">
+    <Title xml:lang="en">User Photo</Title>
+	 <Property type="http://something.com/">Comment</Property>
+  </Link>
+</XRD>
+XRD
+
+my $parser = XRD::Parser->new($xrd, "http://example.org/", {'tdb_service'=>1});
+$parser->consume;
+my $iter = $parser->graph->as_stream;
+
+while (my $st = $iter->next)
+{
+	print $st->as_string . "\n";
+}
+
+#  <Subject>http://example.com/gpburdell</Subject>

lib/XRD/Parser.pm

   $parser->consume;
   
   my $results = RDF::Query->new(
-	"SELECT ?auth WHERE { ?person <http://spec.example.net/auth/1.0> ?auth. }")
-	->execute($parser->graph);
+    "SELECT * WHERE {?who <http://spec.example.net/auth/1.0> ?auth.}")
+    ->execute($parser->graph);
 	
   while (my $result = $results->next)
   {
 
 =item $p = XRD::Parser->new($content, $uri, \%options, $store);
 
-  * tdb_service
+This method creates a new XRD::Parser object and returns it.
+
+The $content variable may contain an XML string, or an XML::LibXML::Document.
+If a string, the document is parsed using XML::LibXML::Parser, which may throw an
+exception. XRD::Parser does not catch the exception.
+
+$uri is the supposed URI of the content; it is used to resolve any relative URIs found
+in the XRD document. Also, if $content is empty, then XRD::Parser will attempt
+to retrieve $uri using LWP::Simple.
+
+Options [default in brackets]:
+
+  * tdb_service     - thing-described-by.org when possible. [0] 
+
+$store is an RDF::Trine::Storage object. If undef, then a new
+temporary store is created.
 
 =cut
 
 =item $p->uri
 
 Returns the base URI of the document being parsed. This will usually be the
-same as the base URI provided to the constructor, but may differ if the
-document contains a <base> HTML element.
+same as the base URI provided to the constructor.
 
 Optionally it may be passed a parameter - an absolute or relative URI - in
 which case it returns the same URI which it was passed as a parameter, but
 	return $this->{DOM};
 }
 
+=item $p->set_callbacks(\&func1, \&func2)
+
+Set callbacks for handling RDF triples extracted from the document. The
+first function is called when a triple is generated taking the form of
+(I<resource>, I<resource>, I<resource>). The second function is called when a
+triple is generated taking the form of (I<resource>, I<resource>, I<literal>).
+
+The parameters passed to the first callback function are:
+
+=over 4
+
+=item * A reference to the C<XRD::Parser> object
+
+=item * A reference to the C<XML::LibXML::Element> being parsed
+
+=item * Subject URI or bnode
+
+=item * Predicate URI
+
+=item * Object URI or bnode
+
+=back
+
+The parameters passed to the second callback function are:
+
+=over 4
+
+=item * A reference to the C<XRD::Parser> object
+
+=item * A reference to the C<XML::LibXML::Element> being parsed
+
+=item * Subject URI or bnode
+
+=item * Predicate URI
+
+=item * Object literal
+
+=item * Datatype URI (possibly undef or '')
+
+=item * Language (possibly undef or '')
+
+=back
+
+In place of either or both functions you can use the string C<'print'>, which
+sets the callback to a built-in function that prints the triples to STDOUT
+as Turtle. Either or both can be set to undef, in which case no callback
+is called when a triple is found.
+
+Beware that for literal callbacks, sometimes both a datatype I<and> a language
+will be passed. (This goes beyond the normal RDF data model.)
+
+C<set_callbacks> (if used) must be used I<before> C<consume>.
+
+=cut
+
+sub set_callbacks
+# Set callback functions for handling RDF triples.
+{
+	my $this = shift;
+
+	for (my $n=0 ; $n<2 ; $n++)
+	{
+		if (defined $_[$n] && lc($_[$n]) eq 'print')
+			{ $this->{'sub'}->[$n] = ($n==0 ? \&_print0 : \&_print1); }
+		elsif ('CODE' eq ref $_[$n])
+			{ $this->{'sub'}->[$n] = $_[$n]; }
+		else
+			{ $this->{'sub'}->[$n] = undef; }
+	}
+}
+
+sub _print0
+# Prints a Turtle triple.
+{
+	my $this    = shift;
+	my $element = shift;
+	my $subject = shift;
+	my $pred    = shift;
+	my $object  = shift;
+	my $graph   = shift;
+	
+	if ($graph)
+	{
+		print "# GRAPH $graph\n";
+	}
+	if ($element)
+	{
+		printf("# Triple on element %s.\n", $element->nodePath);
+	}
+	else
+	{
+		printf("# Triple.\n");
+	}
+
+	printf("%s %s %s .\n",
+		($subject =~ /^_:/ ? $subject : "<$subject>"),
+		"<$pred>",
+		($object =~ /^_:/ ? $object : "<$object>"));
+	
+	return undef;
+}
+
+sub _print1
+# Prints a Turtle triple.
+{
+	my $this    = shift;
+	my $element = shift;
+	my $subject = shift;
+	my $pred    = shift;
+	my $object  = shift;
+	my $dt      = shift;
+	my $lang    = shift;
+	my $graph   = shift;
+	
+	# Clumsy, but probably works.
+	$object =~ s/\\/\\\\/g;
+	$object =~ s/\n/\\n/g;
+	$object =~ s/\r/\\r/g;
+	$object =~ s/\t/\\t/g;
+	$object =~ s/\"/\\\"/g;
+	
+	if ($graph)
+	{
+		print "# GRAPH $graph\n";
+	}
+	if ($element)
+	{
+		printf("# Triple on element %s.\n", $element->nodePath);
+	}
+	else
+	{
+		printf("# Triple.\n");
+	}
+
+	no warnings;
+	printf("%s %s %s%s%s .\n",
+		($subject =~ /^_:/ ? $subject : "<$subject>"),
+		"<$pred>",
+		"\"$object\"",
+		(length $dt ? "^^<$dt>" : ''),
+		((length $lang && !length $dt) ? "\@$lang" : '')
+		);
+	use warnings;
+	
+	return undef;
+}
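The literal formatting performed by `_print1` above can be isolated as a pure function, which makes the escaping order and the datatype-versus-language rule easy to check. This is a sketch: `turtle_literal_triple` is a hypothetical name, it returns the line instead of printing it, and it omits the comment lines `_print1` emits. As in `_print1`, a language tag is written only when no datatype is present.

```perl
use strict;
use warnings;

# Standalone sketch of _print1's literal formatting (hypothetical helper).
sub turtle_literal_triple {
    my ($subject, $pred, $object, $dt, $lang) = @_;

    # Escape backslashes first so later escapes are not doubled.
    $object =~ s/\\/\\\\/g;
    $object =~ s/\n/\\n/g;
    $object =~ s/\r/\\r/g;
    $object =~ s/\t/\\t/g;
    $object =~ s/"/\\"/g;

    my $s = $subject =~ /^_:/ ? $subject : "<$subject>";

    # A datatype wins; the language tag is used only without one.
    my $suffix = '';
    if (defined $dt && length $dt) {
        $suffix = "^^<$dt>";
    }
    elsif (defined $lang && length $lang) {
        $suffix = "\@$lang";
    }
    return sprintf('%s <%s> "%s"%s .', $s, $pred, $object, $suffix);
}

print turtle_literal_triple('_:b1', 'http://example.org/title', 'User Photo', '', 'en'), "\n";
# prints: _:b1 <http://example.org/title> "User Photo"@en .
```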
+
 =item $p->consume;
 
+This method processes the input DOM and sends the resulting triples to 
+the callback functions (if any).
+
 =cut
 
 sub consume
 Copyright (C) 2009 by Toby Inkster
 
 This library is free software; you can redistribute it and/or modify
-it under the same terms as Perl itself, either Perl version 5.10.1 or,
+it under the same terms as Perl itself, either Perl version 5.8.1 or,
 at your option, any later version of Perl 5 you may have available.