Structured Data Markup

Structured Data Markup annotates content on a page with metadata (e.g., titles, authors, ratings). Search engines parse such markup and use it to highlight this data in the search result snippet. The markup itself is invisible on the website.

We use schema.org for markup.

Implementation

  • The markup is hierarchical, e.g., an instance of Book has fields "author" or "reviews", which are in turn instances of Person or Review and have their own fields. It is not always trivial to reproduce these hierarchies in our jsp and tagx files. While these files are hierarchical too, tagx tags are used in several jspx pages, and some of those pages need markup while others do not.
  • The markup is part of the html tags. Thus, all fields of an entity have to be declared within the html subtree of the entity's tag. This is sometimes inconvenient, e.g., the property itemReviewed of Review would have to be declared inside the tag that defines the review. Usually, however, it is the other way round: the review is part of the item that is reviewed.
  • Sometimes it is necessary to use additional <span> tags to hold markup properties.
  • A tagx file can describe fields of an object without the surrounding object itself. If a tagx file contains a property (itemprop), this property is emitted wherever the tagx is included. Thus, when using a tagx, one has to ensure that its fields are assigned to a fitting (surrounding) object. If the tagx is used on a page without markup, the markup in the tagx must be switched off using a jsp:directive.attribute.
  • The schema suffers from the usual ontology limitations: not everything has a fitting class, and similar classes can have different attributes. For example, UserComments (http://schema.org/UserComments) has a "creator" but no "author", while Review (http://schema.org/Review) has an "author" but no "creator". Use conditionals to produce the fitting markup.
  • Search engines are not obliged to react to the markup. Therefore, it is difficult to test whether it works.
  • The markup properties should never be used for logic.
  • While html attributes can usually appear in arbitrary order, this is not the case for the markup attributes. This flaw is present at least in the Google Rich Snippet Tool (last tested 2012).
  • Sometimes it is convenient to pass information to the search engine without explicitly displaying it on the website. For example, "worstRating" is a property of the class AggregateRating, but the worst possible rating need not be displayed next to each rating. It is generally possible to create and mark up invisible content using meta tags. This is, however, discouraged by Google and not always picked up.
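
The points above can be illustrated with a small fragment. This is only a sketch: the title, the author name, and the rating numbers are invented placeholders, and the layout is not taken from our jspx pages. It shows a Book whose "author" and "aggregateRating" fields are nested items declared inside the html subtree of the book, additional <span> tags that exist only to carry properties, and a meta tag that passes "worstRating" without displaying it.

```html
<!-- Hypothetical example data; only the class and property names come from schema.org -->
<div itemscope="itemscope" itemtype="http://schema.org/Book">
  <h1 itemprop="name">Example Book Title</h1>

  <!-- nested item: the author is itself an instance of Person -->
  <span itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person">
    <span itemprop="name">Jane Doe</span>
  </span>

  <!-- nested item: the rating summary is an instance of AggregateRating -->
  <div itemprop="aggregateRating" itemscope="itemscope" itemtype="http://schema.org/AggregateRating">
    <span itemprop="ratingValue">4</span> of <span itemprop="bestRating">5</span> stars
    (<span itemprop="reviewCount">12</span> reviews)
    <!-- invisible property; may be ignored by the search engine -->
    <meta itemprop="worstRating" content="1" />
  </div>
</div>
```

The attributes are written in the order itemprop, itemscope, itemtype, which is the order the testing tool expected when last tested in 2012.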

Testing

  • The Google Testing Tool shows all recognized markup and also a preview of the snippet as it might be displayed in a search hit. The tool has the flaw (last tested 2012) that
    • the attribute itemscope must come before itemtype,
    • in an instance-field relation, the itemprop of the instance must come before the itemscope of the field.
  • A major problem is that we have no control over the order in which html attributes are rendered.
  • Follow up on the status of the testing tool at this bug report
  • To test the markup on a page, that page has to be publicly accessible to the testing tool. One way to accomplish that is to copy the content to some accessible location. Then the order of the attributes has to be restored. This Perl script produces the correct order and html to be used in Zope:
#!/usr/bin/perl
use strict;
use warnings;

my $original = shift @ARGV; # original file
my $ordered  = shift @ARGV; # new file with reordered attributes
die "usage: $0 ORIGINAL ORDERED\n" unless defined $original && defined $ordered;

open my $in,  '<', $original or die "Cannot read $original: $!";
open my $out, '>', $ordered  or die "Cannot write $ordered: $!";

# Skip the original doctype line and write a fixed doctype and <html> tag instead
my $line = <$in>;
print $out qq{<!DOCTYPE html SYSTEM "about:legacy-compat">\n<html xmlns="http://www.w3.org/1999/xhtml">};

while ($line = <$in>) {
    $line =~ s/<html>//; # the <html> tag was already written above
    # Reorder the attributes within each tag to: itemprop, itemscope, itemtype
    $line =~ s/(itemtype="[^>"]*")([^>]*)(itemscope="itemscope")/$3$2$1/g;
    $line =~ s/(itemtype="[^>"]*")([^>]*)(itemprop="[^>"]*")/$3$2$1/g;
    $line =~ s/(itemscope="itemscope")([^>]*)(itemprop="[^>"]*")/$3$2$1/g;

    print $out $line;
}

close $in;
close $out;
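
As a worked example of what the three substitutions in the script do, consider a single tag whose attributes were rendered in the order the testing tool rejects. The tag and its values are made up for illustration. The first line is the input, the second line is what the script writes for it.

```html
<!-- before: itemtype, itemscope, itemprop -->
<div itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author">

<!-- after: itemprop, itemscope, itemtype -->
<div itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person">
```

Note that the patterns only match the XHTML form itemscope="itemscope", so a bare itemscope attribute without a value would be left where it is. Each substitution also stays within one tag, because the middle group cannot cross a ">".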

TODO

Continue the integration of markup on all pages including

  • Resource pages
  • Post pages
  • Post lists

Possible entities include
