Documentation for the LMF Stanbol Integration

Introduction

The LMF Stanbol module offers integration with Apache Stanbol, a service providing fast entity lookup and textual enhancement. Apache Stanbol together with the LMF offers the following functionalities:

- data reconciliation (in combination with [[Google Refine Extension|LMF Refine Integration]]): you have string data and you want to find matching resources from the Linked Data cloud for interlinking; this helps you in the task of publishing legacy data as Linked Data, as well as in enriching existing data with additional information from the Linked Data Cloud
- content enhancement: you have text content and you want to extract relevant concepts (e.g. locations, persons, organisations) from it; this helps you in getting more structure out of your content, e.g. for improved semantic search or data analysis capabilities
- local Linked Data cache: instead of retrieving Linked Data resources from sometimes unreliable remote servers in the Linked Data Cloud, Stanbol can be used as a local cache for data from the sites it indexes; in many cases this allows a much faster and more reliable Linked Data integration

The LMF Stanbol integration adds a number of functions for interacting with Apache Stanbol (local or remote installation). They are described in the following sections.

LMF Stanbol Configuration

The Stanbol module simplifies configuring Apache Stanbol for many common tasks. For more advanced configurations, you can always access the Stanbol System Console.

Stanbol Connection

In order to communicate with Stanbol, the LMF must be configured with some parameters telling it where Stanbol is located and what functionalities to offer. In the standalone installer, the configuration is already set to reasonable values. In other settings, consider particularly the following options:

- stanbol.server: needs to be set to the host where Stanbol is running, e.g. http://localhost:8080 (required)
- stanbol.context: path under the server root where the Stanbol web application is running, e.g. /stanbol (optional, if left empty, Stanbol is assumed to run under the root application)
- stanbol.path: filesystem path where Stanbol stores its data; needed for some auto-configuration options, e.g. when installing Referenced Sites (optional, but required for referenced site install)
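How stanbol.server and stanbol.context combine into the address of the Stanbol web application can be sketched as follows. This is a minimal Python illustration and not LMF code: the function name `stanbol_base_url` and the `config` dictionary are hypothetical, and the values are the examples given for the two options.

```python
def stanbol_base_url(server: str, context: str = "") -> str:
    """Join stanbol.server and stanbol.context into one base URL."""
    server = server.rstrip("/")
    context = context.strip("/")
    # An empty context means Stanbol runs as the root application.
    return f"{server}/{context}" if context else server


# Hypothetical settings mirroring the example values of the options.
config = {
    "stanbol.server": "http://localhost:8080",
    "stanbol.context": "/stanbol",
}

print(stanbol_base_url(config["stanbol.server"], config["stanbol.context"]))
```

With an empty stanbol.context the function returns the server address unchanged, which matches the case where Stanbol runs under the root application.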

To change these options, go to the Stanbol module in the LMF Admin interface, select the "configuration" submenu, update the settings and click on "save".

Referenced Site Configuration

If the LMF has full access to Stanbol (local only), it can simplify the installation of so-called "Referenced Sites". A referenced site is a pre-configured index over a commonly-used Linked Data site (e.g. DBPedia) and can be used for entity lookup and content enhancement.

LMF offers automatic installers for many existing Stanbol Referenced Sites. To install a referenced site, go to the LMF Admin Interface for Stanbol and select "sites". The page will offer you a list of already installed sites, as well as a list of available installers.

Clicking on the "Install" button of an installer will start downloading the referenced site index and OSGi bundle and automatically deploy it in the connected Stanbol server. Note that installation can take some time, as some indexes are several gigabytes in size and need to be downloaded over the Internet.

Enhancer Configuration

The LMF also offers installers for typical enhancement configurations. An enhancement configuration specifies how textual content is analysed by Stanbol and which datasets are used for entity linking.

To install one of the existing enhancer configurations, go to the LMF Admin Interface for Stanbol and select "enhancers". The page offers a list of already installed enhancers, a list of available enhancers, and a simple interface for experimenting with text analysis using different enhancers.

Clicking on the "Install" button of an installer will add the selected configuration to the Apache Stanbol instance. Note that some configurations depend on the availability of certain referenced sites. Read the description of the enhancer configuration to see its dependencies.
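Besides the admin interface, an installed configuration can also be tried by sending text directly to the enhancer endpoint of Apache Stanbol. The following Python sketch only builds such a request and does not send it. It assumes Stanbol is reachable at http://localhost:8080/stanbol and that a chain named "default" is installed; the function name `build_enhance_request` and the sample text are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def build_enhance_request(base_url, text, chain=None):
    """Build (but do not send) a POST request to the Stanbol enhancer."""
    url = base_url.rstrip("/") + "/enhancer"
    if chain:
        # A named chain is addressed below /enhancer/chain/.
        url += "/chain/" + quote(chain, safe="")
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


request = build_enhance_request(
    "http://localhost:8080/stanbol",  # assumed base URL
    "Salzburg is a city in Austria.",  # sample text
    chain="default",  # assumed chain name
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```

Omitting the `chain` argument targets the plain enhancer endpoint, which corresponds to using the default chain.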

Publishing LMF Data to Stanbol

In case you want to make use of the content stored in the Linked Media Framework for enhancement (e.g. a custom SKOS thesaurus), you need to manually publish the LMF data to Apache Stanbol.

To simplify the task of publishing, the LMF offers a simple interface accessible in the LMF Admin Interface for Stanbol in the submenu "publish". In the interface, you can select a named graph from the LMF triple store and publish all its data to Stanbol. The data is then immediately available for enhancement.

Note that depending on the vocabularies you are using in your data you might need different kinds of enhancement configurations (see above). The LMF already offers installable configurations for SKOS (skos:prefLabel and skos:altLabel), RDFS (rdfs:label), and Dublin Core (dc:title). If you are using a different vocabulary, either add additional labels using one of the existing configurations (e.g. rdfs:label), or manually configure an enhancement configuration in the Stanbol System Console.

Using Stanbol Enhancement in Semantic Search

The LMF Stanbol module offers content enhancement functionality to all modules making use of the LDPath language via the fn:stanbol(...) LDPath function. This is particularly useful for entity extraction during [[Module Semantic Search|Semantic Search]] indexing.

The function offers the following signatures:

- fn:stanbol(PATH_SELECTOR, CHAIN_NAME): apply content enhancement on the selected literals using the chain given as argument; the chain must be available in the installed chains (see the list of enhancement chains in the admin interface under "enhancers")
- fn:stanbol(PATH_SELECTOR): apply content enhancement on the selected literals using the default chain
- fn:stanbol(): apply content enhancement on the current context node using the default chain

For example, the following LDPath expression could be used to detect persons in a dc:description using DBPedia for interlinking:

@prefix dbo : <http://dbpedia.org/ontology/>;

persons = fn:stanbol(dc:description, "DPBediaEnhancer")[rdf:type is dbo:Person] / rdfs:label[@en] :: xsd:string ;

Using Stanbol as Linked Data Cache

Referenced Sites configured in Stanbol can also be used as a Linked Data cache endpoint inside the LMF. This can dramatically improve the performance of tasks that require lookup of Linked Data resources.

To enable Stanbol Linked Data caching, go to the LMF Admin Interface for Stanbol, select "configuration", set the option "stanbol.linkeddata_cache" to "true", save the configuration, and restart the server.

Note that not all Stanbol Referenced Sites contain the complete data from the Linked Data sites they index. Most notably, the DBPedia Referenced Site contains only English language labels, so it cannot be used as a full Linked Data cache.
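To check what a Referenced Site actually holds for a given resource before relying on it as a cache, the entity can be requested from the Stanbol Entityhub directly. The Python sketch below only constructs the lookup URL. It assumes Stanbol runs at http://localhost:8080/stanbol and that the site is installed under the identifier "dbpedia"; the function name `referenced_site_entity_url` and the example resource are hypothetical.

```python
from urllib.parse import urlencode


def referenced_site_entity_url(base_url, site, entity_uri):
    """Return the Entityhub URL for one entity of a Referenced Site."""
    # The entity URI is passed URL-encoded in the "id" query parameter.
    query = urlencode({"id": entity_uri})
    return f"{base_url.rstrip('/')}/entityhub/site/{site}/entity?{query}"


url = referenced_site_entity_url(
    "http://localhost:8080/stanbol",  # assumed base URL
    "dbpedia",  # assumed site identifier
    "http://dbpedia.org/resource/Salzburg",  # example resource
)
print(url)
```

Fetching this URL and comparing the result with the original resource shows which properties and languages the local index covers.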

Using Stanbol Reconciliation in Refine

The Stanbol Referenced Sites can also be used as a reconciliation endpoint in Google Refine. In the customized LMF version of Google Refine, Apache Stanbol is already preconfigured. Please consult the [[Google Refine Users Documentation|LMF Refine]] documentation for details.
