CKAN Solr Support

Older versions of CKAN use Postgres-based full-text search for datasets. This extension adds support for using Apache Solr for dataset search instead.

NB: From CKAN 1.5 onwards, this extension should not be used, since the code has been merged into CKAN itself and Solr is used by default.

The extension can be used in two distinct modes: as an in-process indexer or as a queue worker. In the first mode, changes will be sent to the Solr index using an ORM hook. While this guarantees immediate updates, it means multiple HTTP requests need to be dispatched for each write operation. In the latter mode, the indexer will be attached to an AMQP message queue and consume updates from this source. This means processing does not occur within the CKAN request threads, but there is a higher risk of having a stale index (as a result of worker downtime).
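
The trade-off between the two modes can be illustrated with a toy model. This is only a sketch: `ToyIndex`, `InProcessIndexer` and `QueueIndexer` are hypothetical names, a plain in-memory `deque` stands in for the AMQP queue, and none of this is the extension's actual code.

```python
from collections import deque


class ToyIndex:
    """Stand-in for the Solr index (hypothetical, in-memory only)."""

    def __init__(self):
        self.documents = {}

    def update(self, dataset_id, data):
        self.documents[dataset_id] = data


class InProcessIndexer:
    """Updates the index during the write itself, so it is never stale."""

    def __init__(self, index):
        self.index = index

    def on_write(self, dataset_id, data):
        self.index.update(dataset_id, data)


class QueueIndexer:
    """Only enqueues the change; a separate worker applies it later."""

    def __init__(self, index):
        self.index = index
        self.pending = deque()  # stand-in for the message queue

    def on_write(self, dataset_id, data):
        self.pending.append((dataset_id, data))

    def run_worker_once(self):
        # Drain everything currently queued and report how much was processed.
        processed = 0
        while self.pending:
            self.index.update(*self.pending.popleft())
            processed += 1
        return processed
```

In this model, a write through `QueueIndexer` leaves the index unchanged until `run_worker_once()` is called, which mirrors the stale-index risk during worker downtime.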

Setting up the basic module

In either mode of operation, ckanext-solr will need to be installed on the CKAN server (even in queue mode, queries are still dispatched directly). To configure CKAN to use ckanext-solr, first install the extension::

    $ pip install -e hg+

This will download the code and register all entry points. You can then add the following configuration options in your CKAN site configuration file::

    search_backend = solr
    solr_url = http://your-solr-host:8983/solr

This will direct CKAN to use the given Solr server for any search queries.
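
As a quick sanity check, the two options can be read back with Python's standard `configparser`. The sketch below makes several assumptions: the options live in an `[app:main]` section, the helper names `read_search_settings` and `build_select_url` are hypothetical, and the URL is built against Solr's standard `/select` handler. It does not show how the extension itself issues queries.

```python
from configparser import ConfigParser
from urllib.parse import urlencode

# Hypothetical excerpt of a site configuration file; the [app:main]
# section name is an assumption, not taken from this README.
SAMPLE_CONFIG = """
[app:main]
search_backend = solr
solr_url = http://your-solr-host:8983/solr
"""


def read_search_settings(config_text, section="app:main"):
    """Return the two search-related options, or None where one is missing."""
    parser = ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return {
        "search_backend": parser.get(section, "search_backend", fallback=None),
        "solr_url": parser.get(section, "solr_url", fallback=None),
    }


def build_select_url(solr_url, query):
    """Build a URL for Solr's /select handler, asking for a JSON response."""
    params = urlencode({"q": query, "wt": "json"})
    return solr_url.rstrip("/") + "/select?" + params


settings = read_search_settings(SAMPLE_CONFIG)
```

Opening the URL returned by `build_select_url(settings["solr_url"], "some query")` in a browser is one way to confirm that the Solr server is reachable from the CKAN host.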

You will also need to install and configure Solr somewhere. Locate the schema.xml file in this package and configure Solr to use that as its schema. On Ubuntu, for example, this is a matter of apt-get install solr-common, followed by copying schema.xml to /etc/solr/config, and restarting Tomcat.

Using in-process indexing

To configure in-process indexing, make sure the following line is present in your site configuration::

    ckan.plugins = synchronous_search

This setting likely already exists (it is also needed for Postgres indexing) and your configuration may also list other plugins.

Using queue-based indexing

To enable queue-based indexing, first set up ckanext-queue completely and verify that it works using the echo worker. Make sure that the ckan.site_id directive is set to the same value in both your CKAN configuration and the worker configuration file. The solr_url setting must likewise be replicated in the worker configuration.
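
Because both settings must match across the two files, a small consistency check can catch typos before the worker is launched. The following is a sketch under assumptions: the section names (`[app:main]`, `[worker]`), the sample values, and the helper names `load_options` and `find_mismatches` are all hypothetical.

```python
from configparser import ConfigParser

# Options this README says must agree between CKAN and the worker.
SHARED_KEYS = ("ckan.site_id", "solr_url")

# Hypothetical file contents; section names and values are assumptions.
CKAN_CONFIG = """
[app:main]
ckan.site_id = demo_site
solr_url = http://your-solr-host:8983/solr
"""

WORKER_CONFIG = """
[worker]
ckan.site_id = demo_site
solr_url = http://your-solr-host:8983/solr
"""


def load_options(config_text):
    """Flatten every section of an ini-style config into one dict."""
    parser = ConfigParser(interpolation=None)
    parser.read_string(config_text)
    options = dict(parser.defaults())
    for section in parser.sections():
        options.update(parser.items(section))
    return options


def find_mismatches(ckan_text, worker_text, keys=SHARED_KEYS):
    """Map each differing key to its (ckan_value, worker_value) pair."""
    ckan = load_options(ckan_text)
    worker = load_options(worker_text)
    return {
        key: (ckan.get(key), worker.get(key))
        for key in keys
        if ckan.get(key) is None or ckan.get(key) != worker.get(key)
    }
```

An empty result from `find_mismatches` means the shared options agree. A missing option is reported with `None` in place of its value.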

To launch the indexing process, run the worker command::

    $ worker -c worker.cfg solr

You can also run several workers at once by listing each of their names.

HTTP authentication support

Apache Solr does not support any means of authentication, but it can be used behind a reverse proxy if connections must be made across the public internet. In such cases, HTTP authentication may be used to restrict (write) operations to the index. To pass required authentication info to the extension, specify solr_user and solr_password in your configuration files.
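
For reference, HTTP Basic authentication sends the user name and password as a base64-encoded `Authorization` header. The sketch below shows how such a request could be assembled with Python's standard library. It assumes the reverse proxy uses the Basic scheme, and the helpers `basic_auth_header` and `build_request` are hypothetical, not the extension's own code.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(solr_url, user=None, password=None):
    """Prepare (but do not send) a match-all query against /select."""
    url = solr_url.rstrip("/") + "/select?" + urlencode({"q": "*:*"})
    request = urllib.request.Request(url)
    # Only attach credentials when both solr_user and solr_password are set.
    if user is not None and password is not None:
        request.add_header("Authorization", basic_auth_header(user, password))
    return request
```

Passing the returned object to `urllib.request.urlopen` would send the query. Since Basic credentials are only encoded, not encrypted, the proxy should be reached over HTTPS when traffic crosses the public internet.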


To run the tests, run the following from the ckanext-solr directory::

    nosetests --ckan