CKAN Solr Support
CKAN up to version 188.8.131.52 uses Postgres-based full-text search of datasets. This extension adds support for using Apache Solr to do the dataset search instead.
NB: From CKAN 1.5 onwards, this extension should not be used, since the code has been merged into CKAN itself and Solr is used by default.
The extension can be used in two distinct modes: as an in-process indexer or as a queue worker. In the first mode, changes will be sent to the Solr index using an ORM hook. While this guarantees immediate updates, it means multiple HTTP requests need to be dispatched for each write operation. In the latter mode, the indexer will be attached to an AMQP message queue and consume updates from this source. This means processing does not occur within the CKAN request threads, but there is a higher risk of having a stale index (as a result of worker downtime).
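
The trade-off between the two modes can be sketched with a toy model. This is only an illustration: a dictionary stands in for the Solr index, a ``queue.Queue`` stands in for the AMQP queue, and every function name is hypothetical rather than part of ckanext-solr::

    import queue

    index = {}               # stand-in for the Solr index
    updates = queue.Queue()  # stand-in for the AMQP message queue

    def index_dataset(dataset):
        """Write one dataset into the (toy) index."""
        index[dataset["id"]] = dataset

    def save_in_process(dataset):
        """In-process mode: the write itself triggers indexing."""
        index_dataset(dataset)

    def save_queued(dataset):
        """Queue mode: the write only publishes an update message."""
        updates.put(dataset)

    def run_worker_once():
        """Worker: drain pending updates and index each of them."""
        while not updates.empty():
            index_dataset(updates.get())

    save_in_process({"id": "a", "title": "Rivers"})
    save_queued({"id": "b", "title": "Lakes"})
    stale = "b" not in index  # the queued update is not searchable yet
    run_worker_once()         # once the worker runs, the index catches up

In the toy model the queued dataset only becomes searchable after the worker has run, which mirrors the stale-index risk described above.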
Setting up the basic module
In either mode of operation, ckanext-solr will need to be installed on the CKAN server (even in queue mode, queries are still dispatched directly). To configure CKAN to use ckanext-solr, first install the extension::

    $ pip install -e hg+https://bitbucket.org/okfn/ckanext-solr#egg=ckanext-solr

This will download the code and register all entry points. You can then add the following configuration options in your CKAN site configuration file::

    search_backend = solr
    solr_url = http://your-solr-host:8983/solr

This will direct CKAN to use the given Solr server for any search queries.
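
To make the role of ``solr_url`` more concrete, the sketch below builds a query URL for Solr's standard ``select`` handler from that base URL. The helper name ``build_select_url`` and the example query are assumptions made for illustration, and the extension may construct its requests differently::

    from urllib.parse import urlencode

    def build_select_url(solr_url, query, rows=10):
        """Return a URL for Solr's /select handler (illustrative helper)."""
        params = urlencode({"q": query, "rows": rows, "wt": "json"})
        # Tolerate a trailing slash in the configured base URL.
        return solr_url.rstrip("/") + "/select?" + params

    url = build_select_url("http://your-solr-host:8983/solr", "title:water")
    print(url)

Fetching such a URL by hand is one way to check that the Solr server configured in ``solr_url`` is reachable from the CKAN host.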
You will also need to install and configure Solr somewhere. Locate the
schema.xml file in this package and configure Solr to use it
as its schema. On Ubuntu, for example, this is a matter of
installing solr-common, then copying schema.xml to
/etc/solr/config, and restarting Tomcat.
Using in-process indexing
To configure in-process indexing, make sure the following line is present in your site configuration::

    ckan.plugins = synchronous_search

This setting likely already exists (it is also needed for Postgres indexing) and your configuration may also list other plugins.
Using queue-based indexing
To enable queue-based indexing, first set up ckanext-queue completely and verify
that it works using the echo worker. Make sure that the
ckan.site_id directive is set to the same value in both your CKAN
configuration and the worker configuration file, and replicate the
solr_url setting in the worker config.
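
Because a mismatch between the two files is easy to overlook, a small check along the following lines may help. It is a sketch under stated assumptions: both files are taken to be INI-style, and the section names ``app:main`` and ``worker`` as well as the helper names are hypothetical and should be adjusted to your setup::

    from configparser import ConfigParser

    # Settings that must agree between CKAN and the worker.
    SHARED_KEYS = ("ckan.site_id", "solr_url")

    def read_settings(ini_text, section):
        """Parse INI text and return one section as a dict."""
        parser = ConfigParser(interpolation=None)  # leave %(here)s untouched
        parser.read_string(ini_text)
        return dict(parser.items(section))

    def mismatched_keys(ckan_settings, worker_settings, keys=SHARED_KEYS):
        """Return the shared keys that are missing or differ."""
        return [
            key for key in keys
            if ckan_settings.get(key) is None
            or ckan_settings.get(key) != worker_settings.get(key)
        ]

    ckan_ini = """
    [app:main]
    ckan.site_id = demo
    solr_url = http://your-solr-host:8983/solr
    """.replace("\n    ", "\n")

    worker_ini = """
    [worker]
    ckan.site_id = demo
    solr_url = http://localhost:8983/solr
    """.replace("\n    ", "\n")

    bad = mismatched_keys(
        read_settings(ckan_ini, "app:main"),
        read_settings(worker_ini, "worker"),
    )
    print(bad)

In this example only ``solr_url`` differs, so it is the only key reported. In practice the two strings would be read from the actual configuration files.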
To launch the indexing process, run the worker command::

    $ worker -c worker.cfg solr

You can also run several workers at once by listing each of their names.
HTTP authentication support
Apache Solr does not support any means of authentication, but it can be used behind
a reverse proxy if connections must be made across the public internet. In such
cases, HTTP authentication may be used to restrict (write) operations to the index.
To pass required authentication info to the extension, specify
solr_password in your configuration files.
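
For readers unfamiliar with HTTP Basic authentication, the sketch below shows how a user name and password are typically turned into an ``Authorization`` header for requests sent through such a proxy. The helper name and the example credentials are assumptions for illustration; how the extension itself reads and applies its settings is not shown here::

    import base64

    def basic_auth_header(username, password):
        """Build an HTTP Basic Authorization header (illustrative helper)."""
        credentials = "{0}:{1}".format(username, password).encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        return {"Authorization": "Basic " + token}

    # Hypothetical credentials; in practice these come from the configuration.
    headers = basic_auth_header("solr", "secret")

Basic authentication only encodes the credentials and does not encrypt them, so the proxy should be reached over HTTPS when traffic crosses the public internet.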
From the ckanext-solr directory, run