
CKAN Solr Support

CKAN up to version 1.4.3.1 uses Postgres-based full-text search of datasets. This extension adds support for using Apache Solr to do the dataset search instead.

NB: From CKAN 1.5 onwards, this extension should not be used, since the code has been merged into CKAN itself and Solr is used by default.

The extension can be used in two distinct modes: as an in-process indexer or as a queue worker. In in-process mode, changes are sent to the Solr index through an ORM hook. This keeps the index up to date immediately, but each write operation has to dispatch several HTTP requests to Solr. In queue mode, the indexer is attached to an AMQP message queue and consumes updates from it. Processing then happens outside the CKAN request threads, but the risk of a stale index is higher (for example, while the worker is down).
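
To make the trade-off between the two modes concrete, here is a minimal Python sketch. It is not ckanext-solr code: a plain dict stands in for the Solr index, queue.Queue stands in for the AMQP queue, and every function name (send_to_solr, on_write_in_process, on_write_queued, queue_worker) is hypothetical.

```python
import queue
import threading

# Stand-in for the Solr index: a dict keyed by dataset id (hypothetical;
# the real extension talks to Solr over HTTP).
solr_index = {}


def send_to_solr(dataset):
    """Simulate one Solr update request by writing into the dict."""
    solr_index[dataset["id"]] = dataset


# In-process mode: the ORM hook indexes immediately, inside the request.
def on_write_in_process(dataset):
    send_to_solr(dataset)


# Queue mode: the request thread only publishes a message; a separate
# worker applies it later (queue.Queue stands in for the AMQP queue).
updates = queue.Queue()


def on_write_queued(dataset):
    updates.put(dataset)


def queue_worker():
    """Consume updates until a None sentinel arrives."""
    while True:
        dataset = updates.get()
        try:
            if dataset is None:
                return
            send_to_solr(dataset)
        finally:
            updates.task_done()


if __name__ == "__main__":
    on_write_in_process({"id": "a"})
    on_write_queued({"id": "b"})
    print(sorted(solr_index))  # ['a']: 'b' is still waiting in the queue
    worker = threading.Thread(target=queue_worker)
    worker.start()
    updates.put(None)
    worker.join()
    print(sorted(solr_index))  # ['a', 'b']
```

The in-process update is visible as soon as the call returns, while the queued one only appears once a worker drains the queue; if no worker is running, the index stays stale.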

Setting up the basic module

In either mode of operation, ckanext-solr will need to be installed on the CKAN server (even in queue mode, queries are still dispatched directly). To configure CKAN to use ckanext-solr, first install the extension::

    $ pip install -e hg+https://bitbucket.org/okfn/ckanext-solr#egg=ckanext-solr

This downloads the code and registers all entry points. You can then add the following options to your CKAN site configuration file::

    search_backend = solr
    solr_url = http://your-solr-host:8983/solr

This will direct CKAN to use the given Solr server for any search queries.
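
As an illustration of what these two settings drive, the Python sketch below reads them with configparser and builds the kind of query URL a client sends to Solr's standard /select handler (using Solr's q, rows and wt parameters). It is not the extension's own code; the [app:main] section name and the solr_select_url helper are assumptions for this example.

```python
from configparser import ConfigParser
from urllib.parse import urlencode

# Hypothetical site configuration excerpt; [app:main] is assumed to be the
# section that holds the CKAN options.
SAMPLE_CONFIG = """
[app:main]
search_backend = solr
solr_url = http://your-solr-host:8983/solr
"""


def solr_select_url(config_text, query, rows=10):
    """Return a Solr /select URL for `query`, based on the configured solr_url."""
    parser = ConfigParser(interpolation=None)
    parser.read_string(config_text)
    options = parser["app:main"]
    if options.get("search_backend") != "solr":
        raise ValueError("search_backend is not set to solr")
    base = options["solr_url"].rstrip("/")
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/select?{params}"


if __name__ == "__main__":
    print(solr_select_url(SAMPLE_CONFIG, "title:water"))
    # http://your-solr-host:8983/solr/select?q=title%3Awater&rows=10&wt=json
```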

You will also need to install and configure Solr somewhere. Locate the schema.xml file in this package and configure Solr to use it as its schema. On Ubuntu, for example, this means running apt-get install solr-common, copying schema.xml to /etc/solr/conf, and restarting Tomcat.

Using in-process indexing

To configure in-process indexing, make sure the following line is present in your site configuration::

    ckan.plugins = synchronous_search

This setting likely already exists (it is also needed for Postgres indexing) and your configuration may also list other plugins.

Using queue-based indexing

To enable queue-based indexing, first set up ckanext-queue completely and verify that it works using the echo worker. Then make sure that ckan.site_id is set to the same value in both your CKAN configuration and the worker configuration file, and copy the solr_url setting into the worker configuration as well.
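
Because a mismatch between the two files is easy to miss, a small pre-flight check can help. The Python sketch below compares ckan.site_id and solr_url across the two configurations. It is a hypothetical helper, not part of ckanext-solr or ckanext-queue, and the section names ([app:main] for CKAN, [worker] for the worker file) are assumptions to adjust to your actual files.

```python
from configparser import ConfigParser

# Settings that must agree between the CKAN and worker configurations.
SHARED_KEYS = ("ckan.site_id", "solr_url")


def load_section(config_text, section):
    """Parse an INI-style config string and return one section."""
    parser = ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser[section]


def check_worker_config(ckan_text, worker_text,
                        ckan_section="app:main", worker_section="worker"):
    """Return a list of problems; an empty list means the settings agree.

    The section names are assumptions: adjust them to match your files.
    """
    ckan = load_section(ckan_text, ckan_section)
    worker = load_section(worker_text, worker_section)
    problems = []
    for key in SHARED_KEYS:
        if key not in ckan or key not in worker:
            problems.append(f"{key} is missing")
        elif ckan[key] != worker[key]:
            problems.append(f"{key} differs: {ckan[key]!r} != {worker[key]!r}")
    return problems


if __name__ == "__main__":
    ckan_cfg = "[app:main]\nckan.site_id = demo\nsolr_url = http://localhost:8983/solr\n"
    worker_cfg = "[worker]\nckan.site_id = demo\nsolr_url = http://localhost:8983/solr\n"
    print(check_worker_config(ckan_cfg, worker_cfg))  # []
```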

To launch the indexing process, run the worker command::

    $ worker -c worker.cfg solr

You can also run several workers at once by listing each of their names.

HTTP authentication support

Apache Solr does not itself provide any means of authentication, but it can be run behind a reverse proxy if connections must be made across the public internet. In that case, HTTP authentication can be used to restrict access to the index, in particular write operations. To pass the required credentials to the extension, set solr_user and solr_password in your configuration files.
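
Assuming the reverse proxy uses HTTP Basic authentication, the Python sketch below shows how a client could attach solr_user and solr_password to a request. It is not taken from ckanext-solr; the build_solr_request helper and the example host are hypothetical, and the request is only built, never sent.

```python
import base64
import urllib.request


def build_solr_request(solr_url, path, user=None, password=None, data=None):
    """Build (without sending) a urllib request to Solr.

    When both user and password are given, an HTTP Basic Authorization
    header is added, as a reverse proxy in front of Solr would expect.
    """
    request = urllib.request.Request(solr_url.rstrip("/") + path, data=data)
    if data is not None:
        # Solr's XML update handler expects an XML content type.
        request.add_header("Content-Type", "text/xml; charset=utf-8")
    if user is not None and password is not None:
        credentials = f"{user}:{password}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    req = build_solr_request("https://solr.example.org/solr", "/update",
                             user="ckan", password="secret",
                             data=b"<commit/>")
    print(req.get_method(), req.full_url)  # POST https://solr.example.org/solr/update
    print(req.get_header("Authorization"))  # Basic Y2thbjpzZWNyZXQ=
```

Read-only queries can be built without credentials; the header is only added when both values are configured.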

Testing

From the ckanext-solr directory, run::

    nosetests --ckan
