Peopleindex

What does it do?

  • fetches user statistics from social networks (twitter, fb)
  • stores it in cache
  • serves stats via http api

Components

Installation

To install the package you can clone the repository:

$ hg clone https://bitbucket.org/cezio/people-index

and then install it:

$ ./setup.py install

You probably want to use virtualenv (or, even better, virtualenvwrapper) to manage the local installation.

Additional dependencies:
  • working Redis instance

Usage

Two components need to be running at the same time:
  • web frontend
  • queue worker (worker usage depends on queue type)

Web Frontend

To run the web frontend you need a configuration file, which provides the settings for binding the Api component. A sample configuration can be found in the resources/ directory of the code repository.

You can specify cache type, queue type and parameters for their initialization.

After you create your own config file, you can start the web server with a single command:

$ fetch_server config_file.cfg

Queue Worker - RQ

If you use Redis locally, you can start the rq worker without any parameters:

$ rqworker

Consult rqworker -h for more specific configuration.

Sample Clients

There are also two sample client commands available for immediate use:

Direct Fetcher Api use

$ fetch_user_direct USERNAME

USERNAME is the user name to search for

Async web api

$ fetch_user_async BASE_URL USERNAME

BASE_URL is the base URL of the web frontend

USERNAME is the user name to search for

Asynchronous HTTP Api

The async API resides at the /user/async/ URL. To ask for a specific user, issue a GET /user/async/$username request.
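As a minimal sketch of how a client might build that request URL: the helper name async_user_url and the host and port in the usage note are assumptions made for illustration and are not part of the package. The user name is percent-encoded so that characters such as "/" cannot change the path.

```python
from urllib.parse import quote, urljoin


def async_user_url(base_url, username):
    """Build the async endpoint URL for a user name (hypothetical helper)."""
    # urljoin drops the last path segment unless the base ends with a slash.
    base = base_url if base_url.endswith("/") else base_url + "/"
    # safe="" also encodes "/" so the user name stays a single path segment.
    return urljoin(base, "user/async/" + quote(username, safe=""))
```

For example, async_user_url("http://localhost:8080", "cezio") gives http://localhost:8080/user/async/cezio, where the address is only a placeholder for wherever the web frontend is bound.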

The async API returns different responses depending on the state of the requested user:

  • If user is already in cache, it will return 200 OK response with JSON payload:

    {
        "data": [
            {"data": 5, "service": "twitter"}
        ],
        "key": "cezio",
        "status": "fetched"
    }

  • If user is not available in any of the services, response will be 404 Not Found

  • If user is not in cache, and it's not queued yet, response will be 202 Accepted with the following payload:

    {
        "status": "queued"
    }

    The client should repeat the request after a short time to get the updated status.
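The three responses above suggest a simple polling loop on the client side. The following is a sketch only, written against the Python standard library: the names http_get and poll_user, the retry delay, and the attempt limit are assumptions, not part of the package, and the bundled fetch_user_async command may work differently.

```python
import json
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def http_get(url):
    """Return (status_code, parsed_json_or_None) for a GET request."""
    try:
        with urlopen(url) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except HTTPError as err:
        # urlopen raises for 4xx/5xx responses; no payload is parsed here.
        return err.code, None


def poll_user(url, get=http_get, delay=1.0, max_attempts=10, sleep=time.sleep):
    """Repeat the request until the user is fetched or reported missing."""
    for _ in range(max_attempts):
        status, payload = get(url)
        if status == 200:
            return payload   # user is in cache, payload has "status": "fetched"
        if status == 404:
            return None      # user not available in any of the services
        if status != 202:
            raise RuntimeError("unexpected status: %d" % status)
        sleep(delay)         # 202 Accepted: still queued, try again shortly
    raise TimeoutError("user still queued after %d attempts" % max_attempts)
```

The get and sleep parameters are injectable so the loop can be exercised without a running server. In real use, poll_user would be called with the full /user/async/$username URL of the web frontend.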

Synchronous HTTP Api

This feature is for testing purposes only. The synchronous API resides at the /user/sync/ URL and behaves similarly to the async API, except for the queued response.

Missing features & Comments

Currently, only the Twitter fetcher is implemented, since Twitter allows grabbing data from public pages (no need to use their API, no need to generate access tokens).

In previous versions, the base queue implementation was based on SQLite, but that wasn't efficient, since SQLite doesn't make it easy to share a database between processes.

The web API is not actually a REST API, since it follows few REST conventions:
  • objects are created on the server not by a POST but by a background job, so the client never modifies server state directly,
  • GET on a resource is not a 'safe' method, since the client uses GET to initiate a background fetch, which changes state on the server,
  • thus, resources are not cacheable (they can change over time),
  • a client has no way to update or delete resources.