
Key/Value Store Coding Exercise

This coding exercise came out of an interview process with a company I ultimately chose not to continue interviewing with, having accepted an offer from a different employer. I found the problem interesting because it gave me an opportunity to gain familiarity with tools and an application architecture I had no prior experience with. Because of that, and because of some of the challenges I encountered with my approach, I was motivated to bring it to a state of completion even though I opted out of the interview process.

The following is the problem description passed to me by one of the interviewing engineers:

Implement the requirements below to create a basic key/value service:
* uses Python
* create a RESTful API, we suggest using AWS Lambda or Flask but use
  whatever you are comfortable with.
  * the route should be something like /key but should be versionable
  * show some example uses of the service; use cases:
    * user should be able to get all keys/values
    * user should be able to get a specific key/value
    * user should be able to add a key/value
    * user should be able to update a key/value
    * user should be able to delete a key/value
    * enable the use of 2 different backing stores of your choice. they can
      use real data stores or be mocked out to represent. which one is used
      should be determined via configuration
* demonstrate asynchronous handling
  * simultaneous (make 2 or more calls that are processed asynchronously and
    when all calls complete results are compiled to a single result object
    which is returned)
  * chained (make a call the result of which informs a subsequent call)
* The code should be runnable and have some form of demonstration. For
  example, a user would add a key of 'sports', its value the list of
  'baseball', 'hockey' and 'football'.
* The code should have automated tests
* Share via code repository or zip (repo preferred)
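
To make the requested use cases and the two asynchronous patterns concrete, here is a purely illustrative, client-side sketch. The /v1/key route, the payload shapes, and the requests/concurrent.futures approach are assumptions for demonstration only and are not taken from this repo's actual API:

# Purely illustrative client-side demo of the requested use cases.
# The /v1/key route and payload shapes are assumptions, not this repo's actual API.
import concurrent.futures

import requests

BASE = "http://localhost:5000/v1"  # assumed local Flask address

# add a key/value
requests.post(f"{BASE}/key", json={"sports": ["baseball", "hockey", "football"]})
requests.post(f"{BASE}/key", json={"colors": ["red", "green"]})

# get a specific key/value
print(requests.get(f"{BASE}/key/sports").json())

# update a key/value
requests.put(f"{BASE}/key/sports", json={"sports": ["baseball", "hockey"]})

# get all keys/values
print(requests.get(f"{BASE}/key").json())

# "simultaneous" async handling: issue two calls concurrently, then compile the
# responses into a single result object
with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = {key: pool.submit(requests.get, f"{BASE}/key/{key}")
               for key in ("sports", "colors")}
    combined = {key: future.result().json() for key, future in futures.items()}

# "chained" async handling: the result of the first call informs the second
all_keys = requests.get(f"{BASE}/key").json()
first_value = requests.get(f"{BASE}/key/{sorted(all_keys)[0]}").json()

# delete a key/value
requests.delete(f"{BASE}/key/colors")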

The goal of this project was to meet as many of these requirements as possible in a short amount of time. I began with relatively little experience using Flask and AWS (Lambda, API Gateway) and ended up with the working knowledge necessary to build a simple key/value store with strong test coverage for the various environments it is intended to run in.

The outline of work I did for this project looks roughly like the following:

  • Meta/prep
    • Choose technology stacks.
    • Create the initial git repo and throw together the basic directory structure.
    • Write a very basic Flask "hello world!" app to validate the HTTP test fixtures.
    • Write a simple test fixture to validate that the DynamoDB docker container runs and works with a simple put_item/get_item workflow using the boto3 AWS client library (see the sketch following this list).
  • Define the RESTful API with a Swagger docs definition.
  • Write unit tests to validate basic RESTful key-value store behavior.
  • Write unit tests to validate datastore backend behavior.
  • Implement the DynamoDB datastore abstraction.
  • Implement Flask HTTP endpoints that meet the tested requirements.
  • Do a lot of refactoring of test fixtures to support different ways of running tests:
    • Local Flask + Local DynamoDB
    • Local Flask + AWS DynamoDB
    • AWS API Gateway + AWS Lambda + AWS DynamoDB (using Zappa to deploy the Flask app as API Gateway + Lambda)
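
That fixture check amounted to something like the sketch below; the table name, key schema, and local endpoint port are illustrative assumptions rather than the exact values used in this repo:

# Minimal boto3 smoke test against the local DynamoDB docker container.
# The table name, key schema, and port 8000 are assumptions for illustration.
import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",  # DynamoDB Local's default port
    region_name="us-east-1",
    aws_access_key_id="fake",
    aws_secret_access_key="fake",
)

table = dynamodb.create_table(
    TableName="kvstore-test",
    KeySchema=[{"AttributeName": "key", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "key", "AttributeType": "S"}],
    ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
)
table.wait_until_exists()

table.put_item(Item={"key": "sports", "value": ["baseball", "hockey", "football"]})
assert table.get_item(Key={"key": "sports"})["Item"]["value"]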

I chose Flask for the initial environment implementation because running a wsgi app locally has a much shorter test cycle time than a workflow that involves deploying an API Gateway and Lambda functions during fixture setup.

Because I kept in mind, while developing my test fixtures, that the RESTful interface I was aiming for could be run using either approach, I ended up with tests that can be run against either HTTP application type. This significantly reduced the need for essentially redundant test cases, at the cost of somewhat increased test fixture complexity.

While maintaining a high degree of testability for the HTTP interface of my application, I also kept an eye on the interchangeability of the backend datastore component. The requirements of this project explicitly called for configurable data backends. To meet this requirement, I wrapped the DynamoDB usage in this project with an interface defined by an abstract base class called "datastore", whose concrete subclasses are produced using an abstract factory method. This approach hides the instantiation and usage details of the different data backends from the HTTP application "director" code, while the choice of concrete datastore implementation is configurable by setting the desired datastore class as a class object in the config object used to define application settings.
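
The shape of that abstraction is roughly the following sketch; the identifiers here (Datastore, DynamoDBDatastore, datastore_factory, DATASTORE_CLASS, DATASTORE_TABLE) are illustrative stand-ins and may not match the names actually used in this repo:

# Illustrative sketch of the datastore abstraction described above; class,
# method, and config attribute names are stand-ins, not this repo's exact names.
import abc

import boto3


class Datastore(abc.ABC):
    """CRUD interface the HTTP "director" code programs against."""

    @abc.abstractmethod
    def get_all(self):
        """Return every key/value set."""

    @abc.abstractmethod
    def get(self, key):
        """Return the value set stored under key."""

    @abc.abstractmethod
    def put(self, key, value):
        """Create or update the value set stored under key."""

    @abc.abstractmethod
    def delete(self, key):
        """Remove key and its value set."""


class DynamoDBDatastore(Datastore):
    """Concrete implementation backed by a DynamoDB table."""

    def __init__(self, table_name):
        self.table = boto3.resource("dynamodb").Table(table_name)

    def get_all(self):
        return {item["key"]: item["value"] for item in self.table.scan()["Items"]}

    def get(self, key):
        return self.table.get_item(Key={"key": key}).get("Item", {}).get("value")

    def put(self, key, value):
        self.table.put_item(Item={"key": key, "value": value})

    def delete(self, key):
        self.table.delete_item(Key={"key": key})


def datastore_factory(config):
    """Abstract factory: the config object names the concrete class to use."""
    return config.DATASTORE_CLASS(config.DATASTORE_TABLE)

The HTTP layer only ever calls the abstract interface, so swapping backends is purely a configuration concern.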

Other potential datastore backends could be anything that allows CRUD operations on set-like values indexed by keyword strings: e.g., AWS S3, AWS RDS, PostgreSQL, MySQL, Redis, MongoDB, etc.
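
As an illustration of how small such an addition could be, a hypothetical in-memory backend (effectively a mocked-out store) could plug into the same interface sketched above:

# Hypothetical second backend: an in-memory store (essentially a mock) that
# plugs into the same Datastore ABC sketched in the previous example.
class InMemoryDatastore(Datastore):

    def __init__(self, table_name=None):  # table_name kept for factory compatibility
        self._items = {}

    def get_all(self):
        return dict(self._items)

    def get(self, key):
        return self._items.get(key)

    def put(self, key, value):
        self._items[key] = value

    def delete(self, key):
        self._items.pop(key, None)

Selecting it would then be a one-line change in the application settings, e.g. pointing the configured datastore class at InMemoryDatastore instead of DynamoDBDatastore.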

Requirements

Before getting started, it is assumed that you have the following installed on your development system:

  • Python 3.6
  • docker

Python 3.6 is necessary to install the zappa tool from PyPI, which enforces the use of a python version compatible with what AWS Lambda supports.

Docker is necessary for the local Flask functional tests, which use it to provide a local DynamoDB test fixture.

Setting Up Your Development Environment

Set up a virtualenv however you prefer. I use virtualenvwrapper like so:

$ mkvirtualenv -p $(which python3.6) kvstore

Install the dev requirements:

$ pip install -r requirements/dev.txt

Running Tests

Tests for this project are written in such a way that they support multiple service implementations. The selection of different implementations is managed primarily through environment variables accessed in the nce.settings module. There are two service types that are configurable in this way: the HTTP service and the datastore service.
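
Roughly, the selection works like the following sketch; the specific environment variable names are assumptions and may not match what nce.settings actually reads:

# Rough sketch of the environment-driven selection; these variable names are
# assumptions, not necessarily those read by nce.settings.
import os

HTTP_SERVICE = os.environ.get("KVSTORE_HTTP_SERVICE", "local-flask")       # or "aws-lambda"
DATASTORE_SERVICE = os.environ.get("KVSTORE_DATASTORE", "local-dynamodb")  # or "aws-dynamodb"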

The HTTP service may either be a local Flask application or an AWS API Gateway + Lambda function. The AWS API Gateway + Lambda function is deployed using a command line tool (and library) called zappa.

The only currently supported datastore implementation is DynamoDB, but the abstraction representing it in this git repo is written in such a way as to make implementing others fairly trivial, although some additional refactoring of the tests.test_datastore_dynamodb.py module would be needed to exercise them there.

To run all tests, run the following from the root of this git repository:

$ tox

Flask Tests w/ local DynamoDB

This environment runs the http tests, tests.test_kvstore.py, against a local Flask app and a docker container running a local dynamodb jar downloaded from AWS.

$ tox -e py36-local-flask-dynamodb

Flask Tests w/ AWS DynamoDB

This environment runs the http tests against a local Flask app that stores key/value sets in DynamoDB running in AWS.

$ tox -e py36-local-flask-aws-dynamodb

AWS Lambda w/ DynamoDB Tests

This environment runs the http tests against an AWS API Gateway + Lambda application that stores key/value sets in DynamoDB running in AWS.

$ tox -e lambda-dynamodb

Deploying To AWS w/ Zappa

Zappa is a python command line tool and is included as a prod dependency in this project. It can be used to deploy the app to AWS using the following series of commands:

$ zappa init
# you will be asked a series of questions about your app and desired setup
$ zappa deploy <envname> # <envname> is whatever you chose in the last step