HTTPS SSH

KOS - Key-value store of the OS course

This is a project for the Operating Systems course at Instituto Superior Técnico in the 2013-14 academic year. The project consists on a system - called KOS - for data storage. KOS utilizes the key-value logic model of data organization. This kind of system of data storage is used in various contexts, such as Windows registry for storing configuration information and operating system metadata for example.

The data stored in KOS is organized in partitions ("shards"), identified by an integer numeric identifier (shardId). "Sharding" is a technique used to optimize the performance of the access primitives because it permits parallelizing the access to data by subdividing it in partitions (shards) and attributing partitions to dedicated servers.

KOS supports the following API:

  • char* get(int clientId, int shardId, char* key): returns the value associated with the key passed as parameter in case it exists, returns NULL otherwise.
  • char* put(int clientId, int shardId, char* key, char* value): inserts a key-value pair and returns NULL if the key does not exist or, otherwise, the previous value associated to this key.
  • char* remove(int clientId, int shardId, char* key): deletes the key-value pair, returning the value associated in case it exists, and NULL otherwise.
  • KV_t* kos_getAllKeys(int clientId, int sharedId, int* sizeKVarray): returns a pointer to an array of key-value pairs (with each pair being stored in a structure of the data type KV_t). The size of the array is written in the parameter sizeKVarray.

For simplicity, we assume that both the key and value are strings of fixed sized (defined by the constant KV_SIZE).

Internally, each partition of KOS is supported by a hashmap.

KOS executes in a single process made up of multiple threads - server threads and client threads. Server threads manage the state of KOS and serve requests generated by client threads.

The client threads do not access the state of KOS directly. Instead, they interact with the server threads using the functions specified in the file kos_client.h, which insert a request in a buffer in memory shared with the server threads. The server threads then execute these requests and return the answers using the same buffer.

The development of KOS was divided into 2 parts:

Part 1

  • Synchronization between client and server threads. Same number of client and server threads in which each client thread is associated to a specific server thread.
  • Synchronization between server threads, in which access to KOS is protected by a mutex (or semaphore).
  • Hashmap used to store KOS data in volatile memory (RAM).

Hashmap to store KOS data

The hashmap used to store each partition of KOS is made up of an array of linked lists. The array has a fixed number of elements (specified by the constant HT_SIZE). A key-value pair that belongs to the partition associated with the hashmap is inserted in the list in the i position of the array, where i is determined using a hash function. This function converts each character of the key to an integer, and sums each of these values into an accumulator variable called i. The position of the array of linked lists to access is determined by the result of i % HT_SIZE.

Buffer for communication between client and server threads

In this first phase it is assumed that the number of client and server threads is equal and is specified by the constant NUM_THREADS. The buffer is an array of size NUM_THREADS, where each element of the array is associated to a pair <client thread i, server thread i>. This means that the requests of a client thread are always processed by the same server thread.

Synchronization between server threads for the access to KOS data

In the first part we use a simple mutex/semaphore in order to synchronize the access of the server threads to the KOS data.

Before start processing any request in the buffer, each server thread should execute a call to a function with the signature void delay(), defined in the files include/delay.h and kos/delay.c. This function injects a delay (to simulate, for example, an indeterminism in the operating system), suspending the execution of the thread for a period of time (configured in the file delay.c), thus allowing to exercise the mechanisms of synchronization between the server threads.

Part 2

The second part of this project extends the first part. We now assume that:

  1. The number of clients is different - tendentially superior - to the number of servers.
  2. KOS is persistent, meaning that the data stored in it survive the activities that created it or accessed it.

In the first part there was a biunivocal correspondence between a client and its server. Now, a client puts a request in the buffer which will be picked up by an available server.

Parallelism in the access to KOS

In this part we increase the concurrency in the access to KOS data with the following options:

  1. Allow parallel access to different partitions (already implemented in part 1).
  2. Allow read access in parallel for a partition.
  3. Allow access in parallel to distinct lists in a partition.
  4. Allow for search sequences (iterate over a list before reaching the place to put or remove) and parallel read accesses in a partition.

KOS' data persistency

Each partition is stored in a file named fshardId. The file is updated as the partition in memory is changed. The flow of data between KOS' memory and the file system obeys the following rules:

  • The initialization of each partition in memory is done from the corresponding file in the system's initialization phase.
    • If the file does not exist we assume the partition has no data, so it is initialized empty in memory and the corresponding file is created.
  • Read operations do not alter the partition in memory, therefore do not imply any alteration in the partition-file.
  • Write is atomic, that is, each write operation resulting from a put or remove request is done in the partition in memory and is immediatelly propagated to the partition-file.
    • An operation that implies a write (put, remove) is only considered concluded after the data is written in the memory and in the file. We consider that the data was written to the file once the call to the function of the UNIX file system API, that performs the write operation, returns.

The data to store in the partition file has the following structure:

  • Number of key-value pairs stored in the file - int NPKV - followed by the sequence of key-value pairs without any particular order.

This structure, although simple, requires searching for the key-value pairs that we intend to access - to delete (remove) or update (put) - in a sequential scan of the file. The following optimizations are implemented to avoid the sequential scanning of the files and keep them compact.

  1. Store in volatile memory the offset of the key-value in the file, thus avoiding the necessity to scan the file.
  2. Periodically compact the file, removing empty registries resulting from remove operations.
  3. Reoccupy empty key-value slots on new insertions by keeping their offsets in memory.

Mistakes

  • Read-write lock wasn't used. Technically it is possible for a read operation to clash with a write operation, because the code as it is allows for parallel reads but only one write (which can be done in parallel with the reads, leading to a possible bad read).
  • Whilst we store the offsets for the empty positions in the file where key-values were removed in order to avoid scanning the file for these positions, we forgot to store the offsets of all key-values.
  • Poor organization of the code. It should be distributed across more source/header files.

Credits

Developers

  • Daniel Lascas
  • David Gonçalves
  • Ricardo Moura