Wiki

Clone wiki

pgenctypes / Home

Encrypted data types for PostgreSQL 9.1+

This project provides transparently encrypted data types for the PostgreSQL database. With it, table schemas can be designed which contain one or more data fields (columns) which are encrypted before being stored in the database.

The encryption used is AES-256-CFB, with random per-datum IVs.

This extension is developed for PostgreSQL 9.1 and this guide assumes a working installation of the database, with some knowledge of how to configure and use it.

Installation

The list of prerequisited for a clean installation of pgenctypes is:

  • A working installation of PostgreSQL 9.1, built with OpenSSL support
  • PostgreSQL development headers and libraries
  • The gcc compiler and associated headers and libraries

After downloading and unpacking (if necessary) pgenctypes in a directory which will permanently be used for the installation of pgenctypes, it is enough to execute make (or gmake if on BSD systems) in order to build the extension. A successful build will result in a shared library (usually named pgenctypes.so) and a SQL script file pgenctypes.sql.

The pgenctypes.sql will contain full paths to the shared library, and should be modified if the library is to be moved somewhere else.

The new data types can be installed in a PostgreSQL database by simply executing this script in the database context under a superuser account, for example by issuing the following shell command:

$ psql -U pgsql dbname < pgenctypes.sql

Using pgenctypes

The extension currently implements the following data types:

  • enctext - like "text"
  • encbigint - like "bigint"
  • encdouble - like "double"

These types can be used in a database schema in a straightforward way, e.g.:

CREATE TABLE mydata (
        id              SERIAL,
        name            TEXT,
        secret          ENCTEXT,
        PRIMARY KEY(id)
);

In this example, any data stored in the "secret" field will be encrypted.

The encryption key is per-session and can be (re)set multiple time within a session if needed (e.g. to encrypt different data with different keys.)

There are two ways to set the encryption key. The first is by calling the pgenctypes_set_key() SQL function. This is the preferred way as it ensures that the supporting shared library is loaded before any data operations are carried out.

SELECT pgenctypes_set_key('mykey');

The second way is to set the session variable pgenctypes.key by using the SET SQL command, but it only works if the library has been already loaded (e.g. by using shared/local_preload_libraries configuration settings or by doing any other operation which touches data of these types).

SET pgenctypes.key='mykey';

Both of these ways are equivalent.

Once the key is set, ordinary SQL operations (e.g. SELECT, INSERT and others) will work transparently as expected: the application will send and receive plain text data as if no encryption is going on. Only when the data is written to storage is it encrypted.

The data types are very rudimentary and do not (yet) support most of the normal SQL operators and functions. Specifically, they are not indexable (but they are castable, so effectively you can work around it; note that if you create a index around a casted expression involving enctype data, the data being indexed will be stored decrypted within this index!)

Technical details

Each data field is stored with a preceding header of 16 bytes containing metadata and a 64-bit random IV. The data is encrypted using AES-256-CFB, for which the key is derived from the user-supplied key material using SHA256.

Choosing encryption keys

The encryption keys are passed from the application to the database server and used in encrypting and decrypting the data. Technically, the keys should have 256 bits (32 bytes) of entropy for optimal security.

If the encryption keys are to be entered by a human being, it is better to use passphrases such as "it was a cloudy day when I bought my first car in 1999" than an incomprehensible computer generated passwords like "PlWwrHa!r4" simply because people tend write down things which they cannot easily remember.

On the other hand, if the password is only ever going to be passed from an application to the database server, it is better for it to have the optimal security properties. Such passwords can be generated by executing e.g. this command:

dd if=/dev/random count=1 | sha256

The resulting long hexadecimal string can safely be used as a computer-to-computer password (and is vastly stronger than either of the human-readable passwords given above).

Efficiency

Each stored data field is preceeded by a 16 byte header. This means that the data types are particularly inefficient for storing short strings, but that is unavoidable if security is to be maintained.

Security considerations

For most uses, the protection offered by pgenctypes is as good as the state of the art in cryptography can achieve. If the goal is to protect the stored data from unauthorized access, there is no practical chance that the protection will be circumevented. If the application uses SSL in communication with the server, all data sent from the application and stored at the server will be hidden in transit and while stored. This makes pgenctypes a very good solution e.g. for cloud storage, where the actual data is stored in untrusted environments.

However, there is still the question of what happens with the encryption keys in the application and in the server.

The application needs to set the encryption key which the server will use to protect the data (e.g. by calling pgenctypes_set_key() and because of this it needs to have unprotected access to this key. In case of Web applications, this means that the key will probably be stored as a plain string within the application code or its configuration files, which all need to be accessible by the Web server. Compiled applications offer only minimally better protection as it is a matter of fact that they can be decompiled and analysed regardless of any protections they use.

Note that the concept of encrypted data types means that the data is encrypted the whole time it is stored in the database. As soon as it enters an application (and this potentially also includes a backup application, though `pg_dump` doesn't know how to deal with keys), it is decrypted.

At the server side, if an attacker gets root access to the operating system (either directly by a back-door or a security hole or indirectly if the operating system is a virtual machine by controlling the VM host), he can examine the memory used by executing processes and so get a hold of the encryption key within the PostgreSQL server.

There are no feasible, small-scale defences against these types of attacks.

The described method of encrypting data is mostly useful if the requirement is to control the encryption key(s) from the application code. If this is not a requirement, simply using an encrypted file system or other low-level storage on which to install PostgreSQL data will probably be better and more portable.


See DefaultHome for wiki documentation.

Updated