Encrypted data types for PostgreSQL 9.1+
This project provides transparently encrypted data types for the PostgreSQL database. With it, table schemas can be designed which contain one or more data fields (columns) which are encrypted before being stored in the database.
The encryption used is AES-256-CFB, with random per-datum IVs.
This extension is developed for PostgreSQL 9.1 and this guide assumes a working installation of the database, with some knowledge of how to configure and use it.
The list of prerequisited for a clean installation of
- A working installation of PostgreSQL 9.1, built with OpenSSL support
- PostgreSQL development headers and libraries
- The gcc compiler and associated headers and libraries
After downloading and unpacking (if necessary)
pgenctypes in a directory
which will permanently be used for the installation of
pgenctypes, it is
enough to execute
gmake if on BSD systems) in order to
build the extension. A successful build will result in a shared library
pgenctypes.so) and a SQL script file
pgenctypes.sql will contain full paths to the shared library, and
should be modified if the library is to be moved somewhere else.
The new data types can be installed in a PostgreSQL database by simply executing this script in the database context under a superuser account, for example by issuing the following shell command:
$ psql -U pgsql dbname < pgenctypes.sql
The extension currently implements the following data types:
- enctext - like "text"
- encbigint - like "bigint"
- encdouble - like "double"
These types can be used in a database schema in a straightforward way, e.g.:
CREATE TABLE mydata ( id SERIAL, name TEXT, secret ENCTEXT, PRIMARY KEY(id) );
In this example, any data stored in the "secret" field will be encrypted.
The encryption key is per-session and can be (re)set multiple time within a session if needed (e.g. to encrypt different data with different keys.)
There are two ways to set the encryption key. The first is by calling the
pgenctypes_set_key() SQL function. This is the preferred way as it
ensures that the supporting shared library is loaded before any data
operations are carried out.
The second way is to set the session variable
SET SQL command, but it only works if the library has been
already loaded (e.g. by using shared/local_preload_libraries configuration
settings or by doing any other operation which touches data of these types).
Both of these ways are equivalent.
Once the key is set, ordinary SQL operations (e.g. SELECT, INSERT and others) will work transparently as expected: the application will send and receive plain text data as if no encryption is going on. Only when the data is written to storage is it encrypted.
The data types are very rudimentary and do not (yet) support most of the normal SQL operators and functions. Specifically, they are not indexable (but they are castable, so effectively you can work around it; note that if you create a index around a casted expression involving enctype data, the data being indexed will be stored decrypted within this index!)
Each data field is stored with a preceding header of 16 bytes containing metadata and a 64-bit random IV. The data is encrypted using AES-256-CFB, for which the key is derived from the user-supplied key material using SHA256.
Choosing encryption keys
The encryption keys are passed from the application to the database server and used in encrypting and decrypting the data. Technically, the keys should have 256 bits (32 bytes) of entropy for optimal security.
If the encryption keys are to be entered by a human being, it is better to use passphrases such as "it was a cloudy day when I bought my first car in 1999" than an incomprehensible computer generated passwords like "PlWwrHa!r4" simply because people tend write down things which they cannot easily remember.
On the other hand, if the password is only ever going to be passed from an application to the database server, it is better for it to have the optimal security properties. Such passwords can be generated by executing e.g. this command:
dd if=/dev/random count=1 | sha256
The resulting long hexadecimal string can safely be used as a computer-to-computer password (and is vastly stronger than either of the human-readable passwords given above).
Each stored data field is preceeded by a 16 byte header. This means that the data types are particularly inefficient for storing short strings, but that is unavoidable if security is to be maintained.
For most uses, the protection offered by
pgenctypes is as good as the
state of the art in cryptography can achieve. If the goal is to protect the
stored data from unauthorized access, there is no practical chance
that the protection will be circumevented. If the application uses SSL
in communication with the server, all data sent from the application and
stored at the server will be hidden in transit and while stored. This
pgenctypes a very good solution e.g. for cloud storage, where
the actual data is stored in untrusted environments.
However, there is still the question of what happens with the encryption keys in the application and in the server.
The application needs to set the encryption key which the server will use to
protect the data (e.g. by calling
pgenctypes_set_key() and because of
this it needs to have unprotected access to this key. In case of Web
applications, this means that the key will probably be stored as a plain
string within the application code or its configuration files, which all need
to be accessible by the Web server. Compiled applications offer only minimally
better protection as it is a matter of fact that they can be decompiled and
analysed regardless of any protections they use.
Note that the concept of encrypted data types means that the data is encrypted the whole time it is stored in the database. As soon as it enters an application (and this potentially also includes a backup application, though `pg_dump` doesn't know how to deal with keys), it is decrypted.
At the server side, if an attacker gets root access to the operating system (either directly by a back-door or a security hole or indirectly if the operating system is a virtual machine by controlling the VM host), he can examine the memory used by executing processes and so get a hold of the encryption key within the PostgreSQL server.
There are no feasible, small-scale defences against these types of attacks.
The described method of encrypting data is mostly useful if the requirement is to control the encryption key(s) from the application code. If this is not a requirement, simply using an encrypted file system or other low-level storage on which to install PostgreSQL data will probably be better and more portable.
See DefaultHome for wiki documentation.