Django Anonymizer

This app aims to help you anonymize data in a database used for development.

It is common practice in development to use a database that is very similar in content to the real data. The problem is that this can lead to having copies of sensitive customer data on development machines and in other places (like automatic backups). This Django app helps by giving you an easy and customizable way to anonymize data in your models.

The basic method is to go through all the models that you specify and generate fake data for all the fields specified. Introspection of the models will produce an anonymizer that attempts to provide sensible fake data for each field, leaving you to tweak it for your needs.
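As a rough illustration of that loop, here is a simplified sketch that uses plain dictionaries in place of model instances. It is not the app's actual implementation, and the names `anonymize_records`, `fake_name` and `fake_phone` are hypothetical, invented only for this example.

```python
import random


def fake_name(rng):
    """Hypothetical replacer: pick a made-up name."""
    return rng.choice(["Alice Example", "Bob Sample", "Carol Test"])


def fake_phone(rng):
    """Hypothetical replacer: build a fake phone number."""
    return "555-" + "".join(str(rng.randint(0, 9)) for _ in range(4))


def anonymize_records(records, replacers, seed=None):
    """Overwrite each listed field of every record with generated fake data.

    Fields that are not listed in `replacers` are left unchanged.
    """
    rng = random.Random(seed)
    for record in records:
        for field, make_fake in replacers.items():
            record[field] = make_fake(rng)
    return records


# Stand-ins for rows in a customer table.
customers = [
    {"id": 1, "name": "Real Person", "phone": "01234 567890", "plan": "gold"},
    {"id": 2, "name": "Another Person", "phone": "09876 543210", "plan": "free"},
]
anonymize_records(customers, {"name": fake_name, "phone": fake_phone}, seed=0)
```

Only the fields named in the mapping are replaced; `id` and `plan` keep their original values, which mirrors the way unlisted attributes are left alone.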

Please note that the methods provided will not provide full anonymity. Even if you anonymize the names and other details of your customers, there may well be enough data to identify them. Relationships between records in the database are not altered, in order to preserve the characteristic structure of data in your application, but this may leave you open to information leaks which might not be acceptable for your data. This application should be good enough for simpler policies like 'remove all real telephone numbers from the database'.

Usage:

  • Install using setup.py or pip/easy_install.

  • Add 'anonymizer' to your INSTALLED_APPS setting.

  • To create some stub files for your anonymizers, do:

    ./manage.py create_anonymizers app_name1 [app_name2...]
    

    This will create a file anonymizers.py in each of the apps you specify. (It will not overwrite existing files.)

    The file will contain autogenerated classes that attempt to use appropriate functions for generating fake data.

  • Edit the generated anonymizers.py files, adjusting as necessary, and adding any filtering. You can override any of the public methods defined in anonymizer.base.Anonymizer in order to do filtering and other customization.

    The 'attributes' dictionary is the key attribute to edit. Its keys are the names of the model attributes that need to be set. Its values are either strings or callables. A string is interpreted as the name of a function in the module anonymizer.replacers. This module can be browsed to find suitable functions to use to anonymize data.

    If callables are used as the values, they should have a signature compatible with the callables in anonymizer.replacers. You can use lambda *args: my_constant_value to return a constant.

    For some fields, you will want to remove them from the list of attributes, so that the values will be unchanged - especially things like denormalised fields. You can also override the 'alter_object' method to do any fixing that may be necessary.

    An example Anonymizer for django.contrib.auth.models.User might look like this:

    from datetime import datetime
    
    from anonymizer import Anonymizer
    from django.contrib.auth.models import User
    
    class UserAnonymizer(Anonymizer):
    
        model = User
    
        attributes = {
            'username':   'username',
            'first_name': 'first_name',
            'last_name':  'last_name',
            'email':      'email',
            'date_joined': 'similar_datetime',
            # Set to today:
            'last_login': lambda *args: datetime.now(),
        }
    
        def alter_object(self, obj):
            if obj.is_superuser:
                return False # don't change, so we can still log in.
            super(UserAnonymizer, self).alter_object(obj)
            # Destroy all passwords for everyone else
            obj.set_unusable_password()
    
  • If you need to create anonymizers for apps that you do not control, you may want to move the contents of the anonymizers.py file to an app that you do control. It doesn't matter if the anonymizer classes are for models that do not belong to the application that contains them.

    (For example, if you want to anonymize the models in django.contrib.auth, you will probably want to move the contents of django/contrib/auth/anonymizers.py into yourprojectapp/anonymizers.py.)

  • To run the anonymizers, do:

    ./manage.py anonymize_data app_name1 [app_name2...]
    

    This will DESTRUCTIVELY UPDATE all your data. Make sure you have backups, use at your own risk, yada yada.
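For the callable values in the 'attributes' dictionary described in the steps above, small factory functions can be easier to read than inline lambdas. The following is only a sketch: `constant` and `sequential` are hypothetical helpers, not part of anonymizer.replacers. They accept and ignore any positional arguments, following the lambda *args pattern mentioned above, so they assume nothing about what the anonymizer actually passes in.

```python
import itertools


def constant(value):
    """Return a replacer that ignores its arguments and always yields `value`."""
    return lambda *args: value


def sequential(prefix):
    """Return a replacer that yields prefix1, prefix2, ... on successive calls."""
    counter = itertools.count(1)
    return lambda *args: "%s%d" % (prefix, next(counter))


# Hypothetical fragment of an 'attributes' dictionary using the helpers above.
attributes = {
    'first_name': constant('Test'),
    'username': sequential('user'),
}
```

A counter-based replacer like `sequential` may be handy for fields that must stay unique, since each call returns a value not produced before by that replacer.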