# Overview

## What is it?

Django-denormalize allows you to convert a tree of Django ORM objects into one data document. By 'data document' we mean a structure of dicts, lists and other primitive types that can be serialized to JSON or a Python pickle.

The resulting document can be used in combination with the Django cache layer to create blazingly fast views that do not hit the database. The data can also be synced to a NoSQL store like MongoDB, for consumption by other frameworks, like Meteor (NodeJS based).

If any data changes in the ORM (even if it's on some deep many-to-many relationship far away from the root object), django-denormalize will automatically trigger a cache invalidation of the root object's document and/or sync the new document to your preferred NoSQL store.

This module also includes special support for content in FeinCMS objects: all regions and content types will be available under a 'content' dictionary.

## Example

For example, suppose you have the following models:

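    from django.db import models
    from django.utils.translation import gettext_lazy as _

    # Author is defined first so that Book can reference it directly
    class Author(models.Model):
        name = models.CharField(_("name"), max_length=80)
        ...

    class Book(models.Model):
        title = models.CharField(_("title"), max_length=80)
        year = models.PositiveIntegerField(_("year"), null=True)
        authors = models.ManyToManyField(Author)
        ...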


You can write the following class to describe your document collection:

    from denormalize.models import DocumentCollection

    class BookCollection(DocumentCollection):
        model = Book
        name = "books"
        prefetch_related = ['authors']


Let's print all documents:

    bookcol = BookCollection()
    for doc in bookcol.dump_collection():
        print(doc)


Each document will have the following structure:

    {
        '_id': 42,
        'title': u'Cooking for Geeks',
        'year': 2010,
        'authors': [
            {
                '_id': 18,
                'name': u'Jeff Potter',
                ...
            }
        ],
        ...
    }
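
To make the shape of such a document concrete, here is a minimal sketch that builds the same structure by hand from plain Python objects. It does not use Django or django-denormalize; the stand-in classes and the `book_to_document` and `author_to_document` helpers are hypothetical names chosen for illustration, and the library's own conversion logic may work differently.

```python
import json


class Author(object):
    """Plain stand-in for the Author model (not a Django model)."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


class Book(object):
    """Plain stand-in for the Book model (not a Django model)."""

    def __init__(self, pk, title, year, authors):
        self.pk = pk
        self.title = title
        self.year = year
        self.authors = authors


def author_to_document(author):
    # Hypothetical helper: flatten one author into primitive types.
    return {'_id': author.pk, 'name': author.name}


def book_to_document(book):
    # Hypothetical helper: the root document embeds its related authors.
    return {
        '_id': book.pk,
        'title': book.title,
        'year': book.year,
        'authors': [author_to_document(a) for a in book.authors],
    }


book = Book(42, 'Cooking for Geeks', 2010, [Author(18, 'Jeff Potter')])
doc = book_to_document(book)

# Only dicts, lists and primitives are used, so the document survives JSON.
serialized = json.dumps(doc, sort_keys=True)
```

Because the result contains only dicts, lists and primitive values, it can be written to a cache or a document store without any knowledge of the original model classes.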


This in itself can be useful, but the real power of django-denormalize lies in its backends. Suppose we want to cache these documents to avoid hitting the database. We can then use the documents in our views instead of accessing the Django ORM. Backend and view code:

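    # In models.py

    from denormalize.backends.cache import CacheBackend

    backend = CacheBackend()
    backend.register(bookcol)

    # In views.py

    from django.http import Http404
    from django.shortcuts import render

    # backend and bookcol are the objects created in models.py above

    def our_book_view(request, book_id):
        book_doc = backend.get_doc(bookcol, book_id)
        if not book_doc:
            raise Http404("Book not found")
        return render(request, 'book.html', {'book': book_doc})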


Our CacheBackend will try to fetch the book document from the Django cache. If it cannot be found, it will generate the document from the ORM and then store it in the cache.

And best of all: if any data on the Author or Book objects for this book changes, the cache will automatically be invalidated for us! The book_doc we retrieve will always be up to date.
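
The fetch-or-generate behaviour and the invalidation on related changes can be sketched in plain Python. This is only an illustration of the idea, not the CacheBackend implementation: `SketchCache`, `generate_book_doc`, `books_affected_by_author` and `rename_author` are hypothetical names, the "database" is a pair of dicts, and the invalidation is called explicitly where django-denormalize would react to ORM changes on its own.

```python
# Stand-in "database": two dicts instead of ORM tables.
AUTHORS = {18: {'name': 'Jeff Potter'}}
BOOKS = {42: {'title': 'Cooking for Geeks', 'author_ids': [18]}}


def generate_book_doc(book_id):
    # Build the document from the stand-in database (the slow path).
    book = BOOKS[book_id]
    return {
        '_id': book_id,
        'title': book['title'],
        'authors': [
            {'_id': author_id, 'name': AUTHORS[author_id]['name']}
            for author_id in book['author_ids']
        ],
    }


class SketchCache(object):
    """Hypothetical cache: return a stored document or generate it."""

    def __init__(self, generate):
        self._generate = generate
        self._docs = {}
        self.generated = 0  # counts how often the slow path was taken

    def get_doc(self, doc_id):
        if doc_id not in self._docs:
            self._docs[doc_id] = self._generate(doc_id)
            self.generated += 1
        return self._docs[doc_id]

    def invalidate(self, doc_id):
        self._docs.pop(doc_id, None)


def books_affected_by_author(author_id):
    # Find every root document that embeds this author.
    return [
        book_id for book_id, book in BOOKS.items()
        if author_id in book['author_ids']
    ]


def rename_author(cache, author_id, new_name):
    # A write to a related object invalidates the affected root documents.
    AUTHORS[author_id]['name'] = new_name
    for book_id in books_affected_by_author(author_id):
        cache.invalidate(book_id)


cache = SketchCache(generate_book_doc)
first = cache.get_doc(42)    # generated and stored
second = cache.get_doc(42)   # served from the cache
rename_author(cache, 18, 'J. Potter')
third = cache.get_doc(42)    # regenerated after invalidation
```

The sketch also shows where the write cost comes from: every change to an author requires a lookup of the books that embed it.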

### How does this compare with simply using the Django page cache?

The traditional approach to Django scalability is using the page cache to cache the entire page rendered by the view. This works quite well, but it has two big disadvantages:

• The cache will not automatically be invalidated as soon as the underlying data changes. If you set the page cache time to 60 seconds, it will take up to 60 seconds for a change to be visible on the site.
• This approach does not work well for websites where users can log in and see customized content.

In simpler cases, these problems can be worked around by using template fragment caching, as this allows you to cache common regions, and specify which variables should be incorporated into the cache key. But even in our simple Book example, it's not easy to invalidate the cache on changes to Author.

The disadvantages of the django-denormalize approach are:

• You no longer have access to the Django models and their methods in your templates. You are dealing with the raw data. Of course, you can add any extra information you might need in the template by extending the DocumentCollection, or by creating custom template filters to calculate some value.
• Writes by the ORM to models that are included in documents are slower, because they are monitored for changes.

## MongoDB backend

The MongoDB backend works much like the CacheBackend:

    # In models.py

    from denormalize.backends.mongodb import MongoBackend

    backend = MongoBackend(
        name='mongo',
        db_name='test_denormalize',
        connection_uri='mongodb://localhost')
    backend.register(bookcol)


Because the data is persistent and accessed directly through the MongoDB API, you need to take care to keep it in sync. You can trigger a full one-way sync using the following management command (TODO: currently not implemented yet for the MongoBackend, only for LocMemBackend. Coming soon!):

    $ ./manage.py denormalize_sync mongo books


Whenever you update the data through the ORM, the corresponding document will be updated automatically. The backend preserves any extra keys you may have set on the document root in MongoDB. Make sure, however, not to add or change keys on subdocuments created by the driver, because they will be overwritten. In the book example above, it is safe to set doc['foo'], but not safe to set doc['authors'][0]['foo'].
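
The difference between root keys and subdocument keys can be illustrated with plain dicts. The sketch below assumes the update behaves like a root-level merge, where each key of the freshly generated document replaces the stored value wholesale; the source does not show how MongoBackend actually performs the update, and `merge_root_keys` is a hypothetical helper, not part of the library.

```python
import copy


def merge_root_keys(stored_doc, generated_doc):
    # Hypothetical helper: keep extra root keys, replace generated ones.
    merged = copy.deepcopy(stored_doc)
    merged.update(copy.deepcopy(generated_doc))
    return merged


# Document as it sits in the store, with two manually added 'foo' keys.
stored = {
    '_id': 42,
    'title': 'Cooking for Geeks',
    'foo': 'extra root key',
    'authors': [
        {'_id': 18, 'name': 'Jeff Potter', 'foo': 'extra subdocument key'},
    ],
}

# Document freshly generated from the ORM after a title change.
generated = {
    '_id': 42,
    'title': 'Cooking for Geeks, 2nd Edition',
    'authors': [
        {'_id': 18, 'name': 'Jeff Potter'},
    ],
}

merged = merge_root_keys(stored, generated)
```

Under this assumption the root-level 'foo' survives the update, while the 'foo' inside the first author is lost because the whole 'authors' list is replaced.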

You should run full syncs in a cronjob, though, to prevent your data from going out of sync over time due to network outages and changes that bypass the ORM (see 'bugs and limitations' below).

## FeinCMS support

Django-denormalize has special support for FeinCMS.

## Performance optimization

@@Todo: explain how to prevent spurious updates using denormalize.context.

## Disadvantages, bugs and implementation notes

Bugs and limitations:

• Django-denormalize has not yet been extensively tested in real-world applications. Expect bugs. And since it's an early beta release, there is no guarantee that the API will not change without warning in the near future.
• Using django-denormalize on models that receive a lot of writes might significantly slow down your application, as every write will trigger database queries to determine the affected documents, and regeneration of the documents that have changes. Keep your view counters and last login timestamps out of the models included in documents! (You might want to move these to a NoSQL store anyway.)
• If you bypass the ORM (raw queries, manage.py dbshell, other applications, etc.), django-denormalize cannot detect the changes made to the models. After performing a large batch operation, flush the Django cache, or run a full sync (denormalize_sync management command) to update your NoSQL backend, depending on how you use django-denormalize.
• If you sync to a NoSQL store and the NoSQL database is not available, the update is lost; it is currently not rescheduled (TODO: implement a transaction log to keep track of changes and whether they have been properly synced or not). You should run a regular full sync in a cronjob.
• Syncing happens only one way. If you want to change data, you need to perform the modification on the ORM side, not on the NoSQL side. We do try hard not to overwrite any extra attributes you added in the NoSQL backends.
• A full sync currently does not delete stale objects (TODO)
• Currently the LocMemBackend and CacheBackend do not support any indexing. If you need to look up a document by a field other than the primary key, you either need to build the index yourself, or hit the database to look up the id. The latter might not be as bad as it sounds: if you have a database index on slug, a SELECT id FROM page WHERE slug='foo' should only hit the index. You can then directly fetch the document, which may contain data from tens of models. (TODO: maybe add support for using other fields as document key?)
• Keep in mind the storage limitations of your backends. Memcached can only store objects of up to 1MB, MongoDB has a limit of 16MB. Make sure your documents will not exceed these.
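
For the indexing limitation listed above, building the index yourself can be as simple as keeping a mapping from the lookup field to the document id. The following is a minimal sketch under that assumption: `SlugIndex` and `get_doc_by_slug` are hypothetical names, and the `documents` dict stands in for whatever backend actually holds the documents by primary key.

```python
class SlugIndex(object):
    """Hypothetical secondary index: maps a slug to a document id."""

    def __init__(self):
        self._ids_by_slug = {}

    def add(self, slug, doc_id):
        self._ids_by_slug[slug] = doc_id

    def remove(self, slug):
        self._ids_by_slug.pop(slug, None)

    def lookup(self, slug):
        # Returns None when the slug is unknown.
        return self._ids_by_slug.get(slug)


# Stand-in for a backend that can only fetch documents by primary key.
documents = {
    7: {'_id': 7, 'slug': 'foo', 'title': 'Foo page'},
    8: {'_id': 8, 'slug': 'bar', 'title': 'Bar page'},
}

index = SlugIndex()
for doc in documents.values():
    index.add(doc['slug'], doc['_id'])


def get_doc_by_slug(slug):
    # First resolve the slug to an id, then fetch by primary key.
    doc_id = index.lookup(slug)
    if doc_id is None:
        return None
    return documents.get(doc_id)


found = get_doc_by_slug('foo')
missing = get_doc_by_slug('does-not-exist')
```

Such an index has to be updated whenever a slug changes or a document is removed, which is the bookkeeping the database lookup alternative avoids.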

Types of projects that would benefit most from django-denormalize:

• Writes are rare and mostly occur due to content updates in the Django admin, like in CMS systems.
• There are a lot more reads than writes, and you want to speed up the read views, while keeping the front-end personalized and responsive to data changes.
• You want to use Meteor to build the front-end side of your application, but do not feel like implementing a CMS in Meteor. Django-denormalize allows you to build the CMS backend using the Django admin and FeinCMS. This was the original reason to start this project, so expect more updates to support this!
• You want to use MongoDB to access/query your data, but prefer to keep your primary data in a traditional, proven relational database system you have 10 years of experience with, because it makes you or your DBA sleep better.

### Alternatives

Django-nonrel allows you to use the Django ORM to directly access a NoSQL database, but with limitations. If you do a lot of writes from your front-end views, or want to prevent data duplication, this might be a better solution.

PS: Need another backend? Writing one is quite simple! You only need to override a base class, and implement a few methods.