==========================
Serializing Django objects
==========================

Django's serialization framework provides a mechanism for "translating" Django
objects into other formats. Usually these other formats will be text-based and
used for sending Django objects over a wire, but it's possible for a
serializer to handle any format (text-based or not).

.. seealso::

    If you just want to get some data from your tables into a serialized
    form, you could use the :djadmin:`dumpdata` management command.

Serializing data
----------------

At the highest level, serializing data is a very simple operation::

    from django.core import serializers
    data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the ``serialize`` function are the format to serialize the data
to (see `Serialization formats`_) and a
:class:`~django.db.models.query.QuerySet` to serialize. (Actually, the second
argument can be any iterator that yields Django objects, but it'll almost
always be a QuerySet.)

You can also use a serializer object directly::

    XMLSerializer = serializers.get_serializer("xml")
    xml_serializer = XMLSerializer()
    xml_serializer.serialize(queryset)
    data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object
(which includes an :class:`~django.http.HttpResponse`)::

    with open("file.xml", "w") as out:
        xml_serializer.serialize(SomeModel.objects.all(), stream=out)

.. note::

    Calling :func:`~django.core.serializers.get_serializer` with an unknown
    :ref:`format <serialization-formats>` will raise a
    :class:`~django.core.serializers.SerializerDoesNotExist` exception.

Subset of fields
~~~~~~~~~~~~~~~~

If you only want a subset of fields to be serialized, you can
specify a ``fields`` argument to the serializer::

    from django.core import serializers
    data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))

In this example, only the ``name`` and ``size`` attributes of each model will
be serialized.

.. note::

    Depending on your model, you may find that it is not possible to
    deserialize a model that only serializes a subset of its fields. If a
    serialized object doesn't specify all the fields that are required by a
    model, the deserializer will not be able to save deserialized instances.

Inherited Models
~~~~~~~~~~~~~~~~

If you have a model that is defined using an :ref:`abstract base class
<abstract-base-classes>`, you don't have to do anything special to serialize
that model. Just call the serializer on the object (or objects) that you want to
serialize, and the output will be a complete representation of the serialized
object.

However, if you have a model that uses :ref:`multi-table inheritance
<multi-table-inheritance>`, you also need to serialize all of the base classes
for the model. This is because only the fields that are locally defined on the
model will be serialized. For example, consider the following models::

    class Place(models.Model):
        name = models.CharField(max_length=50)

    class Restaurant(Place):
        serves_hot_dogs = models.BooleanField()

If you only serialize the Restaurant model::

    data = serializers.serialize('xml', Restaurant.objects.all())

the fields on the serialized output will only contain the ``serves_hot_dogs``
attribute. The ``name`` attribute of the base class will be ignored.

In order to fully serialize your Restaurant instances, you will need to
serialize the Place models as well::

    all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
    data = serializers.serialize('xml', all_objects)

Deserializing data
------------------

Deserializing data is also a fairly simple operation::

    for obj in serializers.deserialize("xml", data):
        do_something_with(obj)

As you can see, the ``deserialize`` function takes the same format argument as
``serialize``, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the
``deserialize`` iterator *aren't* simple Django objects. Instead, they are
special ``DeserializedObject`` instances that wrap a created -- but unsaved --
object and any associated relationship data.

Calling ``DeserializedObject.save()`` saves the object to the database.

Because nothing is written until you call ``save()``, deserializing is a
non-destructive operation even if the data in your serialized representation
doesn't match what's currently in the database. Usually, working with these
``DeserializedObject`` instances looks something like::

    for deserialized_object in serializers.deserialize("xml", data):
        if object_should_be_saved(deserialized_object):
            deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make
sure that they are "appropriate" for saving before doing so.  Of course, if you
trust your data source you could just save the object and move on.

The Django object itself can be inspected as ``deserialized_object.object``.

.. _serialization-formats:

Serialization formats
---------------------

Django supports a number of serialization formats, some of which require you
to install third-party Python modules:

==========  ==============================================================
Identifier  Information
==========  ==============================================================
``xml``     Serializes to and from a simple XML dialect.

``json``    Serializes to and from JSON_ (using a version of simplejson_
            bundled with Django).

``yaml``    Serializes to YAML (YAML Ain't a Markup Language). This
            serializer is only available if PyYAML_ is installed.
==========  ==============================================================

.. _json: http://json.org/
.. _simplejson: http://undefined.org/python/#simplejson
.. _PyYAML: http://www.pyyaml.org/

Notes for specific serialization formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

json
^^^^

If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON
serializer, you must pass ``ensure_ascii=False`` as a parameter to the
``serialize()`` call. Otherwise, the output won't be encoded correctly.

For example::

    json_serializer = serializers.get_serializer("json")()
    json_serializer.serialize(queryset, ensure_ascii=False, stream=response)
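
The effect of the flag can be seen with the standard library ``json`` module
on its own. The following is a minimal sketch, assuming Python 3 and assuming
that the serializer hands ``ensure_ascii`` through to the underlying JSON
encoder; the ``record`` dictionary is invented for illustration::

    import json

    # Hypothetical record containing a non-ASCII character.
    record = {"name": "Café"}

    # Default behaviour: non-ASCII characters are written as \uXXXX escapes.
    escaped = json.dumps(record)

    # With ensure_ascii=False the character is written out unchanged.
    literal = json.dumps(record, ensure_ascii=False)

    print(escaped)  # {"name": "Caf\u00e9"}
    print(literal)  # {"name": "Café"}

Both strings decode back to the same data; the flag only changes how the
characters are represented in the output.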

The Django source code includes the simplejson_ module. However, if you're
using Python 2.6 or later (which includes a built-in version of the module),
Django will use the built-in ``json`` module automatically. If you have a
system-installed version of simplejson that includes the C-based speedup
extension, or one that is more recent than the version shipped with Django
(currently, 2.0.7), the system version will be used instead of the version
included with Django.

Be aware that if you're serializing using that module directly, not all Django
output can be passed unmodified to simplejson. In particular, :ref:`lazy
translation objects <lazy-translations>` need a `special encoder`_ written for
them. Something like this will work::

    from django.utils import simplejson
    from django.utils.functional import Promise
    from django.utils.encoding import force_unicode

    class LazyEncoder(simplejson.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, Promise):
                return force_unicode(obj)
            return super(LazyEncoder, self).default(obj)
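
The same pattern can be tried without Django. The following is a minimal
sketch, assuming Python 3 and the standard library ``json`` module, whose
``JSONEncoder.default()`` hook works the same way; ``LazyText`` is a
hypothetical stand-in for a lazy translation object and is not part of
Django::

    import json


    class LazyText(object):
        """Hypothetical lazy string: the text is computed only on demand."""

        def __init__(self, func):
            self._func = func

        def __str__(self):
            return self._func()


    class LazyEncoder(json.JSONEncoder):
        def default(self, obj):
            # default() is only called for objects the encoder can't handle.
            if isinstance(obj, LazyText):
                return str(obj)
            return super().default(obj)


    payload = {"title": LazyText(lambda: "Hello")}
    encoded = json.dumps(payload, cls=LazyEncoder)
    print(encoded)  # {"title": "Hello"}

Without ``cls=LazyEncoder``, the call to ``json.dumps()`` raises a
``TypeError`` because the encoder doesn't know how to handle ``LazyText``.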

.. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html

.. _topics-serialization-natural-keys:

Natural keys
------------

.. versionadded:: 1.2

   The ability to use natural keys when serializing/deserializing data was
   added in the 1.2 release.

The default serialization strategy for foreign keys and many-to-many relations
is to serialize the value of the primary key(s) of the objects in the relation.
This strategy works well for most objects, but it can cause difficulty in some
circumstances.

Consider the case of a list of objects that have a foreign key referencing
:class:`~django.contrib.contenttypes.models.ContentType`. If you're going to
serialize an object that refers to a content type, then you need to have a way
to refer to that content type to begin with. Since ``ContentType`` objects are
automatically created by Django during the database synchronization process,
the primary key of a given content type isn't easy to predict; it will
depend on how and when :djadmin:`syncdb` was executed. This is true for all
models which automatically generate objects, notably including
:class:`~django.contrib.auth.models.Permission`.

.. warning::

    You should never include automatically generated objects in a fixture or
    other serialized data. By chance, the primary keys in the fixture
    may match those in the database and loading the fixture will
    have no effect. In the more likely case that they don't match, the fixture
    loading will fail with an :class:`~django.db.IntegrityError`.

There is also the matter of convenience. An integer id isn't always
the most convenient way to refer to an object; sometimes, a
more natural reference would be helpful.

It is for these reasons that Django provides *natural keys*. A natural
key is a tuple of values that can be used to uniquely identify an
object instance without using the primary key value.

Deserialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following two models::

    from django.db import models

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

Ordinarily, serialized data for ``Book`` would use an integer to refer to
the author. For example, in JSON, a Book might be serialized as::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": 42
        }
    }
    ...

This isn't a particularly natural way to refer to an author. It
requires that you know the primary key value for the author; it also
requires that this primary key value is stable and predictable.

However, if we add natural key handling to Person, the fixture becomes
much more humane. To add natural key handling, you define a default
Manager for Person with a ``get_by_natural_key()`` method. In the case
of a Person, a good natural key might be the pair of first and last
name::

    from django.db import models

    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

Now books can use that natural key to refer to ``Person`` objects::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    }
    ...

When you try to load this serialized data, Django will use the
``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
into the primary key of an actual ``Person`` object.
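
The lookup can be pictured with a small pure-Python sketch. This is not
Django's implementation: the ``PEOPLE`` table and the ``resolve_author()``
helper are hypothetical, and a dictionary stands in for the database::

    # Hypothetical "table" of Person rows: primary key -> (first, last).
    PEOPLE = {
        42: ("Douglas", "Adams"),
        43: ("Terry", "Pratchett"),
    }


    def get_by_natural_key(first_name, last_name):
        """Return the primary key of the one person with this natural key."""
        matches = [pk for pk, key in PEOPLE.items()
                   if key == (first_name, last_name)]
        if len(matches) != 1:
            raise LookupError("natural key must match exactly one row")
        return matches[0]


    def resolve_author(fields):
        """Replace a natural-key author reference with a primary key."""
        author = fields["author"]
        if isinstance(author, (list, tuple)):
            # A list means a natural key; unpack it into the lookup.
            author = get_by_natural_key(*author)
        return dict(fields, author=author)


    book = {"name": "Mostly Harmless", "author": ["Douglas", "Adams"]}
    resolved = resolve_author(book)
    print(resolved["author"])  # 42

A plain integer reference passes through unchanged, so data that mixes primary
keys and natural keys can be handled by the same code path in this sketch.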

.. note::

    Whatever fields you use for a natural key must be able to uniquely
    identify an object. This will usually mean that your model will
    have a uniqueness clause (either ``unique=True`` on a single field, or
    ``unique_together`` over multiple fields) for the field or fields
    in your natural key. However, uniqueness doesn't need to be
    enforced at the database level. If you are certain that a set of
    fields will be effectively unique, you can still use those fields
    as a natural key.

Serialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So how do you get Django to emit a natural key when serializing an object?
First, you need to add another method -- this time to the model itself::

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        def natural_key(self):
            return (self.first_name, self.last_name)

        class Meta:
            unique_together = (('first_name', 'last_name'),)

That method should always return a natural key tuple -- in this
example, ``(first name, last name)``. Then, when you call
``serializers.serialize()``, you provide a ``use_natural_keys=True``
argument::

    >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)

When ``use_natural_keys=True`` is specified, Django will use the
``natural_key()`` method to serialize any reference to objects of the
type that defines the method.
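
The rule can be illustrated with a pure-Python sketch. The classes and the
``serialize_reference()`` helper below are hypothetical and only mimic the
behavior described above; they are not Django's serializer code::

    class Person(object):
        def __init__(self, pk, first_name, last_name):
            self.pk = pk
            self.first_name = first_name
            self.last_name = last_name

        def natural_key(self):
            return (self.first_name, self.last_name)


    class Publisher(object):
        """Hypothetical model that does not define natural_key()."""

        def __init__(self, pk, name):
            self.pk = pk
            self.name = name


    def serialize_reference(obj, use_natural_keys=False):
        """Return the value written in place of a reference to obj."""
        if use_natural_keys and hasattr(obj, "natural_key"):
            return list(obj.natural_key())
        # Fall back to the primary key, as in the default strategy.
        return obj.pk


    author = Person(42, "Douglas", "Adams")
    publisher = Publisher(7, "Pan Books")

    print(serialize_reference(author, use_natural_keys=True))     # ['Douglas', 'Adams']
    print(serialize_reference(publisher, use_natural_keys=True))  # 7
    print(serialize_reference(author))                            # 42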

If you are using :djadmin:`dumpdata` to generate serialized data, use
the :djadminopt:`--natural` command line flag to generate natural keys.

.. note::

    You don't need to define both ``natural_key()`` and
    ``get_by_natural_key()``. If you don't want Django to output
    natural keys during serialization, but you want to retain the
    ability to load natural keys, then you can opt to not implement
    the ``natural_key()`` method.

    Conversely, if (for some strange reason) you want Django to output
    natural keys during serialization, but *not* be able to load those
    key values, just don't define the ``get_by_natural_key()`` method.

Dependencies during serialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since natural keys rely on database lookups to resolve references, it
is important that the data exists before it is referenced. You can't make
a "forward reference" with natural keys: the data you are referencing
must exist before you include a natural key reference to that data.

To accommodate this limitation, calls to :djadmin:`dumpdata` that use
the :djadminopt:`--natural` option will serialize any model with a
``natural_key()`` method before serializing standard primary key objects.

However, this may not always be enough. If your natural key itself includes
a reference to another object (a foreign key, or that object's natural key),
the objects on which the natural key depends must appear in the serialized
data before the natural key that requires them.

To control this ordering, you can define dependencies on your
``natural_key()`` methods. You do this by setting a ``dependencies``
attribute on the ``natural_key()`` method itself.

For example, let's add a natural key to the ``Book`` model from the
example above::

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

        def natural_key(self):
            return (self.name,) + self.author.natural_key()

The natural key for a ``Book`` is a combination of its name and its
author.  This means that ``Person`` must be serialized before ``Book``.
To define this dependency, we add one extra line::

        def natural_key(self):
            return (self.name,) + self.author.natural_key()
        natural_key.dependencies = ['example_app.person']

This definition ensures that all ``Person`` objects are serialized before
any ``Book`` objects. In turn, any object referencing ``Book`` will be
serialized after both ``Person`` and ``Book`` have been serialized.
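
The resulting order amounts to a dependency sort. The following is a minimal
pure-Python sketch of that idea, not the algorithm Django itself uses; the
``order_models()`` helper and the ``example_app.review`` model are
hypothetical::

    def order_models(dependencies):
        """Return model names so that each follows its dependencies."""
        ordered = []
        visiting = set()

        def visit(name):
            if name in ordered:
                return
            if name in visiting:
                raise ValueError("circular dependency involving %s" % name)
            visiting.add(name)
            for dependency in dependencies.get(name, []):
                visit(dependency)
            visiting.discard(name)
            ordered.append(name)

        for name in sorted(dependencies):
            visit(name)
        return ordered


    dependencies = {
        "example_app.person": [],
        "example_app.book": ["example_app.person"],
        # Hypothetical model whose natural key includes a Book.
        "example_app.review": ["example_app.book"],
    }

    order = order_models(dependencies)
    print(order)
    # ['example_app.person', 'example_app.book', 'example_app.review']

In this sketch, two models that list each other as dependencies cannot be
ordered, so the helper raises a ``ValueError``.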