gnd / GND_File_Format

Introduction

For GND we're using a document-oriented database, so data isn't stored in normalised form as it would be in a conventional RDBMS. The database stores documents whole. This brings performance advantages both in submitting data (since it doesn't have to be broken down into its constituent parts) and in retrieval (since the data is already available in one place).

The document-oriented model also brings versatility in storage format: there isn't a database schema, so we can store any kind of data.

We are going to have a loose schema to ease exploitation of the data, but we aren't going to constrain the range of measurements/attributes in a dataset.

This document strategy may also be applicable to other ad-hoc data flows, particularly where we'll need to support a range of data types. I intend to extend Debrief to support this file format, so it can integrate directly with the geospatial database.

Detail

Constraints

  • JSON - since that's what the database stores

Aims

  • self-documenting format: so we can understand what's in the file by looking at it. This aim is met using the data_type element in the metadata, which tells us what attributes are recorded in the file.
  • extensible: we don't know what kinds of attributes we'll want to support in the future. We support this by being able to include new fields in the data_type element, then including the new attribute as a top-level array.
  • standard: exploit existing standards where we can, to reduce learning curve. We've an option to re-use the metadata from the Dublin Core Metadata Initiative. But, it doesn't seem a clear-cut decision: we may be unnecessarily squeezing our types into the standard Dublin ones.
  • free of business rules: there will be a layer of business logic between the database and the data consumer/producer. This layer will know the definition of each data type's parameters, and ensure adherence to that definition. E.g. the business layer will know that a 'temp' value is a floating-point value representing temperature in degrees Celsius. This layer is outside the GND project, though it's recommended that SI units be used where possible.
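
As a rough sketch of the extensibility aim, adding a new attribute amounts to appending its name to the data_type list and storing its values as a new top-level array. The helper name add_attribute and the sample 'temp' readings are hypothetical, used only for illustration; they are not part of GND.

```python
import json


def add_attribute(doc, name, values):
    """Return a copy of doc with a new attribute array registered in data_type.

    Hypothetical helper for illustration only.
    """
    if name == "metadata" or name in doc:
        raise ValueError(f"attribute {name!r} already present")
    updated = dict(doc)
    # Copy the metadata so the caller's document is left untouched.
    updated["metadata"] = dict(doc["metadata"])
    updated["metadata"]["data_type"] = list(doc["metadata"]["data_type"]) + [name]
    updated[name] = list(values)
    return updated


doc = {
    "time": ["2012-02-27T14:46:02+0000", "2012-02-27T14:50:10+0000"],
    "metadata": {"data_type": ["time"], "name": "example"},
}
extended = add_attribute(doc, "temp", [11.5, 11.7])
print(json.dumps(extended["metadata"]["data_type"]))
```

Nothing else in the document has to change when an attribute is added, which is the point of keeping the schema loose.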

Strategy

Use the JSON file format. Store measured data in arrays, one array per attribute. Store the list of attributes in the data_type element. We can look at this element to determine what we can do with the file.
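
A consumer might use the attribute list along these lines. This is a minimal sketch, assuming each name in data_type matches a top-level key, and that a location stored as a geometry object keeps its samples under "coordinates"; the function name describe is hypothetical.

```python
def describe(doc):
    """Map each attribute listed in data_type to its number of samples."""
    summary = {}
    for name in doc["metadata"]["data_type"]:
        values = doc[name]
        # Assumption: a geometry object holds its samples under "coordinates".
        if isinstance(values, dict):
            values = values["coordinates"]
        summary[name] = len(values)
    return summary


sample = {
    "location": {"type": "MultiPoint", "coordinates": [[1.1, 3.1], [2.5, 4.4]]},
    "course": [150.1, 150.2],
    "time": ["2012-02-27T14:46:02+0000", "2012-02-27T14:50:10+0000"],
    "metadata": {"data_type": ["location", "course", "time"]},
}
print(describe(sample))
```

A check like this also makes it easy to spot arrays of unequal length before a document is submitted.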

Structure

  • database params (optional, but CouchDB will insert _id, _rev, _attachments)
  • data arrays (location, course, time in this example - but could be any measurable attributes)
  • metadata (compulsory)
    • name - the name of this document (could be reference, filename)
    • data_type - list of attributes
    • sensor - the name of the sensor (NAVMAN-1233Z)
    • sensor_type - the type of sensor (GPS, Thermometer)
    • platform - name of the platform/vehicle (R1332/2)
    • platform_type - the name of the type of platform (bus, bike, car)
    • trial - the name of the trial/exercise
    • type - broad name for what is being recorded. See the data types
  • metadata (optional)
    • geo_bounds - geographic bounding box
    • time_bounds - time period covered
    • created - the ISO date the recording was made
    • imported - the ISO date the document was added to the database (auto-populated on import)
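
The compulsory fields listed above lend themselves to a simple pre-import check. The sketch below is one way to do it; the names REQUIRED_METADATA and missing_metadata are hypothetical and not part of GND.

```python
# Compulsory metadata keys, taken from the structure list above.
REQUIRED_METADATA = (
    "name", "data_type", "sensor", "sensor_type",
    "platform", "platform_type", "trial", "type",
)


def missing_metadata(doc):
    """Return the compulsory metadata keys absent from doc, in list order."""
    metadata = doc.get("metadata", {})
    return [key for key in REQUIRED_METADATA if key not in metadata]


incomplete = {
    "metadata": {
        "name": "file_1232/C/Z",
        "data_type": ["time"],
        "sensor_type": "GPS",
        "platform": "vehicle 1232",
        "platform_type": "bike",
        "type": "track",
    }
}
print(missing_metadata(incomplete))
```

An empty list means every compulsory field is present; the optional fields are deliberately not checked.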

May 2013

{
   "location" : {"type": "MultiPoint", "coordinates": [[1.1, 3.1], [2.5, 4.4], [1.6, 5.4]]},
   "course"   : [150.1, 150.2, 150.2],
   "time"     : ["2012-02-27T14:46:02+0000", "2012-02-27T14:50:10+0000", "2012-02-27T14:55:00+0000"],
   "metadata" :
     {
       "data_type"     : ["location", "course", "time"],
       "platform"      : "vehicle 1232",
       "platform_type" : "bike",
       "trial"         : "2012/Feb/1433",
       "sensor"        : "GARMIN-GO300",
       "type"          : "track",
       "name"          : "file_1232/C/Z",
       "sensor_type"   : "GPS",
       "geo_bounds"    : { "type" : "envelope",
                           "coordinates" : [[1.02, 0.25], [0.85, 0.39]]},
       "time_bounds"   : { "start": "2012-02-27T14:46:02+0000", "end": "2012-02-27T14:55:00+0000"},
       "created"       : "2012-02-27T14:55:02+0000",
       "imported"      : "2012-03-29T09:46:02+0000"
     }
}
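
The optional time_bounds element can be derived from the time array rather than entered by hand. The following is a sketch only, assuming every timestamp uses the format shown in the example (ISO date and time with a numeric UTC offset); the names TIME_FORMAT and time_bounds are hypothetical.

```python
from datetime import datetime

# Assumption: all timestamps look like "2012-02-27T14:46:02+0000".
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def time_bounds(times):
    """Return the earliest and latest timestamps, kept as their original strings."""
    ordered = sorted(times, key=lambda t: datetime.strptime(t, TIME_FORMAT))
    return {"start": ordered[0], "end": ordered[-1]}


times = [
    "2012-02-27T14:50:10+0000",
    "2012-02-27T14:46:02+0000",
    "2012-02-27T14:55:00+0000",
]
print(time_bounds(times))
```

Sorting on the parsed value rather than the raw string keeps the result correct if timestamps carry different offsets.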

Original (superseded) Format

{
   "lat"      : [50.1, 50.2, 50.2],
   "lon"      : [-2.3, -2.4, -2.1],
   "time"     : [ "2012-02-27T14:46:02+0000","2012-02-27T14:50:10+0000","2012-02-27T14:55:00+0000"],
   "metadata" :
     {
       "data_type"     : ["lat", "lon", "time"],
       "platform"      : "vehicle 1232",
       "platform_type" : "bike",
       "trial"         : "2012/Feb/1433",
       "sensor"        : "GARMIN-GO300",
       "type"          : "track",
       "name"          : "file_1232/C/Z",
       "sensor_type"   : "GPS",
       "geo_bounds"    : { "tl":[50.3, -2.4],"br":[50.1, -2.1] },
       "time_bounds"   : { "start": "2012-02-27T14:46:02+0000", "end":"2012-02-27T14:55:00+0000"},
       "created"       : "2012-02-27T14:55:02+0000",
       "imported"      : "2012-03-29T09:46:02+0000"
     }
}
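
Documents in the original format could be migrated to the May 2013 layout roughly as follows. This is a sketch under stated assumptions, not the project's migration code: the function name upgrade is hypothetical, the coordinate order is assumed to be [longitude, latitude] as in GeoJSON, and the optional geo_bounds is dropped rather than guessing the corner order of the envelope.

```python
def upgrade(old):
    """Convert an original-format document (lat/lon arrays) to the newer layout."""
    new = {k: v for k, v in old.items() if k not in ("lat", "lon", "metadata")}
    # Assumption: GeoJSON axis order, i.e. [longitude, latitude].
    new["location"] = {
        "type": "MultiPoint",
        "coordinates": [[lon, lat] for lat, lon in zip(old["lat"], old["lon"])],
    }
    metadata = dict(old["metadata"])
    metadata["data_type"] = ["location"] + [
        name for name in metadata["data_type"] if name not in ("lat", "lon")
    ]
    # The old tl/br bounds are dropped; geo_bounds is optional metadata.
    metadata.pop("geo_bounds", None)
    new["metadata"] = metadata
    return new


old = {
    "lat": [50.1, 50.2, 50.2],
    "lon": [-2.3, -2.4, -2.1],
    "time": ["2012-02-27T14:46:02+0000", "2012-02-27T14:50:10+0000",
             "2012-02-27T14:55:00+0000"],
    "metadata": {
        "data_type": ["lat", "lon", "time"],
        "geo_bounds": {"tl": [50.3, -2.4], "br": [50.1, -2.1]},
    },
}
new = upgrade(old)
print(new["location"]["coordinates"])
```

If the database expects [latitude, longitude] instead, only the list comprehension building the coordinates needs to change.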
