Commits

andrew cooke  committed ee9ca0e

c orm

  • Participants
  • Parent commits edea288

Files changed (1)

File _posts/2012-05-16-orm-for-c.md

+---
+layout: post
+title: An ORM for C?
+tags: andrew c
+comments: true
+---
+
+An ORM for C?
+=============
+
+
+Introduction
+------------
+
+Hanging out on StackOverflow I was interested by [this
+question](http://stackoverflow.com/questions/10577006/is-there-some-convenient-orm-library-framework-for-c)
+and then disappointed by answers that argued:
+
+ - ORM is for business / domain models, where you don't use C;
+
+ - ORM for C is nonsensical because C doesn't have objects;
+
+ - ORM for C is impossible because C doesn't have reflection / RTTI;
+
+ - ORM is bad anyway.
+
+Taking each of these in turn:
+
+ - I have just been writing software to calibrate hardware devices.  Not only
+   did the client *require* a solution in C, it made a fair amount of sense,
+   given the low-level communication involved.  Yet way too much of my code was
+   spent reading and writing to an embedded database, used to configure and
+   schedule the various processes involved.
+
+ - While C doesn't have objects, it does have structs.  And I imagine it's
+   common (it was certainly the case in the code above) for a struct to
+   correspond to a table row.  So sure, perhaps we should call it SRM, but we
+   are arguing over naming details, not the big idea...
+
+ - The lack of RTTI makes implementing an ORM for C "interesting", but it
+   doesn't seem to be a deal-killer.  First, you can still do something
+   pre-runtime.  Second, you can pull metadata from the database.  So
+   *something* should be possible.
+
+ - Does anyone believe *all* ORM is bad, *everywhere*?  I thought these days
+   the consensus was that it's one more tool - appropriate in some situations,
+   to be avoided in others.
+
+Hence my disappointment.  So below I'm going to sketch what I think an ORM for
+C could look like.
+
+Note: I was just exploring the space here.  I hadn't thought about this before
+and, it turns out, I didn't get that far.
+
+Basic Requirements
+------------------
+
+This is what would have helped me recently:
+
+ - Utilities to retrieve, store and update structs associated with table rows.
+
+ - They should work with existing structs.
+
+ - They should work with an existing schema.
+
+ - They should support changes to schema and structs during development.
+
+ - Being simple to use is more important than being efficient or complete:
+
+   - If I want complex SQL, I will specify it myself (no need for a complex
+     query language).
+
+   - If I want related items, I will retrieve them explicitly.
+
+I think these are all fairly self-explanatory.  Maybe the efficiency issue is
+controversial, but it reduces scope for an initial project and addresses my
+own needs.
+
+Initial Technology Choices
+--------------------------
+
+Since we're working in C, this should probably be layered over ODBC - it's
+the recognised standard C API for talking to databases.
+
+As discussed earlier, the lack of RTTI suggests that we need a pre-runtime
+solution.  And that implies generating code.  Which doesn't fill me with
+pleasure, I must admit.  But I can't see a way around this, so will roll with
+the punches, make lemonade, etc, etc...
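+
+To make the generated-code idea concrete, here is a rough sketch of what a
+generator might emit for one struct: a table mapping column names to field
+offsets, plus a generic setter that uses it.  Everything here is hypothetical
+(`col_map`, `struct_a_map` and `set_int` are names invented for illustration;
+`struct_a` mirrors the example struct used later in this post), and the table
+is written by hand, standing in for generator output.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Existing user struct (same shape as the example later in the post). */
typedef struct {
    int id;
    int foo;
    char *bar;
} struct_a;

/* Hypothetical metadata a generator could emit, replacing the missing RTTI. */
typedef enum { COL_INT, COL_TEXT } col_type;

typedef struct {
    const char *column;  /* column name in the table */
    col_type type;       /* how to interpret the field */
    size_t offset;       /* where the field lives inside the struct */
} col_map;

static const col_map struct_a_map[] = {
    {"id",  COL_INT,  offsetof(struct_a, id)},
    {"foo", COL_INT,  offsetof(struct_a, foo)},
    {"bar", COL_TEXT, offsetof(struct_a, bar)},
};

/* Store an int column value into any mapped struct; 0 on success, -1 if the
   column is unknown or is not an int column. */
static int set_int(void *row, const col_map *map, size_t n,
                   const char *column, int value)
{
    size_t i;
    assert(row != NULL && map != NULL && column != NULL);
    for (i = 0; i < n; i++) {
        if (map[i].type == COL_INT && strcmp(map[i].column, column) == 0) {
            *(int *)((char *)row + map[i].offset) = value;
            return 0;
        }
    }
    return -1;
}
```
+
+With a table like this, library code that copies a result row into a struct
+can be written once and driven by the metadata, so only the table itself
+needs regenerating when a struct or the schema changes.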
+
+The need to work with existing structs implies parsing C code (to read those
+structs).  It also, likely, implies talking to a database and constructing C
+code via a template engine or similar.  How should all that be implemented?
+Using C would require fewer additional tools (since the target is C), but if I
+am implementing it I'd prefer something I can work more quickly in.  Python
+might be a good choice - it's widely available and has tools like
+[pycparser](http://code.google.com/p/pycparser/) and a pile of templating
+engines.
+
+Foreign Keys
+------------
+
+How do we handle the relationship between pointers to other structs and
+foreign keys?  The requirements give part of the answer - we don't retrieve
+connected objects - but leave open how we store the information so that
+related values can be retrieved efficiently later.
+
+A "dirty" solution might store foreign keys in the pointer itself.  I think
+that would work (for integer keys on x86_64 architectures, at least), but it
+relies on integer/pointer conversions that ISO C leaves implementation-defined,
+and it places an unclear limit on the values of keys.
+
+Another solution would be to have a separate cache for this information.  We
+could use the address of the struct in memory as a key.  But how do we manage
+the lifetime of this information?  This seems like it would be complex and
+intrusive (eg. requiring a custom `free()`).
+
+For guidance, I have looked at my own project.  In general, because I was
+having to deal with the database by hand, I stored related keys rather than
+nested structs.  That seems like a simple solution that avoids what is
+otherwise a hard problem.
+
+But there's another possible solution.  Because most structs also store
+their own key, the API can support a request for the related struct (or
+structs).  This has a cost (a join) that could be avoided by some
+of the (rejected) ideas above, but it fits with the requirements and feels
+like it would make a "comfortable" API.
+
+So we will not map nested / related structs.  You can *also* have a pointer to
+a nested struct, and populate it from the key, or from the key of the struct
+you already have, plus the type.  But we won't have any "magic" for making
+links work.
+
+Relationships
+-------------
+
+Implicitly above I was considering one-to-one relationships.  But many
+different relationships are possible:
+
+ - Parent (containing foreign key) to child (as above)
+
+ - Child to parent (reverse of parent to child)
+
+ - Sibling (mutual foreign keys; not a good idea)
+
+ - Parent to children (via an association table)
+
+ - Child to parent (via an association table)
+
+There seems to be a simplification here, because C makes little distinction
+between a pointer to one struct and a pointer to the first of many, so a
+single API (a result pointer plus a count) may be able to handle multiple
+cases.
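+
+A small sketch of that simplification, under heavy assumptions: `struct_b`,
+its `a_id` foreign key, the in-memory `b_table` (standing in for a real
+database table) and `related_b` are all invented for illustration.  The point
+is only that one signature, returning an array and a count, covers "none",
+"one" and "many".
+
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical child row holding a foreign key to its parent. */
typedef struct {
    int id;
    int a_id;
} struct_b;

/* Stand-in for a database table. */
static const struct_b b_table[] = { {1, 10}, {2, 10}, {3, 11} };

/* Copy every struct_b whose a_id matches into a malloc'd array.
   Returns 0 on success; the caller frees *result.  When nothing matches,
   *result is NULL and *n is 0. */
static int related_b(int a_id, struct_b **result, int *n)
{
    size_t i, total = sizeof b_table / sizeof b_table[0];
    struct_b *out;

    assert(result != NULL && n != NULL);
    *result = NULL;
    *n = 0;
    out = malloc(total * sizeof *out);
    if (out == NULL) return 1;
    for (i = 0; i < total; i++) {
        if (b_table[i].a_id == a_id) out[(*n)++] = b_table[i];
    }
    if (*n == 0) {
        free(out);
        out = NULL;
    }
    *result = out;
    return 0;
}
```
+
+A caller expecting a one-to-one relationship would check for `n == 1` and use
+`result[0]`; a caller expecting many would loop up to `n`.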
+
+Tentative API
+-------------
+
+Given the above, it's probably worth making a strawman API.  For error
+handling all routines will return an int, 0 for success.  The database
+connection itself, any options, etc, will be stored via an ADT (opaque
+pointer - I'm exposing pointers so that const is useful):
+
+(Please excuse my C-like pseudocode.)
+
+{% highlight c %}
+typedef struct SRM SRM;
+struct SRM {
+    /* function pointer, so that it can be called as srm->free(&srm, status) */
+    int (*free)(SRM **srm, int status);
+    ...
+};
+int srm_open(SRM **srm, const char *db_url);
+const char *srm_error(int error);
+{% endhighlight %}
+
+Here `SRM` provides a namespace for other operations.  It also provides names
+for structures.  In the examples below I will use `struct_a` etc as arbitrary
+structures.  These are already known (the library code is generated at compile
+time) and named via, for example, `SRM.struct_a`.
+
+The `status` parameter for `srm->free()` allows simple chaining of the status
+(the previous value is returned unless it was 0 (OK) and an error occurs while
+closing).
+
+Passing an `SRM **` to `free()` allows the ADT pointer to be nulled.  Is this
+worth the non-intuitive API?
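+
+The chaining and nulling behaviour could be implemented roughly as follows.
+This is a sketch, not a real library: `srm_open_stub`, `srm_free_impl` and the
+`close_error` field (which simulates a failure while closing the connection)
+are hypothetical, and no database is involved.
+
```c
#include <assert.h>
#include <stdlib.h>

typedef struct SRM SRM;
struct SRM {
    int (*free)(SRM **srm, int status);
    int close_error;  /* hypothetical: error code the "connection" reports on close */
};

/* Release the handle, NULL the caller's pointer, and chain the status:
   an earlier error wins; otherwise any error from closing is reported. */
static int srm_free_impl(SRM **srm, int status)
{
    int close_status;
    if (srm == NULL || *srm == NULL) return status;
    close_status = (*srm)->close_error;
    free(*srm);
    *srm = NULL;
    return status != 0 ? status : close_status;
}

/* Stand-in for srm_open(): allocates a handle without touching a database. */
static int srm_open_stub(SRM **srm, int close_error)
{
    SRM *s;
    assert(srm != NULL);
    s = malloc(sizeof *s);
    if (s == NULL) return 1;
    s->free = srm_free_impl;
    s->close_error = close_error;
    *srm = s;
    return 0;
}
```
+
+Because the caller's pointer is nulled, cleanup code can be written as
+`if (srm) status = srm->free(&srm, status);` and an accidental second pass
+through it does nothing.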
+
+### Instance Retrieval
+
+{% highlight c %}
+    int find(SRM *srm, const char *name, void **result, int *n, ...);
+
+    typedef struct {
+        int id;
+        int foo;
+        char *bar;
+    } struct_a;
+
+    SRM *srm = NULL;
+    struct_a *a = NULL;  /* must start NULL: exit may be reached before find() */
+    int n = 0;
+
+    int status;
+    if ((status = srm_open(&srm, "mysql://...."))) goto exit;
+    ...
+    if ((status = srm->find(srm, srm->struct_a, (void **)&a, &n, "foo", 42, NULL))) goto exit;
+    printf("retrieved %d instances where foo=42\n", n);
+
+exit:
+    if (a) free(a);
+    if (srm) status = srm->free(&srm, status);
+    return status;
+{% endhighlight %}
+
+Here the varargs name fields in the struct and provide values for them.  The
+type of each value *must* match the type of the field in the struct and, since
+the compiler cannot check varargs, mismatches are going to lead to errors.
+Perhaps there should be specialised versions with explicit arguments for small
+structs?  Or for larger structs with pre-selected fields?
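+
+To show where the type-matching problem comes from, here is one way the
+varargs might be consumed.  It is a sketch under assumptions: the field table
+is hypothetical generated metadata, and `build_where`, `bound_value` and
+`lookup_field` are invented names.  It turns name/value pairs into a
+parameterised `WHERE` fragment and a list of typed values to bind.
+
```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { F_INT, F_STR } field_type;
typedef struct { const char *name; field_type type; } field_info;

/* Hypothetical generated metadata for struct_a. */
static const field_info struct_a_fields[] = {
    {"id", F_INT}, {"foo", F_INT}, {"bar", F_STR},
};

typedef struct {
    field_type type;
    union { int i; const char *s; } v;
} bound_value;

static const field_info *lookup_field(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof struct_a_fields / sizeof struct_a_fields[0]; i++)
        if (strcmp(struct_a_fields[i].name, name) == 0)
            return &struct_a_fields[i];
    return NULL;
}

/* Varargs are (char *name, value) pairs ended by (char *)NULL.
   Returns the number of bound values, or -1 on error. */
static int build_where(char *sql, size_t size, bound_value *vals, int max_vals, ...)
{
    va_list ap;
    char *name;
    size_t used = 0;
    int n = 0;

    assert(sql != NULL && vals != NULL && size > 0);
    sql[0] = '\0';
    va_start(ap, max_vals);
    while ((name = va_arg(ap, char *)) != NULL) {
        const field_info *f = lookup_field(name);
        int w;
        if (f == NULL || n == max_vals) { n = -1; break; }
        /* The metadata, not the caller, decides which type va_arg reads.
           If the caller passed something else, this is undefined behaviour. */
        vals[n].type = f->type;
        if (f->type == F_INT) vals[n].v.i = va_arg(ap, int);
        else                  vals[n].v.s = va_arg(ap, char *);
        w = snprintf(sql + used, size - used, "%s%s = ?", n ? " AND " : "", name);
        if (w < 0 || (size_t)w >= size - used) { n = -1; break; }
        used += (size_t)w;
        n++;
    }
    va_end(ap);
    return n;
}
```
+
+Calling this with `"foo", 42, "bar", "x", (char *)NULL` yields the fragment
+`foo = ? AND bar = ?` and two typed values.  Passing `42L` or `"42"` for `foo`
+would compile without complaint and then misbehave at runtime, which is the
+argument for specialised, explicitly typed variants.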
+
+Do we need a constraint on the maximum number returned?  Would it be better to
+have a specialised version that raises an error if it doesn't return a single
+value?
+
+Would it be better to have separate functions (`srm->find_struct_a` etc)
+rather than the name parameter?  I suspect the name will re-appear as a
+parameter to other functions, so it is reused in a way that per-struct
+functions would not be (reducing the total number of components in the API).
+
+Similarly, should names and functions be namespaced to avoid conflicts?  For
+example, `srm->f.find` and `srm->n.struct_a` might be used.
+
+### Related Fields
+
+{% highlight c %}
+    int related(SRM *srm, const char *from, const void *from_id, const char *to, void **result, int *n);
+
+    if ((status = srm->related(srm, srm->struct_a, &a->id, srm->struct_b, (void **)&b, &n))) goto exit;
+{% endhighlight %}
+
+Can we automatically handle the different types of relations listed earlier at
+compile time by inspecting the database?  Is that too clever?  Are ambiguities
+likely to be important?
+
+Questions about size limits also apply here.
+
+We could add varargs constraints...
+
+Out of Time / Conclusions
+-------------------------
+
+OK, I'm out of time.  I think I'll publish this as it is and see if there are
+any useful comments.
+
+It seems to me that there *is* useful functionality here, but the API is
+already pretty complex.  And the smarts needed to implement it at compile time
+are significant.