
Why?

Right now, cloud storage is a Tower of Babel: every provider makes up its own incompatible API, and everybody picks a service and codes to it. This prevents interoperability, since developers can't port their code to another cloud storage service without rewriting an entire layer of their application. It also leads to vendor lock-in, since competing providers can't implement each other's APIs. Even when an API is documented, providers are free to extend their own service beyond the documented parts, so applications written for a specific service can't be ported seamlessly even if a competitor implemented the same API at one point.

What we need is a system where multiple storage interfaces can coexist, and be created or extended in a way that's discoverable to storage clients.

Status

Not usable yet. I'm still regularly refining the discovery protocol, so it's definitely not ready for prime time. When the protocol is finalized, and the existing interfaces are working and documented, then I'll start doing releases.

What is webfs?

This project really has three parts:

  • A protocol for discovering storage interfaces,
  • A set of simple interfaces for cloud storage, and
  • An implementation of those interfaces.

It always gets on my nerves when people explain their projects starting from the most abstract layer and working their way down to the actual implementation details, so I'm going to do the opposite.

webfs: The implementation

In a nutshell, this is a cloud storage system that you can deploy on any webhost that supports CGI. Right now it's still at the proof-of-concept stage; there are all sorts of important pieces that would need to be written before this could be used for anything serious. (Examples include authentication, space management, directories, support for concurrency, and other fundamental issues.) Still, none of these are insurmountable issues, and I fully intend to develop this into something that people can use to solve real problems.

The implementation was mostly designed to be a testbed for new interfaces, so it's cleanly extensible as well. Adding support for a new interface involves adding one file for the implementation of that interface, modifying the metadata file (a static XML document), and adding the name of the interface to a list in webfs.cgi.
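As a rough illustration of that extension pattern, here is a minimal Python sketch of a CGI dispatcher. It assumes interface handlers are plain functions registered in a dictionary keyed by the path they are mounted at; the handler names and their return values are hypothetical, serving the discovery document at the root is left out, and the actual webfs.cgi may be structured quite differently.

```python
import os
import sys

# Hypothetical interface handlers. Each would live in its own file in a real
# deployment; they are inlined here to keep the sketch self-contained.
def handle_data(method, object_name, query):
    return "200 OK", f"data: {method} {object_name}"

def handle_md(method, object_name, query):
    return "200 OK", f"md: {object_name} {query}"

# Enabled interfaces, keyed by the path segment each one is mounted at.
# Supporting a new interface means writing a handler and adding an entry here.
INTERFACES = {
    "data": handle_data,
    "md": handle_md,
}

def dispatch(method, path_info, query):
    """Route /<interface-path>/<object-name> to the matching handler."""
    parts = path_info.lstrip("/").split("/", 1)
    if len(parts) != 2 or not parts[1]:
        return "400 Bad Request", "expected /<interface-path>/<object-name>"
    prefix, object_name = parts
    handler = INTERFACES.get(prefix)
    if handler is None:
        return "404 Not Found", f"no interface mounted at {prefix!r}"
    return handler(method, object_name, query)

# Only touch stdout when actually invoked by a web server via CGI.
if "GATEWAY_INTERFACE" in os.environ:
    status, body = dispatch(
        os.environ.get("REQUEST_METHOD", "GET"),
        os.environ.get("PATH_INFO", ""),
        os.environ.get("QUERY_STRING", ""),
    )
    sys.stdout.write(f"Status: {status}\r\nContent-Type: text/plain\r\n\r\n{body}")
```

With this layout, a request for `/md/object-name?size` reaches `handle_md` with `object_name="object-name"` and `query="size"`, and an unregistered prefix simply yields a 404.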

webfs: The interfaces

The next layer up consists of a few simple RESTful storage interfaces. One of the neat things about webfs (the protocol) is that you can attach as many storage interfaces as you want to a storage server, so it makes sense to factor functionality out into separate interfaces. For example, you access object metadata through a separate RESTful interface rather than through HTTP headers, query strings, or some other random mechanism.

When it comes to interface design, simplicity is its own reward. One of my main goals when coming up with these interfaces was to make it possible to implement an absolutely minimal webfs server in an afternoon. To help with that, the project also includes a test script that can be run against any webfs server (it detects which interfaces are available on the server, and only tests the ones it knows about).

webfs: The protocol

If we're going to support an arbitrary number of storage interfaces on a single server, we need a discovery mechanism. At the root of a webfs server is a discovery document, which is just an XML document that lists the supported interfaces, and the paths they're available at.
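The exact schema of the discovery document isn't spelled out here, so the following Python sketch invents one: an `<interface>` element per interface, with hypothetical `uri` and `path` attributes. Reading it with the standard library's `xml.etree.ElementTree` yields a mapping from interface identifier to the path it is served at.

```python
import xml.etree.ElementTree as ET

# Hypothetical discovery document; element and attribute names are assumptions.
EXAMPLE_DISCOVERY = """<?xml version="1.0"?>
<webfs>
  <interface uri="http://example.org/webfs/data" path="data"/>
  <interface uri="http://example.org/webfs/metadata" path="md"/>
</webfs>"""

def parse_discovery(xml_text):
    """Return a dict mapping interface URI -> path on this server."""
    root = ET.fromstring(xml_text)
    interfaces = {}
    for elem in root.findall("interface"):
        uri = elem.get("uri")
        path = elem.get("path")
        if uri and path:  # ignore malformed entries rather than failing
            interfaces[uri] = path
    return interfaces
```

For the example document, `parse_discovery(EXAMPLE_DISCOVERY)` gives `{"http://example.org/webfs/data": "data", "http://example.org/webfs/metadata": "md"}`.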

Here's the incredibly simple idea that makes webfs possible: the module name is part of the path, before the object name. This lets you specify which module you want in a clean, extensible way that's trivial for the server to handle. So, for example, if we have the metadata interface at the path "md", then the webfs URL for the size of an object would look something like this:

http://server.url/webfs/md/object-name?size
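On the server side, a URL like this only has to be split once to find the interface. A minimal Python sketch, assuming the server is mounted at `/webfs/` as in the example and that everything after the interface path is the object name:

```python
from urllib.parse import urlsplit

def split_webfs_url(url, root="/webfs/"):
    """Split a webfs URL into (interface path, object name, query).

    `root` is where the webfs server is mounted; "/webfs/" matches the example
    URL but is an assumption about any particular deployment.
    """
    parts = urlsplit(url)
    if not parts.path.startswith(root):
        raise ValueError(f"{url!r} is not under {root!r}")
    rest = parts.path[len(root):]
    # The first path segment names the interface; the remainder is the object.
    interface_path, sep, object_name = rest.partition("/")
    if not sep or not object_name:
        raise ValueError(f"{url!r} has no object name")
    return interface_path, object_name, parts.query
```

Here `split_webfs_url("http://server.url/webfs/md/object-name?size")` returns `("md", "object-name", "size")`.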

Interfaces are identified by a full URL, the same way XML namespaces are, so there's no chance of a collision.
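Because a client looks an interface up by its full identifying URL rather than by the short path, the path can presumably differ from server to server without confusing anyone. A hedged Python sketch of the client side, assuming a discovery document has already been parsed into the hypothetical `DISCOVERED` mapping below (the example.org URIs are placeholders, not real webfs identifiers):

```python
from urllib.parse import quote

# Hypothetical result of reading a server's discovery document:
# interface URI -> path on that server.
DISCOVERED = {
    "http://example.org/webfs/data": "data",
    "http://example.org/webfs/metadata": "md",
}

def object_url(server_root, discovered, interface_uri, object_name, query=""):
    """Build the URL for `object_name` under the interface named by its full URI."""
    try:
        path = discovered[interface_uri]
    except KeyError:
        raise LookupError(f"server does not offer {interface_uri}") from None
    url = f"{server_root.rstrip('/')}/{path}/{quote(object_name)}"
    return f"{url}?{query}" if query else url
```

Asking for the metadata interface's size query on `object-name` reproduces the example URL above, `http://server.url/webfs/md/object-name?size`.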

Putting it all back together

So now we have:

  • A protocol which allows for easy extensibility
  • A set of simple-to-implement interfaces
  • An extensible and easy-to-deploy implementation

If we understand the web to be an operating system of sorts, then cloud storage services have the potential to be the "filesystem of the web". My goal with this project is to create the "VFS layer" of the web: a system that enables innovation at the storage layer by making it possible for anybody to innovate.

FAQ

  • Why not just use WebDAV?

WebDAV does similar stuff, has some similar goals, and has the huge advantage of being widely implemented already. My main problem with WebDAV is that it handles extensibility in an incredibly cumbersome way: there's no way to extend the protocol with your own modifications and let clients know that the extensions are available. Plus, it falls into the old trap of trying to communicate information about extensions in a single version number, a scheme that works great until you have two orthogonal extensions, at which point you realize that it was a terrible idea.
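To make the single-version-number problem concrete, here is a small Python sketch. The two extensions, their URIs, and the version table are invented for illustration and do not model WebDAV itself; the point is that a linear version scale cannot describe a server offering only the second extension, while a set of advertised interface identifiers can.

```python
# Two orthogonal, hypothetical extensions identified by URI.
LOCKING = "http://example.org/ext/locking"
VERSIONING = "http://example.org/ext/versioning"

# With a single version number, each level implies everything below it,
# so "version 3" has to mean locking AND versioning.
VERSION_FEATURES = {
    1: set(),
    2: {LOCKING},
    3: {LOCKING, VERSIONING},
}

def version_for(features):
    """Return the version number whose feature set matches `features` exactly, or None."""
    for version in sorted(VERSION_FEATURES):
        if VERSION_FEATURES[version] == features:
            return version
    return None

def supports(advertised, required):
    """Capability-set check: the server must advertise every required interface."""
    return required <= advertised
```

A server that implements versioning but not locking has no version number at all (`version_for({VERSIONING})` is `None`), whereas the set-based check handles it directly.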

(That, and I think the way they do extensions by making up a ton of new HTTP verbs is just sort of gross. >_>)

webfs, on the other hand, is designed to make extensibility trivial. In some sense, it could even be treated as a superset of WebDAV, since it's possible to implement WebDAV as a webfs module.

Documentation

Data interface

Metadata interface (TODO)

Hash interface (TODO)

Queue interface (TODO)
