How I tried to keep OCCI alive (and failed miserably)

I was going to let this one slide but following a calumniatory missive to his “followers” by the Open Cloud Computing Interface’s self-proclaimed “Founder & Chair”, Sun refugee Thijs Metsch, I have little choice but to respond in my defense (particularly as “The Chairs” were actively soliciting followup from others on-list in support).

Basically a debate came to a head that had been brewing on- and off-list for months regarding the Open Grid Forum (OGF)’s attempts to prevent me from licensing my own contributions (essentially the entire normative specification) under a permissive Creative Commons license (as an additional option to the restrictive OGF license) and/or submitting them to the IETF as previously agreed and as required by the OGF’s own policies. My position was that “Most existing cloud computing specifications are available under CC licenses and I don’t want to give anyone any excuses to choose another standard over ours”, and that the IETF has an excellent track record of producing high quality, interoperable, open specifications by way of a controlled yet open process. This should come as no surprise to those of you who know I am and will always be a huge supporter of open cloud, open source and open standards.

The OGF process had failed to deliver after more than 12 months of deadline extensions. The current spec is frozen in an incomplete state (lacking critical features like collections, search, billing, security, etc.) as a result of being prematurely pushed into public comment; nobody is happy with it (including myself); the community has all but dissipated (except for a few hard core supporters, previously including myself); and software purporting to implement it actually implements something else altogether (see for yourself). There was no light at the end of the tunnel, and with both OGF29 and IETF78 just around the corner, yesterday I took a desperate gamble to keep OCCI alive (as a CC-licensed spec, an IETF Internet-Draft or both).

I confirmed that I was well within my rights to revoke any copyright, trademark and other rights previously granted (apparently it was amateur hour, as OGF had failed to obtain an irrevocable license from me for my contributions) and volunteered to do so if restrictions on reuse by others weren’t lifted and/or the specification wasn’t submitted to the IETF process as agreed and required by their own policies. Thijs’ colleague (and quite probably his boss at Platform Computing), Christopher Smith (who doubles as OGF’s outgoing VP of Standards), promptly responded, questioning my motives (which I can assure you are pure) and issuing a terse legal threat about how the “OGF will protect its rights” (against me over my own contributions, no less). Thijs then followed up shortly after saying that they “see the secretary position as vacant from now on”, and despite claims to the contrary I really couldn’t give a rat’s arse about a title bestowed upon me by a past-its-prime organisation struggling (and failing, I might add) to maintain relevance. My only concern is that OCCI have a good home, and if anything Platform have just captured the sort of control over it that VMware enjoy over DMTF/vCloud, with Thijs being the only remaining active editor.

I thought that would be the end of it and had planned to let sleeping dogs lie until today’s disgraceful, childish, coordinated and most of all completely unnecessary attack on an unpaid volunteer – one that rambled about “constructive technical debate” and “community driven consensus”, thanking me for my “meaningful contributions” but then calling on others to take up the pitchforks by “welcom[ing] any comments on this statement” on- or off-list. The attacks then continued on Twitter, with another OGF official claiming that this “was a consensus decision within a group of, say, 20+ active and many many (300+) passive participants” (despite this being the first any of us had heard of it), calling my claims of copyright ownership “genuine bullshit”, and dismissing my report of an implementor instantly pulling out because they (and I quote) “can’t implement something if things are not stable” as a “damn lie”, claiming I was “pissed” and should “get over it and stop crying” (needless to say they were promptly blocked).

Anyway, as you can see there’s more to it than Thijs’ diatribe would have you believe, and so far as I’m concerned OCCI, at least in its current form, is long since dead. I was undecided as to whether to revoke OGF’s licenses but have since done so; it probably doesn’t matter though, as they agree I retain the copyrights and I think their chance of success is negligible – nobody in their right mind would implement the product of such a dysfunctional group, and those who already did have long since found alternatives. That’s not to say the specification won’t live on in another form, but now that the OGF have decided to go nuclear it’s going to have to be in a more appropriate forum – one that furthers the standard rather than constantly holding it back.

Update: My actions have been universally supported outside of OGF and in the press (and here and here and here and here etc.) but unsurprisingly universally criticised from within – right up to the chairman of the board who claimed it was about trust rather than IPR (BS – I’ve been crystal clear about my intentions from the very beginning). They’ve done a bunch of amateur lawyering and announced that “OCCI is becoming an OGF proposed standard” but have not been able to show that they were granted a perpetual license to my contributions (they weren’t). They’ve also said that “OGF is not really against using Creative Commons” but clearly have no intention to do so, apparently preferring to test my resolve and, if need be, the efficacy of the DMCA. Meanwhile back at the ranch the focus is on bright shiny things (RDF/RDFa) rather than getting the existing specification finished.

Protip: None of this has anything to do with my current employer so let’s keep it that way.

Is HTTP the HTTP of cloud computing?

Ok so after asking “Is OCCI the HTTP of cloud computing?” I realised that the position may have already been filled and that the question was more “Is AtomPub already the HTTP of cloud computing?”

After all, my strategy for OCCI was to follow Google’s example with GData by adding some necessary functionality (a search interface, caching directives, resource-specific attributes, etc.). Most of the heavy lifting was actually being done by AtomPub, thus avoiding a huge amount of tedious and error-prone protocol writing (around 20,000 words of it) – something which OGF and the OCCI working group isn’t really geared up for anyway. This is clearly a workable and well-proven approach, as it has been adopted strategically by both Microsoft and Google and also tactically by Salesforce and IBM, among others. Best of all, adding things like queries and versioning is a manageable workload while starting from scratch most certainly is not.

But what if there were an easier way? Recall that the problem we are trying to solve is exposing a flexible interface to an arbitrarily large collection of interconnected compute, storage and network resources. We need to be able to describe and manipulate the resources (CRUD), associate them with each other via rich links (e.g. links with attributes like local identifiers – eth0, sda, etc.) and change their state (start, stop, restart, etc.), among other things.
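As a rough sketch of how that maps onto HTTP’s existing verbs (the compute collection and URLs here are hypothetical, purely for illustration):

POST   http://example.com/compute        create a new compute resource in the collection
GET    http://example.com/compute/123    retrieve a representation of an existing resource
PUT    http://example.com/compute/123    update it by replacing that representation
DELETE http://example.com/compute/123    remove it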

Representational State Transfer (REST)

Actually we’re not talking about exposing the resources themselves (that would be impossible) but various representations of those resources – like Plato’s shadows on the cave walls – this is the “REpresentational” in “REpresentational State Transfer (REST)”. There’s an infinite number of possible representations so it’s impossible to try to capture them all now, but here are some examples:

  • An Open Virtualisation Format (OVF) serialisation of a compute resource
  • A platform-specific descriptor file (e.g. VMX)
  • A complete archive of the virtual machine with its dependencies (OVA)
  • A graphical image of the console at a given point in time (‘snapshot’)
  • A video stream of the console for archiving/audit purposes (à la Citrix’s Project Iris)
  • The console itself (e.g. SSH, ICA, RDP, VNC)
  • Build documentation (e.g. PDF, ODF)
  • Esoteric enterprise requirements (e.g. NMS configuration)

It doesn’t take a rocket scientist to spot the correlation between this and HTTP’s existing content negotiation functionality (whereby a client can ask for a specific representation of a given resource – e.g. HTML vs PDF), so this is already pretty much solved for us (see HTTP’s Accept: header for the details). For bonus points this information should also be exposed in the URI, as it’s not always possible or convenient to set headers.
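For example, a client wanting the OVF serialisation of a compute resource might ask for it via the Accept: header, or equivalently via a suffix on the URI – bearing in mind that the URLs and the application/ovf media type here are illustrative assumptions rather than registered values:

GET /compute/123 HTTP/1.1
Host: example.com
Accept: application/ovf

or:

GET /compute/123.ovf HTTP/1.1
Host: example.com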

Web Linking

But what about the links? As I explained yesterday the web is built on links embedded in HTML documents using the A tag. Atom also provides enhanced linking functionality via the LINK element, where it is also possible to specify content types, languages, etc. In this case however we want to allow resources to be arbitrary types and more often than not we won’t have the ability to link within the payload itself. This leaves us with two options: put the links in the payload anyway by relying on a meta-model like Atom (or one we roll ourselves) or find some way to represent them within HTTP itself.

Enter HTTP headers, which are also extensible and, as it turns out, in the process of being extended (or at least refined) to handle this very requirement by a fellow from down under, Mark Nottingham. See the “Web Linking” IETF Internet-Draft (draft-nottingham-http-link-header, at the time of writing version 05) for the nitty gritty details and the ietf-http-wg list for some current discussions. Basically it clarifies the existing Link: header and the result looks something like this:

Link: <http://example.com/TheBook/chapter2>; rel="previous"; title="previous chapter"

The Link: header itself is also extensible, so we can faithfully represent our model by adding e.g. the local device name when linking storage and network resources to compute resources, along with other requisite attributes. It would be helpful if the content type were also specified (Atom, for example, allows multiple links of the same relation provided the content type differs), but language is already covered by HTTP (it doesn’t seem useful to advertise French links to someone who already asked to speak English).
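Such a link from a compute resource to its boot disk might then look something like the following – note that the rel URI and the device attribute are hypothetical extensions of my own, not part of the draft:

Link: <http://example.com/storage/456>; rel="http://example.com/rel/storage"; device="sda"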

It’s also interesting to note that earlier versions of the HTTP RFCs actually [poorly] specified both the Link: header as well as LINK and UNLINK methods for maintaining links between web resources. John Pritchard had a crack at clarification in the Efficient HyperLink Maintenance for HTTP I-D, but like most I-Ds this one seems to have died after 6 months, and with it the methods themselves. It seems to me that adding HTTP methods at this time is a drastic (and almost certainly infeasible) action, especially for something that could just as easily be accomplished via headers à la Set-Cookie: (too bad the I-D doesn’t specify how to add/delete/modify links!). In the simplest sense a Link: header appearing in a PUT or POST could replace the existing one(s), but something more elegant for acting on individual links would be nice – probably a discussion worth having on the ietf-http-wg list.
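In that simplest sense, updating a compute resource’s links wholesale might look something like this sketch (entity body omitted for brevity; the relations and device attributes are again invented for illustration):

PUT /compute/123 HTTP/1.1
Host: example.com
Link: <http://example.com/storage/456>; rel="http://example.com/rel/storage"; device="sda"
Link: <http://example.com/network/789>; rel="http://example.com/rel/network"; device="eth0"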

Organisation of Information

Looking back at Atom for a second, let’s see how its key functionality maps onto HTTP:

  • Atom id -> HTTP URL
  • Atom updated -> HTTP Last-Modified: Header
  • Atom title and summary -> Atom/HTTP Slug: Header or equivalent
  • Atom link -> HTTP Link: Header
  • Atom category -> ???

Houston, we have a problem. OCCI use cases range from embedded hypervisors exposing a single resource to a single entry-point for an entire enterprise or the “Great Global Grid” – we need a way to organise, categorise and search for the information, likely including:

  • Free text search via a Google-style “?q=firewall” syntax
  • Taxonomy via categories (already done for Atom) for things like “Operating System” and “Data Center”
  • Folksonomy via [user] tags (already done for Atom and bearing in mind that tag spaces are cool) for things like “testlab”

Fortunately the good work already done in this area for Atom would be relatively easy to port to a Category: HTTP header, following the Link: header example above. In the meantime a standard search interface (including category support) is trivial and, thanks to Google, already done.
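Mirroring Atom’s term/scheme/label attributes, such a header might look something like this (the syntax is my own speculation rather than anything specified):

Category: testlab; scheme="http://example.com/tags"; label="Test Lab"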

Structured Data Formats

HTML also resolves another pressing issue – what format to use for submitting key-value pairs (which constitutes a large part of what we need to do with OCCI). It gives us two options:

  • application/x-www-form-urlencoded – simple key=value pairs as submitted by standard web forms
  • multipart/form-data – a heavier envelope for when binary data and/or large values need to be carried as well

The advantages of being able to create a resource from a web form simply by POSTing to the collection of resources (e.g. http://example.com/compute), and with HTML 5 by PUTting the resource in place directly (e.g. http://example.com/compute/<uuid>), are immediately obvious. Not only does this help make the human and programmable web one and the same (which in turn makes it much easier for developers/users to kick the tyres and understand the API) but it means that scripting even advanced tasks with curl/wget would be trivial. Plus there’s no place for time-wasting religious arguments about angle brackets (XML) over curly braces (JSON).
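As a minimal sketch (with attribute names and URLs invented for illustration), creating a compute resource could then be as simple as:

POST /compute HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded

name=webserver&cores=2&memory=2048

…to which the server might respond:

HTTP/1.1 201 Created
Location: http://example.com/compute/123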

RESTful State Machines

Something else which had not sat well with me until I spent the weekend ingesting the RESTful Web Services book (by Leonard Richardson and Sam Ruby) was the “actuator” concept we picked up from the Sun Cloud APIs. This breaks away from RESTful principles by exposing an RPC-style API for triggering state changes (e.g. start, stop, restart). Granted it’s an improvement on the alternative (GETting a resource and PUTting it back with an updated state), as Tim Bray explains in RESTful Casuistry (to which Roy Fielding and Bill de hÓra also responded), but it still “feels funky”. Sure, it doesn’t make any sense to try to “force” a monitored status to some other value (for example setting a “state” attribute to “running”), especially when we can’t be sure that’s the state we’ll get to (maybe there will be an error, or the transition will be dependent on some outcome over which we have no control). Similarly it doesn’t make much sense to treat states as nouns, for example adding a “running” state to a collection of states (even if a resource can be “running” and “backing up” concurrently). But is using URLs as “buttons” representing verbs/transitions the best answer?

What makes more sense [to me] is to request a transition and check back for updates (e.g. by polling or HTTP server push). If it’s RESTful to POST comments to an article (which in addition to its own contents acts as a collection of zero or more comments) then POSTing a request to change state to a [sub]resource also makes sense. As a bonus these can be parametrised (for example a “resize” request can be accompanied by a “size” parameter, and a “stop” request sent with clarification as to whether an “ACPI Off” or “Pull Cord” is required). Transitions that take a while, like “format” on a storage resource, can simply return HTTP 202 Accepted, so we’ve got support for asynchronous actions as well – indeed some requests (e.g. “backup”) may not even be started immediately. We may also want to consider using something like Post Once Exactly (POE) to ensure that requests like “restart” aren’t executed repeatedly and that we can cancel requests the system hasn’t had a chance to deal with yet.

Exactly how this should look in terms of URL layout I’m not sure (perhaps http://example.com/<resource>/requests) but being able to enumerate the possible actions as well as acceptable parameters (e.g. an enum for variations on “stop” or a range for “resize”) would be particularly useful for clients.
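Under such a layout, requesting a transition and checking back on it might look something like this (the requests collection, parameter names and response details are all hypothetical):

POST /compute/123/requests HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded

type=stop&mode=acpi-off

HTTP/1.1 202 Accepted
Location: http://example.com/compute/123/requests/1

A subsequent GET on the returned Location would then report the request’s progress (e.g. pending, running, completed or failed).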

Collections

This is all well and good for individual resources, but collections are still a serious problem. There are many use cases which involve retrieving an arbitrarily large number of resources, and making an HTTP request for each (as well as requests for enumeration etc.) doesn’t make sense. More importantly, it doesn’t scale – particularly in enterprise environments where requests via proxies and filters can suffer from high latency (if not low bandwidth).

One potential solution is to strap multiple HTTP message entities together as a multipart document, but that’s hardly clean and results in some hairy coding on the client side (e.g. manual manipulation of HTTP messages that would otherwise be fully automated). The best solution we currently have for this problem (as evidenced by widespread deployment) is AtomPub so I’m still fairly sure it’s going to have to make an appearance somewhere, even if it doesn’t wrap all of the resources by default.

Is AtomPub already the HTTP of cloud computing?

A couple of weeks ago I asked “Is OCCI the HTTP of cloud computing?”, explaining the limitations of HTTP in this context, which basically stem from the fact that the payloads it transfers are opaque. That’s fine when they’re [X]HTML, because you can express links between resources within the resources themselves, but what about when they’re some other format – like OVF describing a virtual machine, as may well be the case for OCCI? If I want to link between a virtual machine and its network(s) and/or storage device(s) then I’m out of luck… I need to either find an existing meta-model or roll my own from scratch.

That’s where Atom (or more specifically, AtomPub) comes in… in the simplest sense it adds a light, RESTful XML layer to HTTP which you can extend as necessary. It provides for collections (a ‘feed’ of multiple resources or ‘entries’ in a single HTTP message) as well as a simple meta-model for linking between resources, categorising them, etc. It also gives some metadata relating to unique identifiers, authors/contributors, caching information, etc., much of which can be derived from HTTP (e.g. URL <-> Atom ID, Last-Modified <-> updated).
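To make that concrete, a compute resource might be represented as an Atom entry along these lines – the category, link relation and out-of-line content type are invented for illustration:

<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
  <title>webserver</title>
  <updated>2009-05-25T12:00:00Z</updated>
  <summary>A compute resource (virtual machine)</summary>
  <category term="testlab" scheme="http://example.com/tags" label="Test Lab"/>
  <link rel="http://example.com/rel/storage" href="http://example.com/storage/456"/>
  <content type="application/xml" src="http://example.com/compute/123.ovf"/>
</entry>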

Although it was designed with syndication in mind, it is a very good fit for creating APIs, as evidenced by its extensive use by Google and Microsoft as well as Salesforce and IBM, among others.

I’d explain in more detail but Mohanaraj Gopala Krishnan has done a great job already in his AtomPub, Beyond Blogs presentation.

The only question that remains is whether or not this is the best we can do. The biggest players in cloud computing seem to think so (except Amazon, whose APIs predate Google’s and Microsoft’s), but maybe there’s an even simpler approach that’s been sitting right under our noses the whole time… stay tuned for the answer.

Is OCCI the HTTP of Cloud Computing?

The Web is built on the Hypertext Transfer Protocol (HTTP), a client-server protocol that simply allows client user agents to retrieve and manipulate resources stored on a server. It follows that a single protocol could prove similarly critical for Cloud Computing, but what would that protocol look like?

The first place to look for the answer is limitations in HTTP itself. For a start the protocol doesn’t care about the payload it carries (beyond its Internet media type, such as text/html), which doesn’t bode well for realising the vision of the [Semantic] Web as a “universal medium for the exchange of data”. Surely it should be possible to add some structure to that data in the simplest way possible, without having to resort to carrying complex, opaque file formats (as is the case today)?

Ideally any such scaffolding added would be as light as possible, providing key attributes common to all objects (such as updated time) as well as basic metadata such as contributors, categories, tags and links to alternative versions. The entire web is built on hyperlinks so it follows that the ability to link between resources would be key, and these links should be flexible such that we can describe relationships in some amount of detail. The protocol would also be capable of carrying opaque payloads (as HTTP does today) and for bonus points transparent ones that the server can seamlessly understand too.

Like HTTP this protocol would not impose restrictions on the type of data it could carry, but it would be seamlessly (and safely) extensible so as to support everything from contacts to contracts, biographies to books (or entire libraries!). Messages should be able to be serialised for storage and/or queuing, as well as signed and/or encrypted to ensure security. Furthermore, despite the significant performance improvements introduced in HTTP 1.1, it would need to be able to stream many (possibly millions of) objects as efficiently as possible in a single request. Already we’re asking a lot from something that must be extremely simple and easy to understand.

XML

It doesn’t take a rocket scientist to work out that this “new” protocol is going to be XML based, building on top of HTTP in order to take advantage of the extensive existing infrastructure. Those of us who know even a little about XML will be ready to point out that the “X” in XML means “eXtensible” so we need to be specific as to the schema for this assertion to mean anything. This is where things get interesting. We could of course go down the WS-* route and try to write our own but surely someone else has crossed this bridge before – after all, organising and manipulating objects is one of the primary tasks for computers.

Who better to turn to for inspiration than Google, a company whose mission is to “organize the world’s information and make it universally accessible and useful”. They use a single protocol for almost all of their APIs, GData, and while most people don’t bother to look under the hood (no doubt thanks to the myriad client libraries made available under the permissive Apache 2.0 license), when you do you may be surprised at what you find: everything from contacts to calendar items, and pictures to videos, is a feed (with some extensions for things like searching and caching).

OCCI

Enter the OGF’s Open Cloud Computing Interface (OCCI), whose (initial) goal is to provide an extensible interface to Cloud Infrastructure Services (IaaS). To do so it needs to allow clients to enumerate and manipulate an arbitrary number of server-side “resources” (from one to many millions), all via a single entry point. These compute, network and storage resources need to be able to be created, retrieved, updated and deleted (CRUD), and links need to be able to be formed between them (e.g. virtual machines linking to storage devices and network interfaces). It is also necessary to manage state (start, stop, restart) and retrieve performance and billing information, among other things.

The OCCI working group basically has two options now in order to deliver an implementable draft this month as promised: follow Amazon or follow Google (all the while keeping an eye on other players including Sun and VMware). Amazon use a simple but sprawling XML-based API with a PHP-style flat namespace, and while there is growing momentum around it, it’s not without its problems. Not only do I have my doubts about its scalability outside of a public cloud environment (calls like ‘DescribeImages’ would certainly choke with anything more than a modest number of objects, and we’re talking about potentially millions) but there is a raft of intellectual property issues as well:

  • Copyrights (specifically section 3.3 of the Amazon Software License) prevent the use of Amazon’s “open source” clients with anything other than Amazon’s own services.
  • Patents pending like #20070156842 cover the Amazon Web Services APIs, and Amazon have been known to use patents offensively against competitors.
  • Trademarks like #3346899 prevent us from even referring to the Amazon APIs by name.

While I wish the guys at Eucalyptus and Canonical well and don’t have a bad word to say about Amazon Web Services, this is something I would be bearing in mind while actively seeking alternatives, especially as Amazon haven’t worked out whether the interfaces are IP they should protect. Even if these issues were resolved via royalty-free licensing, it would be very hard for a single vendor to compete with truly open standards (RFC 4287: Atom Syndication Format and RFC 5023: Atom Publishing Protocol) which were developed at the IETF by the community, based on loose consensus and running code.

So what does all this have to do with an API for Cloud Infrastructure Services (IaaS)? In order to facilitate future extension my initial designs for OCCI have been as modular as possible. In fact the core protocol is completely generic, describing how to connect to a single entry point, authenticate, search, create, retrieve, update and delete resources, etc. all using existing standards including HTTP, TLS, OAuth and Atom. On top of this are extensions for compute, network and storage resources as well as state control (start, stop, restart), billing, performance, etc. in much the same way as Google have extensions for different data types (e.g. contacts vs YouTube movies).
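For instance, the single entry point could simply be an AtomPub service document enumerating the available collections (the URLs being placeholders):

<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Cloud Infrastructure</atom:title>
    <collection href="http://example.com/compute">
      <atom:title>Compute</atom:title>
    </collection>
    <collection href="http://example.com/network">
      <atom:title>Network</atom:title>
    </collection>
    <collection href="http://example.com/storage">
      <atom:title>Storage</atom:title>
    </collection>
  </workspace>
</service>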

Simply by standardising at this level OCCI may well become the HTTP of Cloud Computing.