Is AtomPub already the HTTP of cloud computing?

A couple of weeks ago I asked Is OCCI the HTTP of cloud computing? I explained the limitations of HTTP in this context, which basically stem from the fact that the payloads it transfers are opaque. That’s fine when they’re [X]HTML because you can express links between resources within the resources themselves, but what about when they’re some other format – like OVF describing a virtual machine as may well be the case for OCCI? If I want to link between a virtual machine and its network(s) and/or storage device(s) then I’m out of luck… I need to either find an existing meta-model or roll my own from scratch.

That’s where Atom (or more specifically, AtomPub) comes in… in the simplest sense it adds a light, RESTful XML layer to HTTP which you can extend as necessary. It provides for collections (a ‘feed’ of multiple resources or ‘entries’ in a single HTTP message) as well as a simple meta-model for linking between resources, categorising them, etc. It also gives some metadata relating to unique identifiers, authors/contributors, caching information, etc., much of which can be derived from HTTP (e.g. URL <-> Atom ID, Last-Modified <-> updated).
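
To make this concrete, here's a rough sketch (Python, standard library only) of what an Atom entry describing a virtual machine might look like, with typed links pointing at its network and storage. The occi# link relations, identifiers and URLs are purely illustrative assumptions on my part, not anything from a published specification:

import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

def atom(name):
    """Qualify a tag name with the Atom namespace."""
    return "{%s}%s" % (ATOM, name)

# An Atom entry standing in for a compute resource...
entry = ET.Element(atom("entry"))
ET.SubElement(entry, atom("id")).text = "urn:uuid:0c5e7a23-0000-0000-0000-000000000000"
ET.SubElement(entry, atom("title")).text = "vm01"
ET.SubElement(entry, atom("updated")).text = "2009-05-01T12:00:00Z"
ET.SubElement(entry, atom("category"), {"scheme": "http://example.com/occi#", "term": "compute"})

# ...with typed links expressing the relationships HTTP alone cannot.
ET.SubElement(entry, atom("link"), {"rel": "http://example.com/occi#network",
                                    "href": "http://cloud.example.com/networks/lan0"})
ET.SubElement(entry, atom("link"), {"rel": "http://example.com/occi#storage",
                                    "href": "http://cloud.example.com/storage/disk0"})

print(ET.tostring(entry, encoding="unicode"))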

Although it was designed with syndication in mind, it is a very good fit for creating APIs, as evidenced by its extensive use in existing cloud APIs, most visibly Google's GData-based APIs and Microsoft's Azure services.

I'd explain in more detail but Mohanaraj Gopala Krishnan has done a great job already in his AtomPub, Beyond Blogs presentation.

The only question that remains is whether or not this is the best we can do… stay tuned for the answer. The biggest players in cloud computing seem to think so (except Amazon, whose APIs predate Google’s and Microsoft’s) but maybe there’s an even simpler approach that’s been sitting right under our noses the whole time.

Is OCCI the HTTP of Cloud Computing?

The Web is built on the Hypertext Transfer Protocol (HTTP), a client-server protocol that simply allows client user agents to retrieve and manipulate resources stored on a server. It follows that a single protocol could prove similarly critical for Cloud Computing, but what would that protocol look like?

The first place to look for the answer is limitations in HTTP itself. For a start the protocol doesn’t care about the payload it carries (beyond its Internet media type, such as text/html), which doesn’t bode well for realising the vision of the [Semantic] Web as a “universal medium for the exchange of data”. Surely it should be possible to add some structure to that data in the simplest way possible, without having to resort to carrying complex, opaque file formats (as is the case today)?

Ideally any such scaffolding would be as light as possible, providing key attributes common to all objects (such as the updated time) as well as basic metadata such as contributors, categories, tags and links to alternative versions. The entire web is built on hyperlinks so it follows that the ability to link between resources would be key, and these links should be flexible enough that we can describe relationships in some amount of detail. The protocol would also be capable of carrying opaque payloads (as HTTP does today) and, for bonus points, transparent ones that the server can seamlessly understand too.

Like HTTP this protocol would not impose restrictions on the type of data it could carry but it would be seamlessly (and safely) extensible so as to support everything from contacts to contracts, biographies to books (or entire libraries!). Messages should be able to be serialised for storage and/or queuing as well as signed and/or encrypted to ensure security. Furthermore, despite significant performance improvements introduced in HTTP 1.1 it would need to be able to stream many (possibly millions) of objects as efficiently as possible in a single request too. Already we’re asking a lot from something that must be extremely simple and easy to understand.

XML

It doesn’t take a rocket scientist to work out that this “new” protocol is going to be XML based, building on top of HTTP in order to take advantage of the extensive existing infrastructure. Those of us who know even a little about XML will be ready to point out that the “X” in XML means “eXtensible” so we need to be specific as to the schema for this assertion to mean anything. This is where things get interesting. We could of course go down the WS-* route and try to write our own but surely someone else has crossed this bridge before – after all, organising and manipulating objects is one of the primary tasks for computers.

Who better to turn to for inspiration than Google, a company whose mission is to "organize the world's information and make it universally accessible and useful". They use a single protocol for almost all of their APIs, GData, and while most people don't bother to look under the hood (no doubt thanks to the myriad client libraries made available under the permissive Apache 2.0 license), when you do you may be surprised at what you find: everything from contacts to calendar items, and pictures to videos, is a feed (with some extensions for things like searching and caching).
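
The practical upshot of "everything is a feed" is that one completely generic Atom client can walk any of these APIs. As a small illustration (the feed URL below is just a placeholder; substitute any public Atom or GData feed), listing entries and their links takes only a few lines of Python:

import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
FEED_URL = "https://example.com/feeds/posts/default"  # placeholder: any Atom/GData feed

# A GData response is just an Atom feed, so a generic parser is all that's needed.
with urllib.request.urlopen(FEED_URL) as response:
    feed = ET.parse(response).getroot()

for entry in feed.findall(ATOM + "entry"):
    title = entry.findtext(ATOM + "title")
    links = {link.get("rel"): link.get("href") for link in entry.findall(ATOM + "link")}
    print(title, links.get("alternate"))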

OCCI

Enter the OGF’s Open Cloud Computing Interface (OCCI) whose (initial) goal it is to provide an extensible interface to Cloud Infrastructure Services (IaaS). To do so it needs to allow clients to enumerate and manipulate an arbitrary number of server side “resources” (from one to many millions) all via a single entry point. These compute, network and storage resources need to be able to be created, retrieved, updated and deleted (CRUD) and links need to be able to be formed between them (e.g. virtual machines linking to storage devices and network interfaces). It is also necessary to manage state (start, stop, restart) and retrieve performance and billing information, among other things.
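
In HTTP terms that maps naturally onto the four verbs applied to collections and their members. The following is only a sketch of the general pattern – the host, paths and payloads are invented for illustration, not anything OCCI has actually specified:

import http.client

conn = http.client.HTTPConnection("cloud.example.com")  # hypothetical entry point
headers = {"Content-Type": "application/atom+xml"}

# Create a compute resource by POSTing a representation to the collection.
conn.request("POST", "/compute", body=b"<entry>...</entry>", headers=headers)
response = conn.getresponse()
print("created:", response.status, response.getheader("Location"))
response.read()

# Assume the new resource lives at /compute/vm01 (again, purely illustrative),
# then retrieve, update and delete it with the remaining verbs (CRUD).
resource = "/compute/vm01"
for method, body in (("GET", None), ("PUT", b"<entry>...</entry>"), ("DELETE", None)):
    conn.request(method, resource, body=body, headers=headers)
    response = conn.getresponse()
    print(method, response.status)
    response.read()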

The OCCI working group basically has two options now in order to deliver an implementable draft this month as promised: follow Amazon or follow Google (the whole while keeping an eye on other players including Sun and VMware). Amazon use a simple but sprawling XML based API with a PHP style flat namespace and while there is growing momentum around it, it’s not without its problems. Not only do I have my doubts about its scalability outside of a public cloud environment (calls like ‘DescribeImages’ would certainly choke with anything more than a modest number of objects and we’re talking about potentially millions) but there are a raft of intellectual property issues as well:

  • Copyrights (specifically section 3.3 of the Amazon Software License) prevent the use of Amazon’s “open source” clients with anything other than Amazon’s own services.
  • Patents pending like #20070156842 cover the Amazon Web Services APIs, and Amazon have been known to use patents offensively against competitors.
  • Trademarks like #3346899 prevent us from even referring to the Amazon APIs by name.

While I wish the guys at Eucalyptus and Canonical well and don’t have a bad word to say about Amazon Web Services, this is something I would be bearing in mind while actively seeking alternatives, especially as Amazon haven’t worked out whether the interfaces are IP they should protect. Even if these issues were resolved via royalty free licensing it would be very hard as a single vendor to compete with truly open standards (RFC 4287: Atom Syndication Format and RFC 5023: Atom Publishing Protocol) which were developed at IETF by the community based on loose consensus and running code.

So what does all this have to do with an API for Cloud Infrastructure Services (IaaS)? In order to facilitate future extension my initial designs for OCCI have been as modular as possible. In fact the core protocol is completely generic, describing how to connect to a single entry point, authenticate, search, create, retrieve, update and delete resources, etc. all using existing standards including HTTP, TLS, OAuth and Atom. On top of this are extensions for compute, network and storage resources as well as state control (start, stop, restart), billing, performance, etc. in much the same way as Google have extensions for different data types (e.g. contacts vs YouTube movies).

Simply by standardising at this level OCCI may well become the HTTP of Cloud Computing.

rel=shortlink: url shortening that really doesn’t hurt the internet

Inspired primarily by the fact that the guys behind the RevCanonical fiasco are still stubbornly refusing to admit they got it wrong (the whole while arrogantly brushing off increasingly direct protests from the standards community) I’ve whipped up a Google App Engine application which reasonably elegantly implements rel=shortlink: url shortening that really doesn’t hurt the internet:

http://rel-shortlink.appspot.com

It works just like TinyURL and its ilk, accepting a URL and [having a crack at] shortening it. It checks both the response headers and (shortly) the HTML itself for rel=shortlink and if they’re not present then you have the option of falling back to a traditional service (the top half a dozen are preconfigured or you can specify your own via the API’s “fallback” parameter).
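
For the curious, the lookup boils down to something like the sketch below (this isn't the actual App Engine source and the helper names are mine): try a cheap HEAD request for a Link header first, then fall back to fetching and parsing the HTML, and only then hand off to a third-party shortener:

import re
import urllib.request
from html.parser import HTMLParser

class ShortlinkFinder(HTMLParser):
    """Collect the href of any <link rel="shortlink"> element."""
    def __init__(self):
        super().__init__()
        self.shortlinks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "shortlink" in (attrs.get("rel") or "").lower().split():
            self.shortlinks.append(attrs.get("href"))

def find_shortlink(url):
    # 1. Cheap HEAD request: look for Link: <...>; rel="shortlink" in the headers.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        link_header = response.headers.get("Link", "")
    match = re.search(r'<([^>]+)>\s*;[^,]*rel="?shortlink"?', link_header)
    if match:
        return match.group(1)
    # 2. Otherwise fetch the page and look for <link rel="shortlink"> in the HTML.
    with urllib.request.urlopen(url) as response:
        finder = ShortlinkFinder()
        finder.feed(response.read().decode("utf-8", "replace"))
    if finder.shortlinks:
        return finder.shortlinks[0]
    return None  # caller falls back to a traditional shortener here

print(find_shortlink("https://example.com/some/long/url"))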

An interesting facet of this implementation is the warnings it gives if it encounters the similar-but-ambiguous short_url proposal and the fatal errors it throws up when it sniffs out the nothing-short-of-dangerous rev=canonical debacle. Apparently people (here’s looking at you Ars Technica and Dopplr) felt there was no harm in implementing these “protocols”. Now there most certainly is.

Here are the high-level details (from the page itself):

Who
A community service by Sam Johnston (@samj / s…@samj.net) of Australian Online Solutions, loosely based on a relatively good (albeit poorly executed) idea by some web developers purporting to "save the Internet" while actually hurting it.
What
A mechanism for webmasters to indicate the preferred short URL(s) for a given resource, thereby avoiding the need to consult a potentially insecure/unreliable third-party for same. Resulting URLs reveal useful information about the source (domain) and subject (path):
http://tinyurl.com/cgy9pu » http://purl.org/net/shortlink
Where
The shortlink Google Code project, the rel-shortlink Google App Engine application, the #shortlink Twitter hashtag and coming soon to a client or site near you.
When
Starting April 2009, pending ongoing discussion in the Internet standards community (in the mean time you can also use http://purl.org/net/shortlink in place of shortlink).
Why
Short URLs are useful both for space constrained channels (such as SMS and Twitter) and also for anywhere URLs need to be manually entered (e.g. when they are printed or spoken). Third-party shorteners can cause many problems, including link rot, performance problems, outages and privacy & security issues.
How
By way of <link rel="shortlink"> HTML elements and/or Link: <url>; rel=shortlink HTTP headers.
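
On the publishing side it really is just one element and one header; a hypothetical helper (the URL is only an example) might look like this:

def shortlink_markup(short_url):
    """Return the HTML element and HTTP header a site could emit to advertise
    its preferred short URL (illustrative only, not a published library API)."""
    element = '<link rel="shortlink" href="%s">' % short_url
    header = ("Link", '<%s>; rel="shortlink"' % short_url)
    return element, header

print(shortlink_markup("http://example.com/promo"))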

So have at it and let me know what you think. The source code is available under the AGPL license for those who are curious as to how it works.

CBS/CNET/ZDNet interview on cloud standards and platforms

I'm a bit too busy right now to put together my usual meticulously crafted blog posts, and random thoughts have found a good home on Twitter (@samj), so I thought I'd copy an interview this week with CBS/CNET/ZDNet on the emotive topic of cloud standards. As you know I'm busy putting the finishing touches on the Open Cloud Initiative and am one of the main people driving the Open Cloud Computing Interface (OCCI), where I'm representing the needs of my large enterprise clients… we're on track to deliver a nice clean cloud infrastructure services (IaaS) API next month as promised.

Anyway not sure when/if this will appear as I took a few days to respond, but here goes:

Questions:

1. Regarding infrastructure-as-a-service: Does the infrastructure matter? Whether it’s on Amazon’s EC2 for example — does it matter where your app is hosted?

Cloud infrastructure services (IaaS) should be easily commoditised (that is, where product selection becomes more dependent on price than differentiating features, benefits and value added services), but this is not yet the case. At projects like the Open Grid Forum's recently launched Open Cloud Computing Interface (OCCI) we are working fast to make this a reality (potentially as soon as next month). According to the Open Cloud Initiative the two primary requirements for "Open Cloud" are open APIs and open formats. In the context of cloud infrastructure services that means OCCI (a draft of which will be available next month) and OVF (which was released last month) respectively. These open standards will allow users to easily migrate workloads from one provider to another in order to ensure that they are receiving the best possible service at the best possible price.

In the mean time providers typically differentiate on reputation, reliability and value added features (such as complementary components like Amazon S3 and SQS and network features like load balancing and security).

2. Regarding platform-as-a-service providers: What sort of tools would you require, and what tools/services would help sway your vote toward one platform over another? 

Open standards (particularly for APIs and formats) are far more important for cloud platform services (PaaS) than any tools that a provider offers. The trend today (with providers like Amazon, Google, Salesforce and Aptana) is to extend the Eclipse software development platform. That said, I expect web based development environments like Mozilla Bespin to become increasingly popular – providers like Heroku are leading the charge here. On the other hand cloud hosting offerings like Rackspace/Mosso's Cloud Sites could also be considered a cloud platform in that I can upload open source applications like Drupal and MediaWiki and they will take care of the scaling for me, billing me for the resources I use. I like this approach because I get the benefits of cloud computing but I could easily move to a competitor like Dreamhost PS because there is virtually no vendor lock-in.

Conversely, while an application written and optimised for Google App Engine will operate and scale extremely well there, it could be very difficult to move elsewhere thanks to the modifications they have made to the Python and Java runtimes. Note that many of these modifications are necessary to enforce security and scalability.

For example, Sun is coming out with a platform stack for the cloud, which will give developers common services to hook their Java apps into. Is this something significant? What else would you like or need from providers?

That all depends on the environment they create and what interfaces they expose – a good test is how many existing Java applications will run on it without modification. Very few applications will run "out of the box" on Google App Engine but the modifications that need to be made should make the platform more scalable and cheaper overall than one running stock standard Java. Sun's Simon Phipps sharply criticised Google earlier in the week, noting that "sub-sets of the core classes in the Java platform was forbidden for a really good reason, and it's wanton and irresponsible to casually flaunt the rules". That would lead me to believe that their offerings will be somewhat more compliant (and therefore enterprise friendly), but also somewhat more expensive.

One of the major sources of incompatibility here is the migration from relational databases (RDBMSs) to their cloud counterparts such as BigTable and SimpleDB. In order to enable massive scalability significant changes had to be made to core concepts and until we have an open standard interface for cloud databases (possibly following the examples of ODBC and DBI) interoperability at the platform layer will be challenging.
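
To illustrate the gap, here's a small Python sketch: the relational world already has a common calling convention (DB-API 2.0, shown here with sqlite3 but the same shape for any driver), whereas each cloud database currently speaks its own dialect – the cloud examples in the comments are deliberately rough and illustrative rather than exact syntax:

import sqlite3

# Any DB-API 2.0 driver exposes the same connect/execute/fetch interface,
# which is what makes relational code reasonably portable between engines.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (email TEXT, name TEXT)")
connection.execute("INSERT INTO users VALUES (?, ?)", ("jane@example.com", "Jane"))

cursor = connection.execute("SELECT name FROM users WHERE email = ?",
                            ("jane@example.com",))
print(cursor.fetchone())

# The cloud counterparts each speak their own dialect instead, e.g. (rough sketches):
#   App Engine datastore: a GQL query bound to a Model class
#   SimpleDB:             a Select expression against a domain
# Until a DB-API/ODBC-style standard exists, each needs bespoke code.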

I’m also writing to providers like Amazon and Microsoft, to see if they have anything to add. 🙂 

Amazon are at the forefront of what I would call the "cloud operating environment". They offer a number of critical "cloud architecture" components (most notably SQS queues and more recently elastic MapReduce services) which can be assembled together to create arbitrarily large, loosely coupled cloud computing systems.

Microsoft’s Azure offering will also be interesting in that it is based on the Common Language Runtime. This will allow developers using their language of choice to target the platform, which has been something that has restricted Google App Engine to subsets of the developer community (first Python developers and now Java). It should in theory also be relatively straightforward to migrate from traditional architectures to their cloud platform.

Introducing rel=”shortlink”: a better alternative to URL shorteners

Yesterday I wrote rather critically about a surprisingly successful drive to implement a deprecated “rev” relationship. This developed virtually overnight in response to the growing “threat” (in terms of linkrot, security, etc.) of URL shorteners including tinyurl.com, bit.ly and their ilk.

The idea is simple: allow the sites to specify short URLs in the document/feed itself, either automatically ([compressed] unique identifier, timestamp, “initials” of the title, etc.) or manually (using a human-friendly slug). That way, when people need to reference the URL in a space constrained environment (e.g. microblogging like Twitter) or anywhere they need to be manually entered (e.g. printed or spoken) they can do so in a fashion that will continue to work so long as the target does and which reveals information about the content (such as its owner and a concise handle).

Examples of such short URLs include:

The idea is sound but the proposed implementation is less so. There is (or at least was) provision for "rev"erse link references but these have been deprecated in HTML 5. There is also a way of hinting at the canonical URI by specifying a rel="canonical" link. This makes a lot of sense because often the same document can be referred to by an infinite number of URIs (e.g. in search results, with sort orders, aliases, categories, etc.). Combine the two and you've got a way of saying "I am the canonical URI and this other URI happens to point at me too", but that only ever (safely) works for the canonical URL itself, and it doesn't make sense to list one arbitrary URL when there could be an infinite number of them.

Another suggestion was to use rel="alternate shorter" but the problem here is that the content should be identical (except for superficial formatting changes such as highlighting and sort order), while "alternate" means "an alternate version of the resource" itself – e.g. a PDF version. Clients that understand "alternate" versions should not list the short URL, as the content itself is (usually) the same.

Ben Ramsay got closest to the mark with A rev="canonical" rebuttal but missed the "alternate" problem above, nonetheless suggesting a new rel="shorter" relation. The problem there is that the "short" URI is not guaranteed to be "shortest" or indeed even "shorter" – it still makes sense, for example, to specify a "short" URI of http://example.com/promo to a user viewing http://example.com/123 because the longer "short" URI conveys information about the content in addition to its host.

Accordingly I have updated WHATWG RelExtensions and will shortly submit the following to the IESG for addition to the IANA Atom Link Relations registry:

Value:
shortlink (http://purl.org/net/shortlink)

Description:
A short URI that refers to the same document.

Expected Display Characteristics:
This relation may be used as a concise reference to the document. It will
typically be shorter than other URIs (including the canonical URI) and may
rely on a [compressed] unique identifier or a human readable slug. It is
useful for space constrained environments such as email and microblogs as
well as for URIs that need to be manually entered (e.g. printed, spoken).
The referenced document may differ superficially from the original (e.g.
sort order, highlighting).

Security Considerations:
Automated agents should take care when this relation crosses administrative domains (e.g., the URI has a different authority than the current document). Such agents should also avoid circular references by resolving only once.

Note that in the interim “http://purl.org/net/shortlink” can be used. Bearing in mind that you should be liberal in what you accept, and conservative in what you send, servers should use the interim identifier for now and clients should accept both. Nobody should be accepting or sending rev=”canonical” or rel=”alternate shorter” given the problems detailed above.

Update: It seems there are still a few sensible people out there, like Robert Spychala with his Short URL Auto-Discovery document. Unfortunately he proposes a term with an underscore (short_url) when it should be a space, and it perpetuates the usual URI/URL confusion. Despite people like Bernhard Häussner claiming that "short_url is best, it's the only one that does not sound like shortened content", I don't get this reading from a "short" link… seems pretty obvious to me and you can always still use relations like "abstract" for that purpose. In any case it's a valid argument and one that's easily resolved by using the term "shortcutlink" instead (updated accordingly above). Clients could fairly safely use any link relation containing the string "short".

Update: You can follow the discussion on Twitter at #relshortcut, #relshort and #revcanonical.

Update: I forgot to mention again that the HTTP Link: header can be used to allow clients to find the shortlink without having to GET and parse the page (e.g. by doing a HEAD request):

Link: <http://example.com/promo>; rel="shortlink"

Update: Both Andy Mabbett and Stan Vassilev also independently suggested rel=shortcut, which leads me to believe that we’re on a winner. Stan adds that we’ve other things to consider in addition to the semantics and Google’s Matt Cutts points out why taking rather than giving canonical-ness (as in RevCanonical) is a notoriously bad idea.

Update: Thanks to the combination of Microsoft et al recommending the use of “shortcut icon” for favicon.ico (after stuffing our logs by quietly introducing this [mis]feature) and HTML link types being a space separated list (thanks @amoebe for pointing this out – I’d been looking at the Atom RFCs and assuming they used the single link type semantics), the term “shortcut” is effectively scorched earth. Not only is there a bunch of sites that already have “shortcut” links (even if the intention was that “shortcut icon” be atomic), but there’s a bunch of code that looks for “shortcut”, “icon” or “shortcut icon”. FWIW HTML 5 specifies the “icon” link type. Moral of the story: get consensus before implementing code.

As I still have problems with the URI/URL confusion (thus ruling out “shorturl”) but have come around to the idea that this should be a noun rather than an adjective, I now propose “shortlink” as a suitable, self-explanatory, impossible-to-confuse term.

Update: I’ve created a shortlink Google Group and kicked off a discussion with a view to reaching a consensus. I’ve also created a corresponding Google Code project and modified the shorter links WordPress plugin to implement shortlinks.

rev=”canonical” considered harmful (complete with sensible solution)

Sites like http://tinyurl.com/ provide a very simple service: turning unwieldy but information rich URLs like https://samj.net/2009/04/open-letter-to-community-regarding-open.html into something more manageable like http://tinyurl.com/ceze29. This was traditionally useful for emails (some clients mangle long URLs) but it also makes sense for URLs in documents, on TV, radio, etc. (basically anywhere a human has to manually enter it). Shorteners are a dime a dozen now – there are over 90 of them listed here alone… and I must confess to having created one at http://tvurl.com/ a few years back (the idea being that you could buy a TV friendly URL). Not a bad idea but there were other more important things to do at the time and I was never going to be able to buy my first island from the proceeds. Unfortunately though there are many problems with adding yet another layer of indirection and the repercussions could be quite serious (bearing in mind even the more trustworthy sites tend to come and go).

So a while back I whipped up a thing called “springboard” for Google Apps/AppEngine (having got bored with maintaining text files for use with Apache’s mod_rewrite) which allowed users to create redirect URLs like http://go.example.com/promo (and which was apparently a good idea because now Google have their own version called short links). This is the way forward – you can tell at a glance who’s behind the link from the domain and you even get an idea of what you’re clicking through to from the path (provided you’re not being told fibs). When you click on this link you get flicked over to the real (long) URL with a HTTP redirect, probably a 301 which means “Moved Permanently”, so the browsers know what’s going on too. If your domain goes down then chances are the target will be out of action too (much the same story as with third-party DNS) so there’s a lot less risk. It’s all good news and if you’re using a CMS like Drupal then it could be completely automated and transparent – you won’t even know it’s there and clients looking for a short URL won’t have to go ask a third party for one.

So the problem is that nowadays you've got every man and his dog wanting to feed your nice clean (but long) URLs through the mincer in order to post them on Twitter. Aside from being a security nightmare (the resulting URLs are completely opaque, though now clients like Nambu are taking to resolving them back again!?!), it breaks all sorts of things from analytics to news sites like Digg. Furthermore there are much better ways to achieve this. If you have to do a round trip to shorten the URL anyway, why not ask the site for a shorter version of its canonical URL (that being the primary or 'ideal' URL for the content – usually quite long and optimised for SEO)? In the case of Drupal at least, every node has an ID so you can immediately boil URLs down to http://example.com/node/123, http://example.com/123 or even use something like base32 to get even shorter URLs like http://example.com/3R.
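
For what it's worth the encoding itself is trivial; here's a sketch (the 0-9A-V alphabet is simply my assumption of a sensible base 32) that turns Drupal-style numeric node IDs into compact path segments:

import string

# Base 32 using 0-9 then A-V, which maps node 123 to "3R" as in the example above.
ALPHABET = string.digits + string.ascii_uppercase[:22]

def short_path(node_id, alphabet=ALPHABET):
    """Encode a numeric node ID as a compact path segment (123 -> '3R')."""
    if node_id == 0:
        return alphabet[0]
    digits = []
    while node_id:
        node_id, remainder = divmod(node_id, len(alphabet))
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))

print(short_path(123))   # -> 3R, i.e. http://example.com/3R
print(short_path(2009))  # longer IDs still produce short paths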

So how do we express this to clients? The simplest way is to embed LINK tags in the HEAD section of the HTML and specify a sensible relation ("rel"). Normally these are used to specify alternative versions of the content, icons, etc., but there's nothing stopping us using one to say that for any given URL(s) the "short" URL is e.g. http://example.com/3R. That's right, rel="short", not rel="alternate shorter" or other such rubbish ("alternate" refers to alternate content, usually in a different mime-type, not just an alternate URL – here the content is likely to be exactly the same). It can also be performance optimised somewhat by setting an X-Rel-Short header, say, so that users (e.g. Twitter clients) can resolve a long URL to the preferred short URL via a HTTP HEAD request, without having to retrieve and parse the HTML.

Another even less sensible alternative being peddled by various individuals (and being discussed in dozens of places elsewhere, and of course here) is [ab]using the rightly deprecated and confusing rev attribute à la rev="canonical". Basically this is saying "I am the authoritative/canonical URL and this other URL happens to point here too", without saying anything whatsoever about the URL itself actually being short. There could be an infinite number of such inbound URLs and this only ever works for the one canonical URL itself. Essentially this idea is stillborn and I sincerely hope that when people come back to work next week it will be promptly put out of its misery.

So in summary someone’s got carried away and started writing code (RevCanonical) without first considering all the implications. Hopefully they will soon realise this isn’t such a great idea after all and instead get behind the proposal for rel=”short” at the WHATWG. Then we can all just add links like this to our pages:

<link href="http://example.com/promo" rel="short">

Incidentally I say "short" and not "shorter" because the short URL may not in fact be the shortest URL for a given resource – "http://example.com/3R" could well also map back to the same page but the URL is meaningless. And I leave out "alternate" because it's not alternate content, rather just an alternate URL – a subtle but significant difference.

Let’s hope sanity prevails…

Update: The HTTP Link: header is a much more sensible solution to the HTTP header optimisation:

Link: <http://example.com/promo>; rel="short"

An open letter to the community regarding “Open Cloud”

I write this letter in order to be 100% transparent with you about a new initiative that could prove critical to the development of computing and the Internet: the protection of the term “Open Cloud” with a certification trademark (like British Standards’ Kitemark® and the FAIRTRADE symbol) as well as its definition via an open community consensus process.

Cloud computing users will soon be able to rest assured that offerings bearing the “Open Cloud” brand are indeed “open” in that critical freedoms (such as the right to access one’s own data in an open format via an open interface) are strongly protected. It will also ensure a level playing field for all vendors while keeping the barriers to enter the marketplace low. Offerings also bearing the “Open Source” mark will have additional freedoms relating to the use, modification and distribution of the underlying software itself.

Cloud computing is Internet (“cloud”) based development and use of computer technology (“computing”). It is the first significant paradigm shift since the introduction of the PC three decades ago and it is already changing our lives. Not only is it helping to deliver computing to “the other 3 billion” people, but also facilitating communication and collaboration, slashing costs and improving reliability by delivering computing as a utility (like electricity).

The Open Source industry is built around the Open Source Definition (OSD), which is itself maintained by the non-profit Open Source Initiative (OSI). The fledgling “Open Cloud” industry should be built on a similar set of well-defined Open Cloud Principles (OCP) and the associated Open Cloud Initiative (OCI) will closely follow their example. The proposed mission is simply “To define and protect ‘Open Cloud’” and the body will be developed from inception via an open process. Even if USPTO eventually reject our pending registration, by drawing attention to this critical issue now we may have already won.

I need your help, which is why I have called on individuals like Joi Ito and Bruce Perens, as well as established vendors including Google and Amazon (and their respective developer communities) for assistance. By way of this open letter, I commit to donate assets held in trust (domains, trademarks, etc.) to a non-profit similar in spirit to the Open Source Initiative which acts to protect the rights of the number one stakeholder: You.

Sam Johnston
Founder

Introducing the Open Cloud Principles (OCP)

In light of the rapidly increasing (and at times questionable) use of the term “Open Cloud” I hereby propose the following (draft) set of principles, inspired by the Open Source Initiative (OSI) with their Open Source Definition (OSD).

I would be interested to hear any feedback people have with a view to reaching a community consensus for what constitutes “Open Cloud” (in the same way that we have had clear guidelines for what constitutes “Open Source” for many years). You can do so in reply to this post, on the document’s talk page or by being bold and editing directly – if I don’t hear from you I’ll assume you’re satisfied.

Examples of uses today include:

For the latest version of the document please refer to http://www.opencloudinitiative.org/principles

Open Cloud Principles (OCP)

Overview
In order to stem the abuse of the term “Open Cloud” the community is forming a set of principles which should be met by any entity that wishes to use it, similar in spirit to the OSI‘s Open Source Definition for free software licenses.
Principles

  • No Barriers to Entry: There must be no obstacles in the path of an entity that make it difficult to enter. For example, membership fees, disproportionate capital expenditure relative to operational expenditure or dependencies on non-compliant products.
      • Rationale: Open Cloud offerings should be available to the maximum number and diversity of persons and groups. Competition must not be restricted.
  • No Barriers to Exit: There must be no obstacles in the path of an entity that make it difficult to leave. For example, a user must be able to obtain their data in a utile, machine-readable form on a self-service basis.
      • Rationale: Obstacles that prevent entities from abandoning one offering for another reduce competition, which must not be restricted. If the barriers to exit are significant, a firm may be forced to continue competing in a market, as the costs of leaving may be higher than those incurred by continuing to compete.
  • No Discrimination: There must be no discrimination against any person or group of persons, or against any specific field of endeavour. For example, an offering may not be restricted from use in certain countries, by certain people or groups, by commercial endeavours, or for genetic research.
      • Rationale: All users should be allowed to participate without arbitrary screening.
      • Note: Some countries, including the United States, have export restrictions for certain types of products. An OCP-conformant product may warn users of applicable restrictions and remind them that they are obliged to obey the law; however, it may not incorporate such restrictions itself.
  • Interoperability: Where an appropriate standard exists for a given function it must be used rather than a proprietary alternative. Standards themselves must be clean and minimalist so as to be easily implemented and consumed. For example, if there is a suitable existing standard for single sign-on then it must be used by default, although including support for alternative interfaces is permissible.
      • Rationale: Standards foster interoperability and competition, giving rise to a fairer marketplace. The absence of standards and, to a lesser extent, complex standards have the opposite effect.
  • Licensing Freedom: Any material that is conveyed to users must be provided under a free license: one approved by the Open Source Initiative (OSI) based on their Open Source Definition (OSD) in the case of software, and a Creative Commons license (except NonCommercial and/or NoDerivatives versions) for everything else.
      • Rationale: Free licenses impose no significant legal restriction on people's freedom to use, redistribute, and produce modified versions of and works derived from the content.
  • Technological Neutrality: No provision of any license or agreement may be predicated on any individual technology or style of interface. For example, it may not require that network clients run a certain operating system or be written in a certain programming language.
      • Rationale: Such restrictions limit the utility of the solution and the freedom of the user by preventing them from using their preferred solution.
  • Transparency: All related processes should be transparent and subject to public scrutiny from inception. Feedback from stakeholders should be solicited and incorporated with a view to reaching a community consensus. Conflicts of interest must be disclosed and should be further mitigated.
      • Rationale: Transparency implies openness, communication, and accountability and prevents unfairly advantaging or disadvantaging certain parties.


Cloud Standards Roadmap

Almost a year ago in “Cloud Standards: not so fast…” I explained why standardisation efforts were premature. A lot has happened in the interim and it is now time to start intensively developing standards, ideally by deriving the “consensus” of existing implementations.

To get the ball rolling I've written a Cloud Standards Roadmap which can be seen as an authoritative source for information spanning the various standardisation efforts (including identification of areas where effort is required).

Currently it looks like this:

Cloud Standards Roadmap
The cloud standards roadmap tracks the status of relevant standards efforts underway by established multi-vendor standards bodies.

Layer | Description | Group | Project | Status | Due
Client | ? | ? | ? | ? | ?
Software (SaaS) | Operating environment | W3C | HTML 5 | Draft | 2008
Software (SaaS) | Event-driven scripting language | ECMA | ECMAScript | Mature | 1997
Software (SaaS) | Data-interchange format | IETF | JSON (RFC 4627) | Mature | 2006
Platform (PaaS) | Management API | ? | ? | ? | ?
Infrastructure (IaaS) | Management API | OGF | Cloud Infrastructure API (CIA) | Formation | 2009
Infrastructure (IaaS) | Container format for virtual machines | DMTF | Open Virtualisation Format (OVF) | Complete | 2009
Infrastructure (IaaS) | Descriptive language for resources | DMTF | CIM | Mature | 1999
Fabric | ? | ? | ? | ? | ?
The roadmap page also tracks other standards efforts, vendor-owned standards and other resources.

Approaching cloud standards with *vendor* focus only is full of fail

So I was taking stock of the cloud standards situation and found an insightful article (Cloudy clouds and standards) over at ComputerWorld via a colourful counterpoint over at f5 (Approaching cloud standards with end-user focus only is full of fail), hence the title. I made a comment which quickly turned into a blog post of its own (and was held for moderation anyway) so here goes:

I followed a link to this “short-sighted and selfish” view from Lori @ f5’s Approaching cloud standards with end-user focus only is full of fail rant and have to say that as an independent consultant representing the needs of large enterprise clients it’s not surprising that I should agree with you (representing the needs of end users in general) rather than a vendor.

Cloud computing is a paradigm shift (like mainframe to client-server) and attempting to document it all in one rigid "ontology" is a futile exercise, as evidenced by the epic failure of attempts to do so thus far. A bird's eye view of the landscape is possible, but only in the retrospective sense. One of the great things about cloud computing is that it is user-centric – for once the end-user has an opportunity to call the shots rather than being told what to do by vendors.

My various efforts (writing the Wikipedia article, setting up the Cloud Computing Community and more recently working on cloud standards starting with Platform as a Service) have all involved looking at what innovation is taking place in the industry and determining the consensus. Now is a very good time to do so as well because there are enough data points but no de facto proprietary standards (though the EC2 API is worryingly close to becoming one).

I tend to take advice from vendors on this topic with a grain of salt because most of their input tends to involve pulling the resulting “open standard” closer towards their particular offering – the Unified Cloud Interface (UCI) for example not only focuses on VM provisioning but goes so far as to include them specifically alongside Amazon and Google.

The user doesn’t [need to] care about this level of detail any more than they need to care about how a coal-fired power station works to turn on a light. The whole point of the cloud is that it conceals or “abstracts” details that ultimately become somebody else’s problem. Using the power analogy again, our “interfaces” to the electricity grid are very well standardised (2-4 pins and a certain voltage cycling at a certain frequency) and “The Cloud” needs similar interfaces (for example for storing data and uploading and managing workloads).

Once we have that, computing will be quickly commoditised, which is every user's dream and every vendor's worst nightmare (except for the few, like Amazon and Google, who will still have a seat after the computer industry's next round of musical chairs).

In summary, cloud computing is finally an opportunity to shift the focus from the vendor to the user, where it arguably belongs. Vendors don't like this of course (and anything they say on the subject should be viewed accordingly) and are doing everything they can to stake a claim in what is effectively a gold rush. Only this time (unlike the dotcom bust) it's real gold we're talking about (not fools' gold) and a large, sustainable (albeit heavily consolidated) industry of "computer power stations" and associated "megacomputer" supply chains will result.