“Bare Metal” cloud infrastructure “compute” services arrive

Earlier in the year during the formation of the Open Cloud Computing Interface (OCCI) working group I described three types of cloud infrastructure “compute” services:

  • Physical Machines (“Bare Metal”) which are essentially dedicated servers provisioned on a utility basis (e.g. hourly), whether physically independent or just physically isolated (e.g. blades)
  • Virtual Machines which nowadays use hypervisors to split the resources of a physical host amongst various guests, where both the host and each of the guests run a separate operating system instance. For more details on emulation vs virtualisation vs paravirtualisation see a KB article I wrote for Citrix a while back: CTX107587 Virtual Machine Technology Overview
  • OS Virtualisation (e.g. containers, zones, chroots) which is where a single instance of an operating system provides multiple isolated user-space instances.

While the overwhelming majority of cloud computing discussions today focus on virtual machines, the reason for my making the distinction was so that the resulting API would be capable of dealing with all possibilities. The clouderati are now realising that there’s more to life than virtual machines and that the OS is like “a cancer that sucks energy (e.g. resources, cycles), needs constant treatment (e.g. patches, updates, upgrades) and poses significant risk of death (e.g. catastrophic failure) to any application it hosts”. That’s some good progress – now if only the rest of the commentators would quit referring to virtualisation as private cloud so we can focus on what’s important rather than maintaining the status quo.

Anyway such cloud services didn’t exist at the time but in France at least we did have providers like Dedibox and Kimsufi who would provision a fixed configuration dedicated server for you pretty much on the spot starting at €20/month (<€0.03/hr or ~$0.04/hr). I figured there was nothing theoretically stopping this being fully automated and exposed via a user (web) or machine (API) interface, in which case it would be indistinguishable from a service delivered via VM (except for a higher level of isolation and performance). Provided you’re billing as a utility (that is, users can consume resources as they need them and are billed only for what they use) rather than monthly or annually and taking care of all the details “within” the cloud there’s no reason this isn’t cloud computing. After all, as an end user I needn’t care if you’re providing your service using an army of monkeys, so long as you are. PCI compliance anyone?

Virtually all of the cloud infrastructure services people talk about today are based on virtual machines and the market price for a reasonably capable one is $0.10/hr or around $72.00 per month. That’s said to be 3-5x more than cost at “cloud scale” (think Amazon) so expect that price to drop as the market matures. Rackspace Cloud are already offering small Xen VMs for 1.5c/hr or ~$10/month. I won’t waste any more time talking about these offerings as everyone else already is. This will be a very crowded space thanks in no small part to VMware’s introduction of vCloud (which they claim turns any web hoster into a cloud provider) but with the hypervisor well and truly commoditised I assure you there’s nothing to see here.

On the lightweight side of the spectrum, VPS providers are a dime a dozen. These guys generally slice Linux servers up into tens if not hundreds of accounts for only a few dollars a month and take care of little more than the (shared) kernel, leaving end users to install the distribution of their choice as root. Solaris has zones and even Windows has MultiWin built in nowadays (that’s the technology, courtesy of Citrix, that allows multiple users each having their own GUI session to coexist on the same machine – it’s primarily used for Terminal Services & Fast User Switching but applications and services can also run in their own context). This delivers most of the benefits of a virtual machine, only without the overhead and cost of running and managing multiple operating systems side by side. Unfortunately nobody’s really doing this yet in cloud but if they were you’d be able to get machines for tasks like mail relaying, spam filtering, DNS, etc. for literally a fraction of a penny per hour (VPSs start at <$5/m or around 0.7c/hr).

So the reason for my writing this post today is that SoftLayer this week announced the availability of “Bare Metal Cloud” starting at $0.15 per hour. I’m not going to give them any props for having done so, thanks to their disappointing attempt to trademark the obvious and generic term “bare metal cloud” and due to unattractive hourly rates that are almost four times the price of the monthly packages by the time you take into account data allowances. I will however say that it’s good to see this prophecy (however predictable) fulfilled.

I sincerely hope that the attention will continue to move further away from overpriced and inefficient virtual machines and towards more innovative approaches to virtualisation.

“Twitter” Trademark in Trouble Too

Yesterday I apparently struck a nerve in revealing Twitter’s “Tweet” Trademark Torpedoed. The follow up commentary both on this blog and on Twitter itself was interesting and insightful, revealing that in addition to likely losing “tweet” (assuming you accept that it was ever theirs to lose) the recently registered Twitter trademark itself (#77166246) and pending registrations for the Twitter logo (#77721757, #77721751) are also on very shaky ground.

Trademarks 101

Before we get into details as to how this could happen let’s start with some background. A trademark is one of three main types of intellectual property (the others being copyrights and patents) in which society grants a monopoly over a “source identifier” (e.g. a word, logo, scent, etc.) in return for being given some guarantee of quality (e.g. I know what I’m getting when I buy a bottle of black liquid bearing the Coke® branding). Anybody can claim to have a trademark but generally they are registered, which makes the process of enforcing the mark much easier. The registration process itself is thus more of a sanity check – making sure everything is in order, fees are paid, the mark is not obviously broken (that is, unable to function as a source identifier) and perhaps most importantly, that it doesn’t clash with other marks already issued.

Trademarks are also jurisdictional in that they apply to a given territory (typically a country but also US states) but to make things easier it’s possible to use the Madrid Protocol to extend a valid trademark in one territory to any number of others (including the EU which is known as a “Community Trademark”). Of course if the first trademark fails (within a certain period of time) then those dependent on it are also jeopardised. Twitter have also filed applications using this process.

Moving right along, there are a number of different types of trademarks, starting with the strongest and working back:

  • Fanciful marks are created specifically to be trademarks (e.g. Kodak) – these are the strongest of all marks.
  • Arbitrary marks have a meaning but not in the context in which they are used as a trademark. We all know what an apple is but when used in the context of computers it is meaningless (which is how Apple Computer is protected, though they did get in trouble when they started selling music and encroached on another trademark in the process). Similarly, you can’t trademark “yellow bananas” but you’d probably get away with “blue bananas” or “cool bananas” because they don’t exist.
  • Suggestive marks hint at some quality or characteristic without describing the product (e.g. Coppertone for sun-tan lotion)
  • Descriptive marks describe some quality or characteristic of the product and are unregistrable in most trademark offices and unprotectable in most courts. “Cloud computing” was found to be both generic and descriptive by USPTO last year in denying Dell. Twitter is likely considered a descriptive trademark (but one could argue it’s now also generic).
  • Generic marks cannot be protected as the name of a product or service cannot function as a source identifier (e.g. Apple in the context of fruits, but not in the context of computers and music)


Twitter’s off to a bad start already in their selection of names – while Google is a deliberate misspelling of the word googol (suggesting the enormous number of items indexed), the English word twitter has a well established meaning that relates directly to the service Twitter, Inc. provides. It’s the best part of 700 years old too, derived around 1325–75 from ME twiteren (v.); akin to G zwitschern:

– verb (used without object)

1. to utter a succession of small, tremulous sounds, as a bird.
2. to talk lightly and rapidly, esp. of trivial matters; chatter.
3. to titter, giggle.
4. to tremble with excitement or the like; be in a flutter.

– verb (used with object)

5. to express or utter by twittering.

– noun

6. an act of twittering.
7. a twittering sound.
8. a state of tremulous excitement.

Although the primary meaning people associate these days is that of a bird, it cannot be denied that “twitter” also means “to talk lightly and rapidly, esp. of trivial matters; chatter”. The fact it is now done over the Internet matters not, in the same way that one can “talk” or “chat” over it (and telephones for that matter) despite the technology not existing when the words were conceived. Had Twitter tried to obtain a monopoly over more common words like “chatter” and “chat” there’d have been hell to pay, but that’s not to say they should get away with it now.

Let’s leave the definition at that for now as Twitter have managed to secure registration of their trademark (which does not imply that it is enforceable). The point is that this is the weakest type of trademark already and some (including myself) would argue that it a) should never have been allowed and b) will be impossible to enforce. To make matters worse, Twitter itself has gained an entry in the dictionary as both a noun (“a website where people can post short messages about their current activities”) and a verb (“to write short messages on the Twitter website”) as well as the AP Stylebook for good measure. This could constitute “academic credibility” or “trademark kryptonite” depending on how you look at it.


This brings us to the more pertinent point, trademark enforcement, which can essentially be summed up as “use it or lose it”. As at today I have not been able to find any reference whatsoever, anywhere on twitter.com, to any trademark rights claimed by Twitter, Inc. Sure they assert copyright (“© 2009 Twitter”) but that’s something different altogether – I have never seen this before and to be honest I can’t believe my eyes. I expect they will fix this promptly in the wake of this post by sprinkling disclaimers and [registered®] trademark (TM) and servicemark (SM) symbols everywhere, but the Internet Archive never lies so once again it’s likely too little too late. If you don’t tell someone it’s a trademark then how are they supposed to avoid infringing it?

Terms of Service

The single reference to trademarks (but not “twitter” specifically) I found was in the terms of service (which are commendably concise):

We reserve the right to reclaim usernames on behalf of businesses or individuals that hold legal claim or trademark on those usernames.

That of course didn’t stop them suspending @retweet shortly after filing for the ill-fated “tweet” trademark themselves, but that’s another matter altogether. The important point is that they don’t claim trademark rights and so far as I can tell, never have.


To rub salt in the (gaping) wound they (wait for it, are you sitting down?) offer their high resolution logos for anyone to use with no mention whatsoever as to how they should and shouldn’t be used (“Download our logos“) – a huge no-no for trademarks which must be associated with some form of quality control. Again there is no trademark claim, no ™ or ® symbols, and for the convenience of invited infringers, no less than three different high quality source formats (PNG, Adobe Illustrator and Adobe Photoshop):


Then there’s the advertising, oh the advertising. Apparently Twitter HQ didn’t get the memo about exercising extreme caution when using your trademark; woe betide the trademark holder who refers to her product or service as a noun or a verb, yet Twitter does both, even in 3rd-party advertisements (good luck trying to get an AdWords ad containing the word “Google”):

Internal Misuse

Somebody from Adobe or Google please explain to Twitter why it’s important to educate users that they don’t “google” or “photoshop”, rather “search using Google®” and “edit using Photoshop®”. Here are some more gems from the help section:

  • Now that you’re twittering, find new friends or follow people you already know to get their twitter updates too.
  • Wondering who sends tweets from your area?
  • @username + message directs a twitter at another person, and causes your twitter to save in their “replies” tab.
  • FAV username marks a person’s last twitter as a favorite.
  • People write short updates, often called “tweets” of 140 characters or fewer.
  • Tweets with @username elsewhere in the tweet are also collected in your sidebar tab; tweets starting with @username are replies, and tweets with @username elsewhere are considered mentions.
  • Can I edit a tweet once I post it?
  • What does RT, or retweet, mean? RT is short for retweet, and indicates a re-posting of someone else’s tweet. This isn’t an official Twitter command or feature, but people add RT somewhere in a tweet to indicate that part of their tweet includes something they’re re-posting from another person’s tweet, sometimes with a comment of their own. Check out this great article on re-tweeting, written by a fellow Twitter user, @ruhanirabin. <- FAIL x 7


According to this domain search there are currently 6,263 domains using the word “twitter”, almost all in connection with microblogging. To put that number in perspective, if Twitter wanted to take action against these registrants given current UDRP rates for a single panelist we’re talking $9,394,500 in filing fees alone (or around 1.5 billion Nigerian naira if that’s not illustrative enough for you). That’s not including the cost of preparing the filings, representation, etc. that their lawyers (Fenwick & West LLP) would likely charge them.

If you (like Doug Champigny) happen to be on the receiving end of one of these letters recently you might just want to politely but firmly point them at the UDRP and have them prove, among other things, that you were acting in bad faith (don’t bother coming crying to me if they do though – this post is just one guy’s opinion and IANAL remember ;).

I could go on but I think you get the picture – Twitter has done such a poor job of protecting the Twitter trademark that they run the risk of losing it forever and becoming a law school textbook example of what not to do. There are already literally thousands of products and services [ab]using their brand and while some have recently succumbed to the recent batch of legal threats they may well have more trouble now that people know their rights and the problem is being actively discussed. Furthermore, were it not for being extremely permissive with the Twitter brand from the outset they arguably would not have had anywhere near as large a following as they do now. It is only with the dedicated support of the users and developers they are actively attacking that they have got as far as they have.

The Problem: A Microblogging Monopoly

Initially it was my position that Twitter had built their brand and deserved to keep it, but that they had gone too far with “tweet”. Then in the process of writing this story I re-read the now infamous May The Tweets Be With You post that prompted the USPTO to reject their application hours later and it changed my mind too. Most of the media coverage took the money quote out of context but here it is in its entirety (emphasis mine):

We have applied to trademark Tweet because it is clearly attached to Twitter from a brand perspective but we have no intention of “going after” the wonderful applications and services that use the word in their name when associated with Twitter.

Do you see what’s happening here? I can’t believe I missed it on the first pass. Twitter are happy for you to tweet to your heart’s content provided you use their service. That is, they realised that outside of the network effects of having millions of users all they really do is push 1’s and 0’s around (and poorly at that). They go on to say:

However, if we come across a confusing or damaging project, the recourse to act responsibly to protect both users and our brand is important.

Today’s batch of microblogging clients are hard wired to Twitter’s servers and as a result (or vice versa) they have an effective microblogging monopoly. Twitter, Inc has every reason to be happy with that outcome and is naturally seeking to protect it – how better than to have an officially sanctioned method with which to beat anyone who dare stray from the path by allowing connections to competitors like identi.ca? That’s exactly what they mean with the “when associated with Twitter” language above and by “confusing or damaging” they no doubt mean “confusing or damaging [to Twitter, Inc]”.

The Solution: Distributed Social Networking

Distributed social networking and open standards in general (in the traditional rather than Microsoft sense) are set to change that, but not if the language society uses (and has used for hundreds of years) is granted under an official monopoly to Twitter, Inc – it’s bad enough that they effectively own the @ namespace when there are existing open standards for it. Just imagine if email was a centralised system and everything went through one [unreliable] service – brings a new meaning to “email is down”! Well that’s Twitter’s [now not so] secret strategy: to be the “pulse of the planet” (their words, not mine).

Don’t get me wrong – I think Twitter’s great and will continue to twitter and tweet as @samj so long as it’s the best microblogging platform around – but I don’t want to be forced to use it because it’s the only one there is. Twitter, Inc had ample chance to secure “twitter” as a trademark and so far as I am concerned they have long since missed it (despite securing dubious and likely unenforceable registrations). Now they need to play on a level playing field and focus on being the best service there is.

Update: Before I get falsely accused of brand piracy let me clarify one important point: so far as I am concerned while Twitter can do what they like with their logo (despite continuing to give it away to the entire Internet no strings attached), the words “twitter” and “tweet” are fair game as they have been for the last 700+ years and will be for the next 700. From now on “twitter” for me means “generic microblog” and “tweet” means “microblog update”.

If I had a product interesting enough for Twitter, Inc to send me one of their infamous C&D letters I would waste no time whatsoever in scanning it, posting it here and making fun of them for it. I’m no thief but I am a fervent believer in open standards.

Organising the Internet with Web Categories

In order to scratch an itch relating to the Open Cloud Computing Interface (OCCI) I submitted my first Internet-Draft to the IETF this week: Web Categories (draft-johnston-http-category-header).

The idea’s fairly simple and largely inspired by the work of others (most notably the original HTTP and Atom authors, and a guy down under who’s working on another draft). It defines an intuitive mechanism for web servers to express flexible category information for any resource (including opaque/binary/non-HyperText formats) in the HTTP headers, allowing users to categorise web resources into vocabularies or “schemes” and assign human-friendly “labels” in addition to the computer-friendly “terms”.
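To make the mechanism concrete, here's a minimal sketch of serialising a category (the helper function name and signature are my own, not defined by the draft):

```python
# Minimal sketch of building a Category header value from the draft's
# term/scheme/label model. The function is illustrative only; the
# draft defines the wire format, not this API.

def build_category_value(term, scheme=None, label=None):
    """Serialise one category as a Category header value."""
    parts = [term]
    if scheme is not None:
        # Schemes are absolute URIs and are quoted in the header
        parts.append('scheme="%s"' % scheme)
    if label is not None:
        parts.append('label="%s"' % label)
    return "; ".join(parts)

value = build_category_value(
    "dog", scheme="http://purl.org/net/animals", label="Canine")
print("Category: " + value)
# Category: dog; scheme="http://purl.org/net/animals"; label="Canine"
```

A bare term is also legal, so `build_category_value("dog")` yields just `dog`.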

This approach to taxonomies was lifted directly from (and is thus 100% compatible with) Atom and is another step closer to being able to render individual resources natively over HTTP rather than encoded and wrapped in XML (which gets unwieldy when you’re dealing with multi-gigabyte virtual machines, as we are with OCCI).
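As a rough illustration of that compatibility, the sketch below lifts atom:category elements from an entry and re-expresses them as header values. The element and attribute names come from RFC 4287; the mapping helper itself is mine:

```python
# Rough sketch: convert atom:category elements into Category header
# values. Attribute names (term, scheme, label) are from RFC 4287;
# the translation logic is illustrative, not normative.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

entry = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<category term="dog" scheme="http://purl.org/net/animals"'
    ' label="Canine"/>'
    '</entry>')

values = []
for cat in entry.findall(ATOM + "category"):
    value = cat.get("term")                      # "term" is required
    if cat.get("scheme"):                        # "scheme" is optional
        value += '; scheme="%s"' % cat.get("scheme")
    if cat.get("label"):                         # "label" is optional
        value += '; label="%s"' % cat.get("label")
    values.append(value)

# Multiple category-values may share one header field, comma-separated
print("Category: " + ", ".join(values))
# Category: dog; scheme="http://purl.org/net/animals"; label="Canine"
```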

It’s anybody’s guess where the document will go from here – it’s currently marked “Experimental” but with any luck it will pique the interest of the standards and/or semantic web community and go on to live a long and happy life.
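For the curious, consuming the header on the client side is similarly straightforward. Here's a deliberately simplified parser for a single category-value (it ignores quoted semicolons and the label* encoding, so treat it as a sketch of the idea rather than a conforming implementation):

```python
# Rough sketch of parsing one category-value into its term and
# parameters. A conforming parser would also honour quoted-string
# rules and the RFC 2231 label* form; this one does not.

def parse_category_value(value):
    parts = [p.strip() for p in value.split(";")]
    term, params = parts[0], {}
    for param in parts[1:]:
        name, _, raw = param.partition("=")
        params[name.strip()] = raw.strip().strip('"')
    return term, params

term, params = parse_category_value(
    'dog; label="Canine"; scheme="http://purl.org/net/animals"')
print(term, params)
# dog {'label': 'Canine', 'scheme': 'http://purl.org/net/animals'}
```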

Internet Engineering Task Force                              S. Johnston
Internet-Draft                               Australian Online Solutions
Intended status: Experimental                               July 1, 2009
Expires: January 2, 2010

                             Web Categories

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.
   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 2, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.


   This document specifies the Category header-field for HyperText
   Transfer Protocol (HTTP), which enables the sending of taxonomy
   information in HTTP headers.

Johnston                 Expires January 2, 2010                [Page 1]

Internet-Draft              Abbreviated Title                  July 2009

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . . . 3
   2.  Categories  . . . . . . . . . . . . . . . . . . . . . . . . . . 3
   3.  The Category Header Field . . . . . . . . . . . . . . . . . . . 4
     3.1.  Examples  . . . . . . . . . . . . . . . . . . . . . . . . . 4
   4.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 5
     4.1.  Category Header Registration  . . . . . . . . . . . . . . . 5
   5.  Security Considerations . . . . . . . . . . . . . . . . . . . . 5
   6.  Internationalisation Considerations . . . . . . . . . . . . . . 5
   7.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 6
     7.1.  Normative References  . . . . . . . . . . . . . . . . . . . 6
     7.2.  Informative References  . . . . . . . . . . . . . . . . . . 6
   Appendix A.  Notes on use with HTML . . . . . . . . . . . . . . . . 7
   Appendix B.  Notes on use with Atom . . . . . . . . . . . . . . . . 7
   Appendix C.  Acknowledgements . . . . . . . . . . . . . . . . . . . 8
   Appendix D.  Document History . . . . . . . . . . . . . . . . . . . 8
   Appendix E.  Outstanding Issues . . . . . . . . . . . . . . . . . . 8
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . . . 9


1.  Introduction

   A means of indicating categories for resources on the web has been
   defined by Atom [RFC4287].  This document defines a framework for
   exposing category information in the same format via HTTP headers.

   The atom:category element conveys information about a category
   associated with an entry or feed.  A given atom:feed or atom:entry
   element MAY have zero or more categories which MUST have a "term"
   attribute (a string that identifies the category to which the entry
   or feed belongs) and MAY also have a scheme attribute (an IRI that
   identifies a categorization scheme) and/or a label attribute (a
   human-readable label for display in end-user applications).

   Similarly a web resource may be associated with zero or more
   categories as indicated in the Category header-field(s).  These
   categories may be divided into separate vocabularies or "schemes"
   and/or accompanied with human-friendly labels.

   [[ Feedback is welcome on the ietf-http-wg@w3.org mailing list,
   although this is NOT a work item of the HTTPBIS WG. ]]

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

   This document uses the Augmented Backus-Naur Form (ABNF) notation of
   [RFC2616], and explicitly includes the following rules from it:
   quoted-string, token.  Additionally, the following rules are included
   from [RFC3986]: URI.

2.  Categories

   In this specification, a category is a grouping of resources by
   'term', from a vocabulary ('scheme') identified by an IRI [RFC3987].
   It is comprised of:

   o  A "term" which is a string that identifies the category to which
      the resource belongs.

   o  A "scheme" which is an IRI that identifies a categorization scheme
      (optional).


   o  A "label" which is a human-readable label for display in end-user
      applications (optional).

   A category can be viewed as a statement of the form "resource is from
   the {term} category of {scheme}, to be displayed as {label}", for
   example "'Loewchen' is from the 'dog' category of 'animals', to be
   displayed as 'Canine'".

3.  The Category Header Field

   The Category entity-header provides a means for serialising one or
   more categories in HTTP headers.  It is semantically equivalent to
   the atom:category element in Atom [RFC4287].

   Category           = "Category" ":" #category-value
   category-value     = term *( ";" category-param )
   category-param     = ( ( "scheme" "=" <"> scheme <"> )
                      | ( "label" "=" quoted-string )
                      | ( "label*" "=" enc2231-string )
                      | ( category-extension ) )
   category-extension = token [ "=" ( token | quoted-string ) ]
    enc2231-string     = <extended-initial-value, see [RFC2231], Section 7>
   term               = token
   scheme             = URI

   Each category-value conveys exactly one category but there may be
   multiple category-values for each header-field and/or multiple
   header-fields per [RFC2616].

   Note that schemes are REQUIRED to be absolute URLs in Category
   headers, and MUST be quoted if they contain a semicolon (";") or
   comma (",") as these characters are used to separate category-params
   and category-values respectively.

   The "label" parameter is used to label the category such that it can
   be used as a human-readable identifier (e.g. a menu entry).
   Alternately, the "label*" parameter MAY be used to encode this label in
   a different character set, and/or contain language information as per
   [RFC2231].  When using the enc2231-string syntax, producers MUST NOT
   use a charset value other than 'ISO-8859-1' or 'UTF-8'.

3.1.  Examples

   NOTE: Non-ASCII characters used in prose for examples are encoded
   using the format "Backslash-U with Delimiters", defined in Section
   5.1 of [RFC5137].


   For example:
   Category: dog

   indicates that the resource is in the "dog" category.

   Category: dog; label="Canine"; scheme="http://purl.org/net/animals"

   indicates that the resource is in the "dog" category, from the
   "http://purl.org/net/animals" scheme, and should be displayed as
   "Canine".
   The example below shows an instance of the Category header encoding
   multiple categories, and also the use of [RFC2231] encoding to
   represent both non-ASCII characters and language information.

   Category: dog; label="Canine"; scheme="http://purl.org/net/animals",
             lowchen; label*=UTF-8'de'L%c3%b6wchen

   Here, the second category has a label encoded in UTF-8, uses the
   German language ("de"), and contains the Unicode code point \u'00F6'.

4.  IANA Considerations

4.1.  Category Header Registration

   This specification adds an entry for "Category" in HTTP to the
   Message Header Registry [RFC3864] referring to this document:
   Header Field Name: Category
   Protocol: http
   Status: standard
   Author/change controller:
       IETF (iesg@ietf.org)
       Internet Engineering Task Force
   Specification document(s):
       [ this document ]

5.  Security Considerations

   The content of the Category header-field is not secure, private or
   integrity-guaranteed, and due caution should be exercised when using
   it.

6.  Internationalisation Considerations

   Category header-fields may be localised depending on the Accept-
   Language header-field, as defined in section 14.4 of [RFC2616].

   Scheme IRIs in atom:category elements may need to be converted to
   URIs in order to express them in serialisations that do not support
   IRIs, as defined in section 3.1 of [RFC3987].  This includes the
   Category header-field.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
              Word Extensions: Character Sets, Languages, and
              Continuations", RFC 2231, November 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3864]  Klyne, G., Nottingham, M., and J. Mogul, "Registration
              Procedures for Message Header Fields", BCP 90, RFC 3864,
              September 2004.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4287]  Nottingham, M. and R. Sayre, "The Atom Syndication
              Format", RFC 4287, December 2005.

   [RFC5137]  Klensin, J., "ASCII Escaping of Unicode Characters",
              RFC 5137, February 2008.

7.2.  Informative References

   [OCCI]     Open Grid Forum (OGF), Edmonds, A., Metsch, T., Johnston,
              S., and A. Richardson, "Open Cloud Computing Interface
              (OCCI)".

   [RFC2068]  Fielding, R., Gettys, J., Mogul, J., Nielsen, H., and T.
              Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
              RFC 2068, January 1997.

   [W3C.REC-html401-19991224]
              Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01
              Specification", W3C Recommendation, December 1999.

   [W3C.WD-html5-20090423]
              Hyatt, D. and I. Hickson, "HTML 5", W3C Working Draft,
              April 2009.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking",
              draft-nottingham-http-link-header-05 (work in progress),
              April 2009.

   [rel-tag-microformat]
              Celik, T., Marks, K., and D. Powazek, "rel="tag"
              Microformat".

Appendix A.  Notes on use with HTML

   In the absence of a dedicated category element in HTML 4
   [W3C.REC-html401-19991224] and HTML 5 [W3C.WD-html5-20090423],
   category information (including user supplied folksonomy
   classifications) MAY be exposed using HTML A and/or LINK elements by
   concatenating the scheme and term:

      category-link = scheme term
      scheme        = URI
      term          = token

   These category-links MAY form a resolvable "tag space" in which case
   they SHOULD use the "tag" relation-type per [rel-tag-microformat].

   Alternatively META elements MAY be used:

   o  where the "name" attribute is "keywords" and the "content"
      attribute is a comma-separated list of term(s)

   o  where the "http-equiv" attribute is "Category" and the "content"
      attribute is a comma-separated list of category-value(s)
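
   For example (non-normative; the scheme URI and term below are purely
   illustrative), a category with scheme "http://example.org/cat/" and
   term "cloud" could be exposed as:

```html
<!-- A and/or LINK elements, concatenating scheme and term; the "tag"
     relation-type is used as the result is a resolvable tag space -->
<link rel="tag" href="http://example.org/cat/cloud" />
<a rel="tag" href="http://example.org/cat/cloud">cloud</a>

<!-- META element alternative with a comma-separated list of terms -->
<meta name="keywords" content="cloud, computing" />
```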

Appendix B.  Notes on use with Atom

   Where the cardinality is known to be one (for example, when
   retrieving an individual resource) it MAY be preferable to render the
   resource natively over HTTP without Atom structures.  In this case
   the contents of the atom:content element SHOULD be returned as the
   HTTP entity-body and metadata including the type attribute and atom:
   category element(s) via HTTP header-field(s).

   This approach SHOULD NOT be used where the cardinality is not
   guaranteed to be one (for example, search results, which MAY return
   more than one result).

Appendix C.  Acknowledgements

   The author would like to thank Mark Nottingham for his work on Web
   Linking [draft-nottingham-http-link-header] (on which this document
   was based) and to the authors of [RFC2068] for specification of the
   Link: header-field on which this is based.

   The author would like to thank members of the OGF's Open Cloud
   Computing Interface [OCCI] working group for their contributions and
   others who commented upon, encouraged and gave feedback to this
   document.

Appendix D.  Document History

   [[ to be removed by the RFC editor should document proceed to
   publication as an RFC. ]]


      *  Initial draft based on draft-nottingham-http-link-header-05

Appendix E.  Outstanding Issues

   [[ to be removed by the RFC editor should document proceed to
   publication as an RFC. ]]

   The following issues are outstanding and should be addressed:

   1.  Is extensibility of Category headers necessary as is the case for
       Link: headers?  If so, what are the use cases?

   2.  Is supporting multi-lingual representations of the same
       category(s) necessary?  If so, what are the risks of doing so?

   3.  Is a mechanism for maintaining Category header-fields required?
       If so, should it use the headers themselves or some other
       mechanism?

   4.  Does this proposal conflict with others in the same space?  If
       so, is it an improvement on what exists?

Author's Address

   Sam Johnston
   Australian Online Solutions
   GPO Box 296
   Sydney, NSW  2001

   Email: samj@samj.net
   URI:   https://samj.net/



Is AtomPub already the HTTP of cloud computing?

A couple of weeks ago I asked Is OCCI the HTTP of cloud computing? I explained the limitations of HTTP in this context, which basically stem from the fact that the payloads it transfers are opaque. That’s fine when they’re [X]HTML because you can express links between resources within the resources themselves, but what about when they’re some other format – like OVF describing a virtual machine as may well be the case for OCCI? If I want to link between a virtual machine and its network(s) and/or storage device(s) then I’m out of luck… I need to either find an existing meta-model or roll my own from scratch.

That’s where Atom (or more specifically, AtomPub) comes in… in the simplest sense it adds a light, RESTful XML layer to HTTP which you can extend as necessary. It provides for collections (a ‘feed’ of multiple resources or ‘entries’ in a single HTTP message) as well as a simple meta-model for linking between resources, categorising them, etc. It also gives some metadata relating to unique identifiers, authors/contributors, caching information, etc., much of which can be derived from HTTP (e.g. URL <-> Atom ID, Last-Modified <-> updated).
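To make the mapping concrete, here's a rough sketch (my own illustration; the URLs, category scheme and OVF payload are invented, not from any real API) of the envelope metadata a client could pull from a single Atom entry using Python's standard library:

```python
# Sketch only: a hypothetical OCCI-style "compute" resource wrapped in an
# Atom entry. Links, categories and caching metadata live in the envelope;
# the payload itself stays opaque inside atom:content.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace, Clark notation

entry_xml = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://cloud.example.com/compute/vm-42</id>
  <updated>2009-07-01T12:00:00Z</updated>
  <category scheme="http://example.com/occi/types" term="compute"/>
  <link rel="related" href="http://cloud.example.com/storage/disk-7"/>
  <content type="application/ovf+xml">(opaque OVF payload)</content>
</entry>
"""

def describe(entry_text):
    """Extract the envelope metadata without touching the payload."""
    entry = ET.fromstring(entry_text)
    return {
        "id": entry.findtext(ATOM + "id"),            # maps to the URL
        "updated": entry.findtext(ATOM + "updated"),  # maps to Last-Modified
        "categories": [c.get("term") for c in entry.findall(ATOM + "category")],
        "links": {l.get("rel"): l.get("href") for l in entry.findall(ATOM + "link")},
    }

meta = describe(entry_xml)
print(meta["categories"])        # ['compute']
print(meta["links"]["related"])  # http://cloud.example.com/storage/disk-7
```

This is exactly the linking the previous post was missing: the virtual machine's relationship to its storage device is expressed in the Atom envelope rather than buried in (or absent from) the OVF payload.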

Although it was designed with syndication in mind, it is a very good fit for creating APIs, as evidenced by its extensive use by the biggest players in cloud computing, including Google (whose GData protocol underpins almost all of their APIs).

I’d explain in more detail but Mohanaraj Gopala Krishnan has done a great job already in his AtomPub, Beyond Blogs presentation.

The only question that remains is whether or not this is the best we can do… stay tuned for the answer. The biggest players in cloud computing seem to think so (except Amazon, whose APIs predate Google’s and Microsoft’s) but maybe there’s an even simpler approach that’s been sitting right under our noses the whole time.

Is OCCI the HTTP of Cloud Computing?

The Web is built on the Hypertext Transfer Protocol (HTTP), a client-server protocol that simply allows client user agents to retrieve and manipulate resources stored on a server. It follows that a single protocol could prove similarly critical for Cloud Computing, but what would that protocol look like?

The first place to look for the answer is limitations in HTTP itself. For a start the protocol doesn’t care about the payload it carries (beyond its Internet media type, such as text/html), which doesn’t bode well for realising the vision of the [Semantic] Web as a “universal medium for the exchange of data”. Surely it should be possible to add some structure to that data in the simplest way possible, without having to resort to carrying complex, opaque file formats (as is the case today)?

Ideally any such scaffolding added would be as light as possible, providing key attributes common to all objects (such as updated time) as well as basic metadata such as contributors, categories, tags and links to alternative versions. The entire web is built on hyperlinks so it follows that the ability to link between resources would be key, and these links should be flexible such that we can describe relationships in some amount of detail. The protocol would also be capable of carrying opaque payloads (as HTTP does today) and for bonus points transparent ones that the server can seamlessly understand too.

Like HTTP this protocol would not impose restrictions on the type of data it could carry but it would be seamlessly (and safely) extensible so as to support everything from contacts to contracts, biographies to books (or entire libraries!). Messages should be able to be serialised for storage and/or queuing as well as signed and/or encrypted to ensure security. Furthermore, despite significant performance improvements introduced in HTTP 1.1 it would need to be able to stream many (possibly millions) of objects as efficiently as possible in a single request too. Already we’re asking a lot from something that must be extremely simple and easy to understand.


It doesn’t take a rocket scientist to work out that this “new” protocol is going to be XML based, building on top of HTTP in order to take advantage of the extensive existing infrastructure. Those of us who know even a little about XML will be ready to point out that the “X” in XML means “eXtensible” so we need to be specific as to the schema for this assertion to mean anything. This is where things get interesting. We could of course go down the WS-* route and try to write our own but surely someone else has crossed this bridge before – after all, organising and manipulating objects is one of the primary tasks for computers.

Who better to turn to for inspiration than a company whose mission it is to “organize the world’s information and make it universally accessible and useful”: Google. They use a single protocol for almost all of their APIs, GData, and while most people don’t bother to look under the hood (no doubt thanks to the myriad client libraries made available under the permissive Apache 2.0 license), when you do you may be surprised at what you find: everything from contacts to calendar items, and pictures to videos, is a feed (with some extensions for things like searching and caching).


Enter the OGF’s Open Cloud Computing Interface (OCCI), whose (initial) goal is to provide an extensible interface to Cloud Infrastructure Services (IaaS). To do so it needs to allow clients to enumerate and manipulate an arbitrary number of server side “resources” (from one to many millions) all via a single entry point. These compute, network and storage resources need to be able to be created, retrieved, updated and deleted (CRUD) and links need to be able to be formed between them (e.g. virtual machines linking to storage devices and network interfaces). It is also necessary to manage state (start, stop, restart) and retrieve performance and billing information, among other things.
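As a sketch of what that might look like on the wire (the entry point URL and resource paths here are my own invention, not the OCCI draft's), the CRUD and state operations map naturally onto plain HTTP verbs against a single entry point:

```python
# Illustrative only: build (but don't send) request tuples showing how
# compute/network/storage resources under one entry point map onto HTTP.
ENTRY_POINT = "https://cloud.example.com/occi"  # hypothetical entry point

def request(verb, *path, action=None):
    """Return the (verb, url) pair for an operation on a resource."""
    url = "/".join((ENTRY_POINT,) + path)
    if action:  # state control: start, stop, restart, ...
        url += "?action=" + action
    return verb, url

create   = request("POST",   "compute")           # create a new VM
retrieve = request("GET",    "compute", "vm-42")  # retrieve it
update   = request("PUT",    "compute", "vm-42")  # update it
delete   = request("DELETE", "compute", "vm-42")  # delete it
restart  = request("POST",   "compute", "vm-42", action="restart")

print(retrieve)  # ('GET', 'https://cloud.example.com/occi/compute/vm-42')
```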

The OCCI working group basically has two options now in order to deliver an implementable draft this month as promised: follow Amazon or follow Google (the whole while keeping an eye on other players including Sun and VMware). Amazon use a simple but sprawling XML based API with a PHP style flat namespace and while there is growing momentum around it, it’s not without its problems. Not only do I have my doubts about its scalability outside of a public cloud environment (calls like ‘DescribeImages’ would certainly choke with anything more than a modest number of objects and we’re talking about potentially millions) but there are a raft of intellectual property issues as well:

  • Copyrights (specifically section 3.3 of the Amazon Software License) prevent the use of Amazon’s “open source” clients with anything other than Amazon’s own services.
  • Patents pending like #20070156842 cover the Amazon Web Services APIs and we know that Amazon have been known to use patents offensively against competitors.
  • Trademarks like #3346899 prevent us from even referring to the Amazon APIs by name.

While I wish the guys at Eucalyptus and Canonical well and don’t have a bad word to say about Amazon Web Services, this is something I would be bearing in mind while actively seeking alternatives, especially as Amazon haven’t worked out whether the interfaces are IP they should protect. Even if these issues were resolved via royalty free licensing it would be very hard as a single vendor to compete with truly open standards (RFC 4287: Atom Syndication Format and RFC 5023: Atom Publishing Protocol) which were developed at IETF by the community based on loose consensus and running code.

So what does all this have to do with an API for Cloud Infrastructure Services (IaaS)? In order to facilitate future extension my initial designs for OCCI have been as modular as possible. In fact the core protocol is completely generic, describing how to connect to a single entry point, authenticate, search, create, retrieve, update and delete resources, etc. all using existing standards including HTTP, TLS, OAuth and Atom. On top of this are extensions for compute, network and storage resources as well as state control (start, stop, restart), billing, performance, etc. in much the same way as Google have extensions for different data types (e.g. contacts vs YouTube movies).
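The modular layering described above can be sketched as a plain registry: a generic core that only knows how to enumerate and CRUD resources, with compute, network, storage (and billing, state control, and so on) bolted on as extensions. The kinds and attributes below are illustrative, not taken from the OCCI drafts:

```python
# Sketch of a generic core plus pluggable extensions.
EXTENSIONS = {}

def extension(kind):
    """Class decorator registering a resource kind with the core."""
    def register(cls):
        EXTENSIONS[kind] = cls
        return cls
    return register

@extension("compute")
class Compute:
    attributes = ("cores", "memory", "state")

@extension("network")
class Network:
    attributes = ("address", "state")

@extension("storage")
class Storage:
    attributes = ("size", "state")

# The generic core enumerates whatever extensions happen to be loaded,
# much as GData handles contacts and YouTube movies with one protocol.
print(sorted(EXTENSIONS))  # ['compute', 'network', 'storage']
```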

Simply by standardising at this level OCCI may well become the HTTP of Cloud Computing.

rel=shortlink: url shortening that really doesn’t hurt the internet

Inspired primarily by the fact that the guys behind the RevCanonical fiasco are still stubbornly refusing to admit they got it wrong (the whole while arrogantly brushing off increasingly direct protests from the standards community) I’ve whipped up a Google App Engine application which reasonably elegantly implements rel=shortlink: url shortening that really doesn’t hurt the internet.


It works just like TinyURL and its ilk, accepting a URL and [having a crack at] shortening it. It checks both the response headers and (shortly) the HTML itself for rel=shortlink and if they’re not present then you have the option of falling back to a traditional service (the top half a dozen are preconfigured or you can specify your own via the API’s “fallback” parameter).

An interesting facet of this implementation is the warnings it gives if it encounters the similar-but-ambiguous short_url proposal and the fatal errors it throws up when it sniffs out the nothing-short-of-dangerous rev=canonical debacle. Apparently people (here’s looking at you Ars Technica and Dopplr) felt there was no harm in implementing these “protocols”. Now there most certainly is.

Here’s the high level details (from the page itself):

  • Who: A community service by Sam Johnston (@samj / s…@samj.net) of Australian Online Solutions, loosely based on a relatively good (albeit poorly executed) idea by some web developers purporting to “save the Internet” while actually hurting it.
  • What: A mechanism for webmasters to indicate the preferred short URL(s) for a given resource, thereby avoiding the need to consult a potentially insecure/unreliable third party for same. Resulting URLs reveal useful information about the source (domain) and subject (path): http://tinyurl.com/cgy9pu » http://purl.org/net/shortlink
  • Where: The shortlink Google Code project, the rel-shortlink Google App Engine application, the #shortlink Twitter hashtag and coming soon to a client or site near you.
  • When: Starting April 2009, pending ongoing discussion in the Internet standards community (in the mean time you can also use http://purl.org/net/shortlink in place of shortlink).
  • Why: Short URLs are useful both for space-constrained channels (such as SMS and Twitter) and anywhere URLs need to be manually entered (e.g. when they are printed or spoken). Third-party shorteners can cause many problems, including link rot, performance problems, outages and privacy & security issues.
  • How: By way of <link rel="shortlink"> HTML elements and/or Link: <short-url>; rel=shortlink HTTP headers.
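A minimal sketch of that discovery order in Python (my own illustration, not the App Engine app's actual code; the regexes assume well-formed, double-quoted attributes):

```python
# Check the HTTP Link: header first, then fall back to the HTML itself.
import re

def shortlink_from_headers(headers):
    """Find Link: <url>; rel=shortlink among response header values."""
    for value in headers.get("Link", "").split(","):
        m = re.search(r'<([^>]+)>\s*;\s*rel="?shortlink"?', value)
        if m:
            return m.group(1)
    return None

def shortlink_from_html(html):
    """Fall back to a <link rel="shortlink" href="..."> element."""
    m = re.search(r'<link[^>]+rel="shortlink"[^>]+href="([^"]+)"', html)
    return m.group(1) if m else None

print(shortlink_from_headers(
    {"Link": "<http://purl.org/net/shortlink>; rel=shortlink"}))
# http://purl.org/net/shortlink
```

Only if both lookups come back empty would the app fall back to a traditional third-party shortening service.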

So have at it and let me know what you think. The source code is available under the AGPL license for those who are curious as to how it works.

CBS/CNET/ZDNet interview on cloud standards and platforms

I’m a bit too busy right now to put together my usual meticulously crafted blog posts, and random thoughts have found a good home at Twitter (@samj), so I thought I’d copy an interview this week with CBS/CNET/ZDNet on the emotive topic of cloud standards. As you know I’m busy putting the finishing touches on the Open Cloud Initiative and am one of the main people driving the Open Cloud Computing Interface (OCCI), where I’m representing the needs of my large enterprise clients… we’re on track to deliver a nice clean cloud infrastructure services (IaaS) API next month as promised.

Anyway not sure when/if this will appear as I took a few days to respond, but here goes:


1. Regarding infrastructure-as-a-service: Does the infrastructure matter? Whether it’s on Amazon’s EC2 for example — does it matter where your app is hosted?

Cloud infrastructure services (IaaS) should be easily commoditised (that is, where product selection becomes more dependent on price than on differentiating features, benefits and value-added services), but this is not yet the case. At projects like the Open Grid Forum‘s recently launched Open Cloud Computing Interface (OCCI) we are working fast to make this a reality (potentially as soon as next month).

According to the Open Cloud Initiative the two primary requirements for “Open Cloud” are open APIs and open formats. In the context of cloud infrastructure services that means OCCI (a draft of which will be available next month) and OVF (which was released last month) respectively. These open standards will allow users to easily migrate workloads from one provider to another in order to ensure that they are receiving the best possible service at the best possible price.

In the mean time providers typically differentiate on reputation, reliability and value added features (such as complementary components like Amazon S3 and SQS and network features like load balancing and security).

2. Regarding platform-as-a-service providers: What sort of tools would you require, and what tools/services would help sway your vote toward one platform over another? 

Open standards (particularly for APIs and formats) are far more important for cloud platform services (PaaS) than any tools that a provider offers. The trend today (with providers like Amazon, Google, Salesforce and Aptana) is to extend the Eclipse software development platform. That said, I expect web-based development environments like Mozilla Bespin to become increasingly popular – providers like Heroku are leading the charge here.

On the other hand, cloud hosting offerings like Rackspace/Mosso’s Cloud Sites could also be considered a cloud platform in that I can upload open source applications like Drupal and MediaWiki and they will take care of the scaling for me, billing me for the resources I use. I like this approach because I get the benefits of cloud computing but I could easily move to a competitor like Dreamhost PS because there is virtually no vendor lock-in.

Conversely, while an application written and optimised for Google App Engine will operate and scale extremely well there, it could be very difficult to move elsewhere thanks to the modifications they have made to the Python and Java runtimes. Note that many of these modifications are necessary to enforce security and scalability.

For example, Sun is coming out with a platform stack for the cloud, which will give developers common services to hook their Java apps into. Is this something significant? What else would you like or need from providers?

That all depends on the environment they create and what interfaces they expose – a good test is how many existing Java applications will run on it without modification. Very few applications will run “out of the box” on Google App Engine, but the modifications that need to be made should make the platform more scalable and cheaper overall than one running stock standard Java.

Sun’s Simon Phipps sharply criticised Google earlier in the week, noting that “sub-sets of the core classes in the Java platform was forbidden for a really good reason, and it’s wanton and irresponsible to casually flaunt the rules“. That would lead me to believe that Sun’s offerings will be somewhat more compliant (and therefore enterprise friendly), but also somewhat more expensive.

One of the major sources of incompatibility here is the migration from relational databases (RDBMSs) to their cloud counterparts such as BigTable and SimpleDB. In order to enable massive scalability significant changes had to be made to core concepts and until we have an open standard interface for cloud databases (possibly following the examples of ODBC and DBI) interoperability at the platform layer will be challenging.

I’m also writing to providers like Amazon and Microsoft, to see if they have anything to add. 🙂 

Amazon are at the forefront of what I would call the “cloud operating environment”. They offer a number of critical “cloud architecture” components (most notably SQS queues and, more recently, Elastic MapReduce) which can be assembled into arbitrarily large, loosely coupled cloud computing systems.

Microsoft’s Azure offering will also be interesting in that it is based on the Common Language Runtime. This will allow developers to target the platform using their language of choice, something that has so far restricted Google App Engine to subsets of the developer community (first Python developers and now Java). It should in theory also be relatively straightforward to migrate from traditional architectures to their cloud platform.