"Cloud Computing is the realisation of Internet ('Cloud') based development and use of computer technology ('Computing') delivered by an ecosystem of providers." Sam Johnston
It's amazing that such a simple concept has caused so much confusion, but having reviewed the recent discussions it seems many are falling into the trap of trying to align Cloud Computing with (or contrast it against) existing terminology like SaaS and Utility Computing. It is in fact far more suitable as an umbrella term encompassing all of these related components.
'The Cloud'
While there can be multiple definitions for Cloud Computing, for The Cloud itself 'there can be only one' as it's a metaphor for the Internet; people talking about clouds (plural) are probably confusing it with grids. Yes you can replicate some of this in a 'private cloud', but it will always be exactly that: a replica, and it will likely be somehow connected to (and therefore part of) the real cloud anyway. Remember, much of the value of Cloud Computing comes from leveraging other services in The Cloud for a result greater than the sum of its parts.
Why 'The Cloud'?
Remember all those network diagrams with a fluffy cloud in the middle? Why a cloud and not a black box or some other device? Because we simply don't know, and better yet we don't need to know, what goes on in there - we just pass a packet down our pipe and (most of the time) it arrives at its destination. This is an abstraction (in reality the Internet is an incredibly complex beast) but an important one; it significantly reduces the complexity of our systems; a good example is relatively simple VPNs having quickly displaced many complex WANs.
Definition
Let's break down my definition (which I came to by collating the assertions that were in line with my view and then boiling the result down to the basic common elements):
"Cloud Computing...
...is the realisation of... While many of the requisite components have been available in various forms for some time (eg Software as a Service, Utility Computing, Web Services, Web 2.0, etc.) it is only now they are reaching critical mass that the Cloud Computing concept is working its way into the mainstream. As more of a collection of trends (a 'metatrend') we still have some way to go yet, but Cloud Computing solutions are a reality today and will rapidly mature and expand into virtually every corner of our lives and enterprises.
...Internet ('Cloud') based... Although some have [ab]used the 'Cloud Computing' term in reference to infrastructure (particularly grid computing, like Amazon's pioneering Elastic Compute Cloud), much of its value is derived from the universal connectivity of the Internet; between businesses (B2B e.g. Web Services like Amazon Web Services), businesses and consumers (B2C e.g. Web 2.0 like Google Apps) and between consumers themselves (C2C e.g. peer to peer like BitTorrent). Many of us are now connected to 'The Cloud' where we work (office), rest (home) and play (mobile) and there are solutions (eg Gears) for when we are not.
...development and use of computer technology'... an accepted, all-encompassing definition of computing - there are very few areas which will not be affected in some way by Cloud Computing so I've gone for the broadest possible definition.
...delivered by an ecosystem of providers." While it is possible to enjoy some of the advantages using a single provider (eg Google), it is hard to imagine a functionally complete solution which does not draw on multiple providers (in much the same way as we install task-specific applications onto our legacy computers). Your electricity is almost certainly generated by wholesale providers who pump it into the grid and similarly Cloud Computing will typically be delivered by layered (eg Smugmug on Amazon S3) and/or interconnected (eg Facebook<->Twitter) systems.
Cloud Computing Architecture
Cloud Computing is typically universally accessible, massively scalable (with vast pools of multi-tenant 'on-demand' resources), highly reliable (see my TrustSaaS site for proof that the main services are up over 99% of the time), cost effective and utility priced with low barriers to entry (eg capital expenditure, professional services), but none of these attributes are absolute requirements (no, not even massive scalability - even an esoteric web service may still be an absolute requirement for a small handful of users and thus an important part of the ecosystem).
Cloud Computing architecture looks something like this, with layers similar to the OSI networking stack:
Client
which consumes these applications via a browser and/or programmatically
Composite
(Composite Applications or Mashups) which are linked together using APIs like REST (eg TrustSaaS), in much the same way as 'pipes' are used in Unix to create arbitrarily complex systems from simple tools
Application
which ideally follow proven Unix philosophy of 'do one thing and do it well', but which may grow quite complex
Platform
on which applications are built, including the language itself (eg Java, Python) as well as supporting systems like storage
Infrastructure
consisting of the physical computing resources (and virtualisation layer(s) at the hardware and/or operating system layers)
Networking
courtesy the existing Internet (eg TCP/IP)
Cloud Computing Components
Although many of these are solutions to the same problems, most of them are actually components of Cloud Computing, rather than Cloud Computing itself (working from the ground up):
Grid computing (Amazon EC2, GoGrid, AppNexus), essentially any network of loosely-coupled computers acting in concert, is mostly concerned with tackling complexity and improving manageability of computing resources (for example, production servers not being taken down by server failures or routine maintenance). Where previously you might have deployed a physical server you can now deploy a virtual one, and increased automation of operating system and application deployment is pushing the interface further and further up towards the application layer itself (eg Desktone's Desktop as a Service). While Internet ('cloud') connected grids are particularly useful (and a natural progression for virtualisation and SOA solutions being rolled out en-masse in enterprises today), implying that this is somehow equivalent to cloud computing is too narrow a view. Cloud based grids are more cost effective, reliable, scalable and user friendly than their disconnected counterparts and are one big step closer to the panacea of autonomic computing. Expect to see existing 'virtual infrastructure' providers like VMware and Citrix seamlessly complementing on-premises solutions with cloud based services.
Platform as a Service (PaaS) (Google's AppEngine, Salesforce's force.com, Heroku, Joyent, Rackspace's Mosso): takes grid computing to the next level of abstraction by pushing the interface up to the platform or 'stack' on which applications themselves are built (eg Django, Ruby on Rails, Apex Code). This is primarily interesting for developers and power users and is an increasingly important component of the cloud computing ecosystem. It allows them to focus on development without the overhead of hardware and operating system maintenance, database tuning, load balancing, network connectivity etc. while exposing technology like BigTable (and massive scalability) which might not otherwise be available to them. More importantly, it eliminates capital expenditure requirements, allowing boutique Independent Software Vendors like us to 'stay in the game'.
Utility Computing (Amazon S3) is more about a 'utility' (gas, water, electricity) pricing model, yet one can derive the benefits of cloud computing with a more traditional pricing model, or indeed without having to pay for it at all (consider Google's AppEngine for example, where its utility-style pricing only applies to the more demanding users).
Web Services (Amazon Web Services): 'the 'glue' that holds cloud computing components together', are finally maturing and being adopted 'en-masse' thanks in no small part to simplification by way of protocols like REST, commercialisation by providers like Amazon (Jeff Bezos' Risky Bet) and the abundance of web toolkits (e.g. Ruby on Rails) which lower the barrier to entry by providing native support. You can do everything from payments to 'human intelligence tasks' with Web Services now and mashups rely on them heavily to make products that are greater than the sum of their parts. Companies like Ariba and Rearden Commerce are taking this to the extreme.
Web 2.0 (Wikipedia, Facebook, WebEx) which while a force in itself, deals more with making the web 'read/write', shifting power towards the consumer and leveraging their collective energy. While AJaX does a lot to make this environment more user friendly, the underlying theme is turning the 'reader' into a 'contributor'. Most of the players in cloud computing exhibit Web 2.0 attributes.
Software as a Service (SaaS): (Google Apps, Salesforce CRM) falls under the cloud computing umbrella and is a primary component, but to align the two definitions is too narrow a view. SaaS is typically sold per user as pizza is per slice, but what is more important is that it is implemented and maintained by a provider who handles much of the complexity of running software on your behalf (eg scaling, backups, updates, etc.).
'Cloud' System Integrators (Australian Online Solutions) and consultancies deploy the various components, make them work in concert together (using services like RightScale), integrate them to each other and with legacy systems using the exposed APIs as well as migrating data (eg email, calendars, contacts, documents, etc.) so that users can 'hit the ground running' and continue to collaborate efficiently with those who have not yet migrated 'to the cloud'. Seamless migration is a reality today, and a critical component for cloud adoption.
Cloud Computing Today
The Cloud Computing revolution is upon us. Expect it to rapidly proliferate throughout your enterprise, with much of the drive coming from individual grassroots users (who are almost certainly already improving operational efficiency with Web 2.0 tools like Google, Salesforce and WebEx) so plan accordingly. It must be embraced for competitiveness rather than resisted (in much the same way as the PC was embraced decades ago) but it also requires careful governance and change management by experts. Low risk, high return offerings like messaging and web security are available for those who want to 'test the water' without opting for a complete Enterprise 2.0 deployment.
The draw of loosely coupled, massively scalable services will eventually result in most enterprises being swallowed by the cloud (or by more agile, possibly 'digital native' competitors who already were), or at least becoming nodes on it; indeed many already have. Barriers to adoption (eg offline support, security and compliance services) are being torn down every day and practical solutions exist for those that remain (eg encryption) so there are fewer and fewer reasons to sit on the sidelines.
Even the largest of enterprises are now starting to jump (typically having completed controlled pilots) and just as company officers would have difficulty explaining downtime losses caused by continuing to generate their own power after cheap, reliable utility electricity became available, shareholders will not accept companies wasting resources on commoditised infrastructure rather than focusing on their core competencies.
Most of us rely heavily (more heavily than we realise, and indeed should) on this rickety old thing called DNS (the Domain Name System), which was never intended to scale as it did, nor to defend against the kinds of attacks it is subjected to today.
The latest DNS related debacle is (as per usual) related to cache poisoning, which is where your adversary manages to convince your resolver (or more specifically, one of the caches between your resolver and the site/service you are intending to connect to) that they are in fact the one you want to be talking to. Note that these are not man-in-the-middle (MitM) attacks; if someone can see your DNS queries you're already toast - these are effective, remote attacks that can be devastating:
Consider for example your average company using POP3 to retrieve mail from their mail server every few minutes, in conjunction with single sign on; convince their cache that you are their mail server and you will have everyone's universal cleartext password in under 5 minutes.
The root of the problem(s) is that the main security offered in a DNS transaction is the query ID (QID) for which there are only 16 bits (ie 65,536 combinations). Even when properly randomised (as was already the case for sensible implementations like djbdns, but not for earlier attempts which foolishly used sequential numbering), fast computers and links can make a meal of this in no time (read, seconds), given enough queries. Fortunately you typically only get one shot for a given name (for any given TTL period - usually 86,400 seconds; 1 day), and even then you have to beat the authoritative nameserver with the (correct) answer. Unfortunately, if you can convince your victim to resolve a bunch of different domains (a.example.com, b.example.com ... aa.example.com and so on) then you'll eventually (read, seconds) manage to slip one in.
So what you say? You've managed to convince a caching server that azgsewd.victim.com points at your IP - big deal. But what happens if you slipped in extra resource records (RRs) for, say, www.victim.com or mail.victim.com? A long time ago you might have been able to get away with this attack simply by smuggling unsolicited answers for victim.com queries along with legitimate answers to legitimate queries, but we've been discarding unsolicited answers (at least those that were not 'in-bailiwick'; eg from the same domain) for ages. However here you've got a seemingly legitimate answer to a seemingly legitimate question and extra RRs from the same 'in-bailiwick' domain, which can be accepted by the cache as legitimate and served up to all the clients of that cache for the duration specified by the attacker.
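To put rough numbers on that race, here is a minimal Python sketch. It assumes the attacker gets a fixed number of forged replies (each with a distinct QID guess) in before the real answer arrives, and that every fresh name is an independent race; the function names and the figure of 100 replies per race are illustrative assumptions, not measurements.

```python
import math

QID_SPACE = 2 ** 16  # 16-bit query ID: 65,536 possible values


def race_success_probability(forged_replies: int, space: int = QID_SPACE) -> float:
    """Chance that one of `forged_replies` distinct guesses hits the real QID."""
    return min(forged_replies, space) / space


def cumulative_success_probability(forged_replies: int, races: int,
                                   space: int = QID_SPACE) -> float:
    """Chance of winning at least one of `races` independent races."""
    per_race = race_success_probability(forged_replies, space)
    return 1.0 - (1.0 - per_race) ** races


def races_needed(forged_replies: int, target: float = 0.5,
                 space: int = QID_SPACE) -> int:
    """Smallest number of races giving at least `target` chance of success."""
    per_race = race_success_probability(forged_replies, space)
    if per_race <= 0.0:
        raise ValueError("need at least one forged reply per race")
    if per_race >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - per_race))


if __name__ == "__main__":
    n = 100  # hypothetical forged replies squeezed in per race
    k = races_needed(n)
    print(f"{n} forged replies per race: {k} races for a 50% chance")
    print(f"chance after {k} races: {cumulative_success_probability(n, k):.3f}")
```

Each race costs the attacker only one query for a fresh random subdomain, which is why the one-shot-per-TTL protection stops helping once arbitrary names can be requested.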
This is a great example of multiple seemingly benign vulnerabilities being [ab]used together such that the result is greater than the sum of its parts, and is exactly why you should be very, very sure about discounting vulnerabilities (for example, a local privilege escalation vulnerability on a machine with only trusted users can be turned into a nightmare if coupled with a buffer overrun in an unprivileged daemon).
Ok so if you're still reading you've either patched already or you were secure beforehand, as we were at Australian Online Solutions given our DNS hosting platform doesn't cache; we separate authoritative from caching nameservers, and our caches have used random source ports from the outset. This increases the namespace from 16 bits (65k combinations) to (just shy of, since some ports are out of bounds) 32 bits (4+ billion combinations). If you're not secure, or indeed not sure if you are, then contact us to see how we can help you.
There are a lot of good reasons to write valid XHTML (even if the vast majority of sites don't bother):
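The 'just shy of 32 bits' figure can be checked with a little arithmetic. This Python sketch assumes the out-of-bounds ports are the 1,024 well-known ones; the exact number excluded varies by implementation, so treat RESERVED_PORTS as an assumption.

```python
import math

QID_VALUES = 2 ** 16    # 16-bit query ID
ALL_PORTS = 2 ** 16     # 16-bit UDP source port
RESERVED_PORTS = 1024   # assumption: only ports below 1024 are off limits


def search_space(randomise_ports: bool, reserved_ports: int = RESERVED_PORTS) -> int:
    """Number of (QID, source port) combinations an attacker must guess among."""
    ports = ALL_PORTS - reserved_ports if randomise_ports else 1
    return QID_VALUES * ports


def bits(space: int) -> float:
    """Size of the search space expressed in bits."""
    return math.log2(space)


if __name__ == "__main__":
    for randomised in (False, True):
        space = search_space(randomised)
        print(f"random source ports={randomised}: "
              f"{space:,} combinations ({bits(space):.2f} bits)")
```

Under these assumptions the fixed-port case is exactly 16 bits, while the randomised case comes out at a little over 4.2 billion combinations.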
Your site will render better, faster and more consistently across all browsers.
Your layout will be pushed from tables and tags to CSS, separating data from presentation and reducing maintenance costs.
Computers (most notably, search engines) will be able to parse and make sense of your content more easily than they might otherwise have been able to.
You're supporting standards compliance (which translates to freedom for you and your users) and you can advertise valid XHTML using the W3C logos:
Once you've gone to the effort of writing valid XHTML and CSS and the W3C Markup Validation Service (http://validator.w3.org/) is happy with your efforts you'll still want to make sure you're serving your content with the right mime-type: application/xhtml+xml, but only to browsers that support it (and ask for it via the HTTP Accept: header)... most notably not IE6 :|
It's unfortunate those who care about standards compliance have to jump through hoops by implementing content negotiation, but it's not too hard to do.... for example in PHP you can do something like this:
Notice that the validator won't send an Accept header by default. You can force it to, but I'm just checking for the user agent; if you don't you'll get a warning about the mime-type even if the document is valid (and you're serving it correctly).
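The negotiation logic described above can be sketched as follows. This is a Python illustration rather than the PHP mentioned in the text, and the function names are hypothetical; it also assumes the validator identifies itself with a user agent containing 'W3C_Validator'.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def accepts_xhtml(accept_header: str) -> bool:
    """True if the Accept header lists application/xhtml+xml with q > 0."""
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if fields[0].lower() != XHTML:
            continue
        quality = 1.0  # a media range without an explicit q defaults to 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparsable q as "not acceptable"
        return quality > 0
    return False


def choose_content_type(accept_header: str = "", user_agent: str = "") -> str:
    """Serve XHTML to clients that ask for it, and to the W3C validator."""
    # Assumption: the validator's user agent contains "W3C_Validator".
    if accepts_xhtml(accept_header) or "W3C_Validator" in user_agent:
        return XHTML
    return HTML


if __name__ == "__main__":
    print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9"))
    print(choose_content_type("*/*", "Mozilla/4.0 (compatible; MSIE 6.0)"))
    print(choose_content_type("", "W3C_Validator/1.3"))
```

Whatever the function returns goes into the Content-Type response header; a browser that only sends '*/*' (as IE6 does) falls through to text/html.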
So those of you who anticipated a Jabber/XMPP chat client on the iPhone (and iPod Touch) after TUAW rumoured that 'a new XMPP framework has been spotten(sic) in the latest iPhone firmware' back in April were close... but no cigar. Same applies for those who hypothesised about P-IMAP or IMAP IDLE being used by MobileMe for push mail.
The real story, as it turns out, is that Jabber (the same open protocol behind many instant messaging networks including Google Talk) is actually being used for delivering push mail notifications to the iPhone. That's right, you heard it here first. This would explain not only why the libraries were curiously private (in that they are not exposed to developers) but also why IMAP IDLE support only works while Mail.app is open (it's a shame because Google Apps/Gmail supports IMAP IDLE already).
While it's in line with Apple's arguments about background tasks hurting user experience (eg performance and battery life), cluey developers have noted that the OS X (Unix) based iPhone has many options to safely enable this functionality (eg via resource limiting) and that the push notification service for developers is only a partial solution. It's no wonder though with the exclusive carrier deals which are built on cellular voice calls and SMS traffic, both of which could be eroded away entirely if products like Skype and Google Talk were given free rein (presumably this is also why Apple literally hangs onto the keys for the platform). If you want more freedom you're going to have to wait for Google Android, or for ultimate flexibility one of the various Linux based offerings. We digress...
So without further ado, here's the moment we've all been waiting for: a MobileMe push mail notification (using XMPP's pubsub protocol) from aosnotify.mac.com:5223 over SSL:
I'll explain more about the setup I used to get my hands on this in another post later on. So what's the bet that this same mechanism will be used for the push notification service to be released later in the year?
I've just put the finishing touches on the first proof-of-concept Acid Test for OpenDocument Format (ODF) which I hope will become a useful tool for encouraging and testing interoperability.
The tests themselves (148 of them in the 14x14 grid following the Web Standards Project's Acid2 test) still need development, as explained below, but the framework is in place.
---------- Forwarded message ----------
From: Sam Johnston
Date: Mon, Jun 16, 2008 at 2:34 AM
Subject: ODF Acid Test - Proof of Concept
To: oiic-formation-discuss@lists.oasis-open.org
Morning all,
With a view to starting this week afresh I have been busy over the weekend preparing the first proof of concept ODF Acid Test for the spreadsheet component. The results are surprisingly impressive, thanks in no small part to conditional styles which allow me to set the cell colour depending on whether tests pass (1+1=2) or fail (1+1=3). For more information about the test methodology, samples, and the files themselves, refer to http://sites.google.com/a/odfiic.org/acid/ods
This is not to be confused with an interop panacea (there is no such thing), but it can be used to focus attention where it is most needed (provided the attention is not too focused!). It also allows users to get on board the interop bandwagon and has proven a potent incentive for the browser vendors. Kudos to Google's Ian Hickson and the rest of the Web Standards Project for their pioneering efforts in this area.
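Getting SSL up and running on OS X is not too difficult these days. First you need to tell Apache to read the SSL config file by uncommenting its Include line in httpd.conf (the diff below runs from my edited file to the pristine .dist copy, so read it in reverse):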
--- /etc/apache2/httpd.conf 2008-06-11 03:42:25.000000000 +0200
+++ /etc/apache2/httpd.conf.dist 2008-06-11 04:15:15.000000000 +0200
@@ -470,7 +470,7 @@
 #Include /private/etc/apache2/extra/httpd-default.conf

 # Secure (SSL/TLS) connections
-Include /private/etc/apache2/extra/httpd-ssl.conf
+#Include /private/etc/apache2/extra/httpd-ssl.conf
 #
 # Note: The following must must be present to support
 # starting without SSL on platforms with no /dev/random equivalent
Then you need to fix this config file for your environment:
 # General setup for the virtual host
 DocumentRoot "/Library/WebServer/Documents"
-ServerName www.example.com:443
-ServerAdmin you@example.com
+ServerName secure.samj.net:443
+ServerAdmin xxxx@samj.net
 ErrorLog "/private/var/log/apache2/error_log"
 TransferLog "/private/var/log/apache2/access_log"
@@ -125,6 +125,7 @@
 # Makefile to update the hash symlinks after changes.
 #SSLCACertificatePath "/private/etc/apache2/ssl.crt"
 #SSLCACertificateFile "/private/etc/apache2/ssl.crt/ca-bundle.crt"
+SSLCACertificateFile "/private/etc/apache2/server-ca.crt"

 # Certificate Revocation Lists (CRL):
 # Set the CA revocation path where to find CA CRLs for client
@@ -143,6 +144,8 @@
 # issuer chain before deciding the certificate is not valid.
 #SSLVerifyClient require
 #SSLVerifyDepth 10
+SSLVerifyClient require
+SSLVerifyDepth 2

 # Access Control:
 # With SSLRequire you can do per-directory access control based
Notice that I'm using client certificates for authentication but you can comment out the SSLCACertificateFile, SSLVerifyClient and SSLVerifyDepth options if you don't need this. If you do you'll want to grab the root from CAcert:
You'll want to generate random numbers (key) and a certificate signing request (csr) in order to get a certificate (crt) file, and despite most information on the topic this can be done in one command:
# openssl req -newkey rsa:2048 -nodes -keyout server.key -out server.csr
Generating a 2048 bit RSA private key
.........+++
.....................................................................+++
writing new private key to 'server.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:
State or Province Name (full name) [Some-State]:New South Wales
Locality Name (eg, city) []:Sydney
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Australian Online Solutions Pty Ltd
Organizational Unit Name (eg, section) []:Security
Common Name (eg, YOUR name) []:secure.samj.net
Email Address []:xxxx@samj.net

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
Actually in the case of CAcert.org everything except the common name is ignored so you can leave it as defaults.
For testing we'll use a script which prints all the environment variables (this is what I was after for my certificate authentication anyway):
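A minimal sketch of such a script, written here as a Python CGI program (the function name is hypothetical, and it assumes Apache is configured to run CGI scripts). Note that mod_ssl only exports its SSL_CLIENT_* variables to scripts when SSLOptions +StdEnvVars is enabled for the location, so check that if they don't show up.

```python
#!/usr/bin/env python3
import os


def render_environment(environ) -> str:
    """Return every environment variable as sorted NAME=value lines."""
    return "".join(f"{name}={environ[name]}\n" for name in sorted(environ))


if __name__ == "__main__":
    # CGI response: a header block, a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    print(render_environment(os.environ), end="")
```

With client certificate authentication working, the details of the presented certificate should appear among the SSL_CLIENT_* entries.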
So like me you've been hanging out for another Long Term Support (LTS) Ubuntu release and having arrived last month (8.04) you've got it up and running in VMware (Fusion in my case).
To make VMware tools install you need to:
Virtual Machine->Install VMware Tools (that's the easy part)
Press enter for everything until it won't go any further because it wants the real location of your kernel headers. Give it '/lib/modules/2.6.24-16-server/build/include' and then keep pressing enter again until you get back to your prompt.
VMware Tools should start up (except perhaps for the advanced networking guff).
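The header path above is specific to the 2.6.24-16-server kernel. Assuming the same /lib/modules layout on your install, a small Python sketch (the function name is hypothetical) can build the equivalent path for whichever kernel is running:

```python
import platform


def kernel_headers_path(release: str) -> str:
    """Build the include path the installer asks for, given a kernel release."""
    return f"/lib/modules/{release}/build/include"


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r`.
    print(kernel_headers_path(platform.release()))
```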
You'll want to change the second '$USER' to `id -gn` so it picks up your group name (eg 'staff') by itself, and while you're there you can comment out the two TunnelRunner lines if you want to set up tunnels on privileged ports and don't care about the security implications of setuid root binaries. You can do this by copying SSHKeychain.pkg from the mounted disk image, and right clicking to 'Show Package Contents'... then you can browse for Content->Resources->postinstall, or apply this diff:
So it's Apple's Worldwide Developer Conference (WWDC) today in San Francisco and Steve Jobs will certainly have some new goodies for us Mac junkies, likely:
iPhone 2.0
Immediately available, probably worldwide, perhaps with new partnerships, probably cheaper again (who ever said being an Apple early adopter was without its costs?)
New toys including 3G, GPS, probably something unexpected
Support for native applications via the (excellent) SDK - I was already building these on the first day it was released and it's already improved significantly with a handful of updates
App Store in iTunes which means iTunes version bump (and like music et al takes a solid 30% cut on sales)
Software Updates
iTunes for iPhone stuff
OS X 10.6 seeding, but OS X is already pretty good so not holding my breath... maybe some more connection to the cloud including:
.Mac Rethink
Google Apps based .Mac accounts - after all they've had a Google Apps account for mac.com since last year some time, if not before.
Hardware
Nothing that will detract from the iPhone announcement... maybe some refreshes here and there
To celebrate today's Google Apps launch here's a screencast I whipped up showing just how easy it is with Google Apps to get 'On in 60 seconds'.
Update: On advice from Dave Girouard (VP, Google Enterprise) we've changed the name to 'On in 60 seconds' from the somewhat less enthralling 'Setting up Google Apps'.
Update: Thanks for the amazing response - 20,000 views! Let's hope Google Apps is as popular as its video!