Twitter’s down for the count. What are we going to do about it?

What’s wrong with this picture?

  • There’s not a single provider for telephony (AT&T, T-Mobile, etc.)
  • There’s not a single provider for text messaging (AT&T, T-Mobile, etc.)
  • There’s not a single provider for instant messaging (GTalk, MSN, AIM, etc.)
  • There’s not a single provider for e-mail (GMail, Hotmail, Yahoo!, etc.)
  • There’s not a single provider for blogging (Blogger, WordPress, etc.)
  • There’s not a single provider for “mini” blogging (Tumblr, Posterous, etc.)
  • There IS a single provider for micro blogging (Twitter)
  • And it’s down for the count (everything from the main site to the API is inaccessible)
  • And it’s been down for an Internet eternity (the best part of an hour and counting)

What are we going to do about it?

How Open Cloud could have saved Sidekick users’ skins

The cloud computing scandal of the week looks to be the catastrophic loss of millions of Sidekick users’ data – an unfortunate and completely avoidable event that Microsoft’s Danger subsidiary and T-Mobile (along with the rest of the cloud computing community) will surely come to regret.

There are plenty of theories as to what went wrong – the most credible being that a SAN upgrade was botched, possibly by a large outsourcing contractor, and that no backups were taken despite space being available (though presumably not on the same SAN!). Note that while most cloud services exceed the capacity/cost ceiling of SANs and therefore employ cheaper horizontal scaling options (like the Google File System), this is – or should I say was – a relatively small amount of data. As such there is no excuse whatsoever for not having reliable, off-line backups – particularly given Danger is owned by Microsoft (previously considered one of the “big 4” cloud companies, even by me). It was a paid-for service too (~$20/month or $240/year?), which makes even the most expensive cloud offerings like Apple’s MobileMe look like a bargain (though if it’s any consolation, the fact that the service was paid for rather than free may well come back to bite them by way of the inevitable class action lawsuits).

“Real” cloud storage systems transparently ensure that multiple copies of data are automatically maintained on different nodes, at least one of which is ideally geographically independent. The very appearance of the term “SAN” in the conversation suggests that this was a legacy architecture, and one far more likely to fail – in the same way that today’s aircraft are far safer than yesterday’s and today’s electricity grids far more reliable than earlier ones (Sidekick predates Android and the iPhone by some years, after all). It’s hard to say with any real authority what is and what is not cloud computing, beyond saying “I know it when I see it, and this ain’t it”.
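For illustration, here’s a minimal sketch (in Python, and in no way a description of Danger’s or any real provider’s implementation) of the placement rule such systems follow – several copies on distinct nodes, with at least one in a different region:

```python
import random

def place_replicas(nodes, n_copies=3):
    """Choose n_copies distinct nodes for an object, forcing at least
    one replica into a different region to the primary (a toy stand-in
    for geographic independence)."""
    primary = random.choice(nodes)
    remote = [n for n in nodes if n["region"] != primary["region"]]
    if not remote:
        raise ValueError("no geographically independent node available")
    replicas = [primary, random.choice(remote)]
    others = [n for n in nodes if n not in replicas]
    random.shuffle(others)
    replicas.extend(others[:n_copies - len(replicas)])
    return replicas

# Hypothetical node inventory spanning two regions
nodes = [
    {"name": "us-east-1a", "region": "us-east"},
    {"name": "us-east-1b", "region": "us-east"},
    {"name": "eu-west-1a", "region": "eu-west"},
]
copies = place_replicas(nodes)
regions = {n["region"] for n in copies}
```

Lose any single node – or even a whole region – and the data survives; a single SAN offers no such guarantee.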

Whatever the root cause, the result is the same – users who were given no choice but to store their contacts, calendars and other essential day-to-day data on Microsoft’s servers look to have irretrievably lost it. Friends, family, acquaintances and loved ones – even (especially?) the boy/girl you met at the bar last night – may be gone for good. People will miss appointments, lose business deals and in the most extreme cases could face real hardship as a result (for example, I’m guessing parole officers don’t take kindly to missed appointments with no contact!). The cost of this failure will (at least initially) be borne by the users, and yet there was nothing they could have done to prevent it short of choosing another service or manually transcribing their details.

The last hope for them is that Microsoft can somehow reverse the caching process in order to remotely retrieve copies from the devices (which are effectively dumb terminals) before they lose power; good luck with that. While synchronisation is hard to get right, having a single cloud-based “master” and only a local cache on the device (as opposed to a full, first-class-citizen copy) is a poor design decision. I have an iPhone (actually I have a 1G, 3G, 3GS and an iPod Touch) and they’re all synchronised together via two MacBooks, and in turn to both a Time Machine backup and Mozy online backup. As if that’s not enough, all my contacts are in sync with Google Apps’ Gmail over the air too, so I can take your number and then pretty much immediately drop my phone in a beer without concern for data loss. Even this proprietary system protects me from such failures.

The moral of the story is that externalised risk is a real problem for cloud computing. Most providers [try to] avoid responsibility by way of terms of service that strip away users’ rights, but it’s a difficult problem to solve because enforcing liability for anything but gross negligence can exclude smaller players from the market. That is why users absolutely must have control over their data and be encouraged, if not forced, to take responsibility for it.

Open Cloud simply requires open formats and open APIs – that is to say, users must have access to their data in a transparent format. Even if it doesn’t make sense to maintain a local copy on the user’s computer, there’s nothing stopping providers from pushing it to a third party storage service like Amazon S3. In fact it makes a lot of sense for applications to be separated from storage entirely. We don’t expect our operating system to provide all the functionality we’ll ever need (or indeed, any of it) so we install third party applications which use the operating system to store data. What’s to stop us doing the same in the cloud, for example having Google Apps and Zoho both saving back to a common Amazon S3 store which is in turn replicated locally or to another cloud-based service like Rackspace Cloud Files?
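To make the idea concrete, here’s a hedged sketch (a hypothetical interface, not any real provider’s API) of an application writing through a common storage abstraction that could be backed by S3, Cloud Files or anything else:

```python
class Store:
    """Hypothetical minimal blob-store interface; real backends (S3,
    Rackspace Cloud Files, a local disk) would each get a thin adapter
    implementing put/get."""
    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

def save_document(key, data, primary, mirrors=()):
    # The application writes once and stays ignorant of where the bytes
    # live; mirroring to other providers is the storage layer's job.
    primary.put(key, data)
    for mirror in mirrors:
        mirror.put(key, data)

s3, cloudfiles = Store(), Store()
save_document("contacts.vcf", b"BEGIN:VCARD...", s3, mirrors=[cloudfiles])
```

The point is the separation of duties: Google Apps and Zoho would both call `save_document` and never care which clouds hold the blobs.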

In any case perhaps it’s time for us to dust off and revisit the Cloud Computing Bill of Rights?

Cloud Computing Crypto: GSM is dead. Long live GSM!

GSM, at least in its current form, is dead, and the GSMA‘s attempts to downplay serious vulnerabilities by claiming otherwise remind me of this rather famous Monty Python sketch about a dead parrot:

Fortunately consumers these days are savvy and have access to information with which to verify (or not) vendors’ claims about security. So when they get together and say things like “the researchers still would need to build a complex radio receiver to process the raw radio data” the more cynical of us are able to dig up 18 month old threads like this one which concludes:

So it appears you might be able to construct a GSM sniffer from a USRP board and a bunch of free software, including a Wireshark patch. (It appears that one of the pieces of free software required is called “Linux” or “GNU/Linux”, depending on which side of that particular debate you’re on :-), i.e. it works by using Linux’s tunnel device to stuff packets into a fake network interface on which Wireshark can capture.

OK, so extracting the 1s and 0s from the airwaves and getting them into the most convenient (open source) framework we have for the dissection of live protocols is a problem long since solved. Not only are the schematics publicly available, but devices are commercially available online for around $1,000. One would assume that the GSMA knew this – and presumably they did, but found it preferable to turn a blind eye to the inconvenient truth for the purposes of their release.

The real news though is the cracking of the A5/1 encryption which purports to protect most users by keeping the voice channels “secure”. Conversely, the control information which keeps bad guys from stealing airtime is believed to remain safe for the time being. That is to say, our conversations are exposed while the carriers’ billing is secure – an “externalisation” of risk in that the costs are borne by the end users. You can bet that were the billing channels affected there would have been a scramble to widely deploy a fix overnight, rather than this poor attempt at a cover-up.

The attack works by creating a 2TB rainbow table in advance, which allows one to simply look up a secret key rather than having to brute-force it. This should be infeasible even for A5/1’s 64-bit key, but “the network operators decided to pad the key with ten zeros to make processing faster, so it’s really a 54-bit key”, and other weaknesses combine to make it possible. A fair bit of work goes into creating the table initially, but this only needs to be done once, and you can buy access to the tables as a service – as well as the tables themselves – for many common hashes (such as those used to protect Windows and Unix passwords, and no doubt GSM soon too!). The calculations themselves can be quite expensive, but advances like OpenCL in the recently released Mac OS X (Snow Leopard) can make things a lot better/faster/cheaper by taking advantage of extremely performant graphics processing units (GPUs).
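The time-memory trade-off behind rainbow tables is easier to see in miniature. The following toy (a 16-bit key space with SHA-256 standing in for the cipher – nothing like real A5/1, whose attack also exploits its specific weaknesses) precomputes hash chains, stores only their endpoints, and then recovers a key from a single observed hash. Note in passing that padding ten key bits with zeros, as the operators did, shrinks the search space by a factor of 2**10 = 1024:

```python
import hashlib

KEY_BITS = 16          # toy key space; A5/1 is effectively ~54 bits
CHAIN_LEN = 32

def h(key):
    # stand-in one-way function for the cipher's key -> keystream map
    return hashlib.sha256(key.to_bytes(8, "big")).digest()[:4]

def reduce(digest, i):
    # map a hash back into the key space, varied per chain position
    return (int.from_bytes(digest, "big") + i) % (1 << KEY_BITS)

def build_table(n_chains):
    # the expensive, once-only precomputation: walk each chain but
    # store only its (endpoint -> start) pair
    table = {}
    for start in range(n_chains):
        k = start
        for i in range(CHAIN_LEN):
            k = reduce(h(k), i)
        table.setdefault(k, start)   # keep the first chain on collision
    return table

def lookup(table, target_digest):
    for pos in range(CHAIN_LEN):
        # assume the key sits at chain position `pos`; walk to the end
        k = reduce(target_digest, pos)
        for i in range(pos + 1, CHAIN_LEN):
            k = reduce(h(k), i)
        if k in table:
            # candidate chain found: regenerate it from its start point
            k = table[k]
            for i in range(CHAIN_LEN):
                if h(k) == target_digest:
                    return k
                k = reduce(h(k), i)
    return None

table = build_table(512)
k = 0
for i in range(5):
    k = reduce(h(k), i)          # pick a key 5 steps along chain 0
target = h(k)                     # the "observed keystream"
recovered = lookup(table, target)
```

At full scale the table is huge (the 2TB in question) but each lookup is nearly free – which is exactly why “nobody will build the table” is no defence at all.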

Of course thanks to cloud computing you don’t even need to do the work yourself – you can just spin up a handful of instances on a service like Amazon EC2 and save the results onto Amazon S3/Amazon EBS. You can then either leave it there (at a cost of around $300/month for 2TB of storage) and use instances to interrogate the tables via a web service, or download it to a local 2TB drive (conveniently just hitting the market at ~$300 one-off).
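The economics are easy to check using the round figures above (the per-gigabyte rate below is an assumption, roughly S3’s price at the time of writing):

```python
table_gb = 2 * 1024            # the 2TB rainbow table
s3_per_gb_month = 0.15         # assumed ~2009 S3 price per GB-month
drive_one_off = 300.0          # one-off price of a local 2TB drive

monthly_s3 = table_gb * s3_per_gb_month       # roughly $300/month
breakeven_months = drive_one_off / monthly_s3
```

At these prices the local drive pays for itself in under a month, so leaving the table in the cloud only makes sense if you want the web-service interrogation.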

Cloud storage providers could make the task even easier with services like public data sets, which bring multi-tenancy in the form of de-duplication benefits to common data sets. For example, if Amazon found two or more customers storing the same file they could link the two together and share the costs between all of them (they may well do this today, though if they do they keep the benefit for themselves). In the best case such benefits would be exposed to all users, in which case the cost of such “public domain” data would be rapidly driven down towards zero.
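A toy content-addressed store shows how such de-duplication might work (the cost-sharing rule is my own assumption, not anything Amazon has announced):

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical blobs are kept once and
    their storage cost is split between all tenants referencing them."""
    def __init__(self):
        self.blobs = {}           # sha256 digest -> data
        self.owners = {}          # sha256 digest -> set of tenant ids

    def put(self, tenant, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)   # stored only once
        self.owners.setdefault(digest, set()).add(tenant)
        return digest

    def monthly_cost(self, tenant, per_byte=1e-9):
        # each tenant pays 1/n of every blob it shares with n-1 others
        return sum(len(self.blobs[d]) * per_byte / len(owners)
                   for d, owners in self.owners.items() if tenant in owners)

store = DedupStore()
table = b"x" * 1000               # stand-in for the shared rainbow table
store.put("alice", table)
store.put("bob", table)
```

With two tenants holding the same table, each pays half – and the more people store a “public domain” data set, the closer its per-user cost falls towards zero.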

Ignoring A5/2 (which gives deliberately weakened protection for countries where encryption is restricted), there’s also a downgrade attack possible thanks to A5/0 (which gives no protection) and the tendency for handsets to happily transmit in the clear rather than refusing to transmit at all or at least giving a warning as suggested by the specifications. A man in the middle just needs to be the strongest signal in the area and they can negotiate an unencrypted connection while the user is none the wiser. This is something like how analog phones used to work in that there was no encryption at all and anyone with a radio scanner could trivially eavesdrop on [at least one side of] the conversation. This vulnerability apparently doesn’t apply where a 3G signal is available, in which case the man in the middle also needs to block it.
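The downgrade itself needs no cryptography at all – it’s a policy failure, as this toy negotiation sketch shows (the policy names are mine, not from the GSM specifications):

```python
def connect(offer, policy="accept"):
    """Toy model of a handset's response when the strongest visible base
    station (possibly a man in the middle) offers the A5/0 null cipher.
    'accept' is the silently-compliant behaviour described above; 'warn'
    and 'refuse' are the alternatives the specifications suggest."""
    if offer != "A5/0":
        return ("encrypted", offer)
    if policy == "refuse":
        return ("refused", None)        # no call, but no eavesdropping
    if policy == "warn":
        return ("cleartext-warned", offer)
    return ("cleartext", offer)         # user is none the wiser

legit = connect("A5/1")                  # honest network
mitm = connect("A5/0")                   # attacker wins the signal war
safe = connect("A5/0", policy="refuse")  # what handsets ought to do
```

Because shipping handsets behave like `policy="accept"`, the attacker only has to out-shout the legitimate tower.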

Fortunately there’s already a solution in the form of A5/3, only it’s apparently not being deployed:

A5/3 is indeed much more secure; not only is it based on the well known (and trusted) Kasumi algorithm, but it was also developed to encrypt more of the communication (including the phone numbers of those connecting together), making it much harder for ne’er-do-wells to work out which call to intercept. A5/3 was developed, at public expense, by the European Telecommunications Standards Institute (ETSI) and is mandated by the 3G standard, though can also be applied to 2.5G technologies including GPRS and EDGE.

That the GSMA consider a 2TB data set in any way a barrier to these attacks says a lot about their attitude to security, and going as far as to compare it to a “20 kilometre high pile of books” is appalling to anyone who knows anything about security. Rainbow tables, cloud computing and advances in PC hardware put this attack well within the budget of individuals (~$1,000), let alone determined business and government funded attackers. Furthermore, groups like the GSM Software Project, having realised that “GSM analyzer[s] cost a sh*tload of money for no good reason”, are working to “build a GSM analyzer for less than $1000” so as to, among other things, “crack A5 and proof[sic] to the public that GSM is insecure”. Then there are the GNU Radio guys, who have been funded to produce the software to drive it.

Let’s not forget too that, as Steve Gibson observes in his recent Cracking GSM Cellphones podcast with Leo Laporte: “every single cellphone user has a handset which is able to decrypt GSM”. It’s no wonder then that Apple claim jailbreaking the iPhone supports terrorists and drug dealers, but at about the same price as an iPhone ($700 for the first generation USRP board) one wonders why anyone would bother messing with proprietary hardware when they can have open hardware AND software in the same price range. What’s most distressing though is that this is not news – according to Steve an attack was published some 6 years ago:

There’s a precomputation attack. And it was published thoroughly, completely, in 2003. A bunch of researchers laid it all out. They said, here’s how we cracked GSM. We can either have – I think they had, like, a time-complexity tradeoff. You’d have to listen to two minutes of GSM cellphone traffic, and then you could crack the key that was used to encrypt this. After two minutes you could crack it in one second. Or if you listen to two seconds of GSM cellphone traffic, then you can crack it in two minutes. So if you have more input data, takes less time; less input data, more time. And they use then tables exactly like we were talking about, basically precomputation tables, the so-called two terabytes that the GSM Alliance was pooh-poohing and saying, well, you know, no one’s ever going to be able to produce this.

Fortunately we users can now take matters into our own hands by handling our own encryption, given those entrusted with doing it for us have long since fallen asleep at the wheel. I’ve got Skype on my MacBook and iPhone for example (tools like 3G Unrestrictor on a jailbroken iPhone allow you to break the digital shackles and use it as a real GSM alternative), and while this has built-in encryption (already proving a headache for the authorities) it is, like GSM, proprietary:

Everything about this is worrisome. I mean, from day one, the fact that they were keeping this algorithm, their cipher, a secret, rather than allowing it to be exposed publicly, tells you, I mean, it was like the first thing to worry about. We’ve talked often about the dangers of relying on security through obscurity. It’s not that some obscurity can’t also be useful. But relying on the obscurity is something you never want because nothing remains obscure forever.

We all know that open systems are more secure – for example, while SSL/TLS has had its fair share of flaws, it can be configured securely and is far better than most proprietary alternatives. That’s why I’m most supportive of solutions like (but not necessarily) Phil Zimmermann‘s Zfone – an open source implementation of the open ZRTP specification (submitted for IETF standardisation). This could do for voice what his ironically named Pretty Good Privacy did for email many years ago (that is, those who do care about their privacy can have it). Unfortunately draft-zimmermann-avt-zrtp expired last week, but let’s hope it’s not the end of the road as something urgently needs to be done about this. Here you can see it successfully encrypting a Google Talk connection (with video!):

Sure, there may be some performance and efficiency advantages to be had by adding encryption to compression codecs, but I rather like the separation of duties – it’s unlikely a team of encryption experts will also be good at audio and video compression, and vice versa.

Widespread adoption of such standards would also bring us one big step closer to data-only carriers that I predict will destroy the telco industry as we know it some time soon.

Amazon VPC trojan horse finds its mark: Private Cloud

Now that we’ve all had a chance to digest the Amazon Virtual Private Cloud announcement and the dust has settled, I’m joining the fray with a “scoop of interpretation“. Positioned as “a secure and seamless bridge between a company’s existing IT infrastructure and the AWS cloud”, the product is (like Google’s Secure Data Connector for App Engine, which preceded Amazon VPC by almost 6 months) quite simply a secure connection back to legacy infrastructure from the cloud – nothing more, nothing less. Here’s a diagram for those who prefer to visualise (Virtual Private Cloud.svg on Wikimedia Commons):

Notice that “private cloud” (at least in the sense that it is most often [ab]used today) is conspicuously absent. What Amazon and Google are clearly telling customers is that they don’t need their own “private cloud”. Rather, they can safely extend their existing legacy infrastructure into the [inter]cloud using VPN-like connections and all they need to do to get up and running is install the software provided or configure a new VPN connection (Amazon uses IPsec).

Remember, a VPN is the network you have when you’re not having a network – it behaves just like a “private network” only it’s virtual. Similarly a VPC is exactly that: a virtual “private cloud” – it behaves like a “private cloud” (in that it has a [virtual] perimeter) but users still get all the benefits of cloud computing – including trading capex for opex and leaving the details to someone else.

Also recall that the origin of the cloud was network diagrams where it was used to denote sections of the infrastructure that were somebody else’s concern (e.g. a telco). You just needed to poke your packets in one side and [hopefully] they would reappear at the other (much like the Internet). Cloud computing is like that too – everything within the cloud is somebody else’s concern, but if you install your own physical “private cloud” then that no longer holds true.

Of course the “private cloud” parade (unsurprisingly consisting almost entirely of vendors who peddle “private cloud” or their agents, often having some or all of their existing revenue streams under direct threat from cloud computing) were quick to jump on this and claim that Amazon’s announcement legitimised “private cloud”. Au contraire mes amis – from my [front row] seat the message was exactly the opposite. Rather than “legitimis[ing] private cloud” or “substantiating the value proposition” they completely undermined the “private cloud” position by providing a compelling “public cloud” based alternative. This is the mother of all trojan horses and even the most critical of commentators wheeled it right on in to the town square and paraded it to the world.

Upon hearing the announcement Christofer Hoff immediately claimed that Amazon had “peed on [our] fire hydrant” and Appistry’s Sam Charrington chimed in, raising him by claiming they had also “peed in the pool” ([ab]using one of my favourite analogies). Sam went on to say that, despite having effectively defined the term, Amazon’s product was not, in fact, “virtual private cloud” at all, calling into question the level of “logical isolation”. Reuven Cohen (another private cloud vendor) was more positive, having already talked about it a while back, but his definition of VPC as “a method for partitioning a public computing utility such as EC2 into quarantined virtual infrastructure” is a little off the mark – services like EC2 are quarantined by default, but granular, in that they don’t enforce the “strong perimeter” characteristic of VPCs.

Accordingly I would (provisionally) define Virtual Private Cloud (VPC) as follows:

Virtual Private Cloud (VPC) is any private cloud existing within a shared or public cloud (i.e. the Intercloud).

This is derived from the best definition I could find for “Virtual Private Network (VPN)”.

An open letter to the board and members

This is an open letter to the board and membership (including my fellow 20-30 official “Association Members” (copied), as well as the 150,000 or so account holders we effectively represent) concerning recent events that could affect the ongoing viability of the organisation. Bearing in mind that this is an organisation built on trust, I implore you to follow my example in exercising extreme caution when we are called upon to intervene in resolving the deadlock. Despite claims to the contrary there is no urgency, and the last thing we need now is an Iran-style election (whether or not legitimate, perception is everything).

The Problem

It appears (from my perspective as an outsider, albeit with the benefit of various insider accounts) that the board has split into two factions. On one hand we have the “old school” who have been on the board for a while (some would say too long), and on the other the “reformist(s)” who seek change, yesterday. They are now on a collision course that will invariably result in the loss of committed contributors or, worse, loss of trust from the community. In any case a confrontation poses a serious risk to the organisation’s future, and with it the community’s access to an alternative to commercial certification authorities.

In requesting and receiving the official member list as well as proposing a number of new members (who are presumably sympathetic to their position and will vote for any motion they submit) it was already clear that plans were afoot for a “coup d’état”. Now that an SGM has been proposed to “get this over with” complete with a clear agenda there is absolutely no doubt about it:

  1. Acceptance of new members. (E.Schwob, A.Bürki, I.Grigg)
  2. Vote that the committee of management no longer enjoys the confidence of the members.
  3. Vote that the committee is hereby removed from office and election of a committee shall immediately follow adoption of this resolution.
  4. Election of a new committee of management.

It is no wonder that the existing board feel they are under attack – they effectively are – and given the “soonest this could be done is in 7 days” they are no doubt starting to feel the pressure. I don’t buy it. Yes, the auditor recently resigned and yes we will eventually need to get the audit back on track, but right now the number one issue is restoring stability to an unstable structure and minimising collateral damage. This needs to be done slowly and carefully and those promoting panic are perhaps deserving of the suspicion they have raised.

It is not my intent to start (yet another) discussion, rather to propose a safe and sensible way forward that will ensure CAcert’s ongoing viability while protecting our most valuable asset: the trust of the community. Should the SGM proceed as planned (whether or not it is successful) I will be the first to admit that the trust is lost.

The Solution

The very first thing we need to do is expand the membership base by one or two orders of magnitude, as Patrick explains:

Increasing the number of members, will increase the stability of your organization. It is more difficult to try a Coup d’Etat or a revolution when you have to convince 200 voting members than 20. On the other hand, major changes will be slower for the same reason.

Any structure with a broad base is far more stable than the top heavy structure we have today (the subversion of which requires a mere THREE new members to be proposed at SGM!).

The two main obstacles to becoming a member today are:

  • A convoluted process requiring a “personally known” proposer and seconder as well as an explicit vote from the committee
  • A token USD10 annual fee, the proceeds of which (around €200) are a drop in the ocean

Fortunately the committee has the power to require “some other amount” (including zero) at least until such time as the organisation’s rules can be updated accordingly (see CAcertIncorporated and the Associations Incorporation Act for more details). Accordingly the membership fees for 2009/2010 should be immediately suspended as members are far more important than money right now.

The process for becoming a member should also be streamlined, if not completely overhauled. Surely I’m not the only one who considers it ironic that an open, community driven organisation should in fact be closed. Building the broadest possible membership base offers the best protection against attacks like this (and yes, I consider this an attack, and urge the attackers to back off while the structure is stabilised). Associations are typically limited by guarantee – which means that becoming a member involves a commitment to pay a certain (usually token) amount in the event that the organisation should be wound up (as opposed to companies limited by shares, where the liability is limited to the value of the shares themselves). People are far more likely to agree to this than to reach into their own pockets (even if only due to laziness), so this change alone should make a huge difference.

The invitation to become a member should then be extended to some (e.g. assurers, assured, active cert holders, etc.) or all of the existing users, whose membership applications should be processed as efficiently as possible. Ideally this could be done online as [an optional] part of the signup process (perhaps relying on Australia’s Electronic Transactions Act to capture electronic signatures), but for now the rules require writing or digitally signed email. A temporary “pipeline” consisting of one or more dedicated proposers and seconders could be set up, processing digitally signed applications from members as they arrive. The proposer and seconder requirement (they must be “personally known” to the applicant) should eventually be dropped, and the “default deny” committee vote dropped or replaced with a “default accept” [after 7 days?] veto. In any case only those with an existing interest in CAcert (e.g. a user account) will be eligible at this time, so there is little risk of outsider influence.

Once we have a significantly larger membership base (at least 100 members but ideally more like 200-2000) we can proceed to an orderly election of a new board with each candidate providing a concise explanation of their experience and why they (individually) should be selected as representatives. The resulting board would likely be a mix of the two factions (who would hopefully have agreed to work together) as well as some “new blood”.

I hope that you will agree that this is the best way forward and that those of you who have offered support to the revolutionary(s) reconsider in the presence of this far safer alternative. Should they press on with the SGM I for one will be voting against the motions (and encourage you to do the same), not because I don’t agree “it’s time for change” but because of the way it has been effected.

On the Google Docs sharing security incident

I was just trying to respond to ZDNet’s hot-off-the-press article (The cloud bites back: Google bug shared private Google Docs data) about the recent Google Docs sharing vulnerability, but ZDNet’s servers are throwing errors. Anyway, now that Google have announced that they “believe the issue affected less than 0.05% of all documents” (rather than just emailing the affected users), I was considering writing a post anyway – so I’m killing two birds with one stone:

It’s convenient that they should prefer a percentage of an unknown number to a meaningful statistic, but given that sharing even a single document inappropriately could destroy a business or someone’s life, this is still very serious. Fortunately I’ve not heard of any such incidents resulting from this breach (then again, often you won’t).

Putting it in perspective though, for the same sample of documents over the same period how many do you think would have suffered security breaches under the “old way” of storing them locally and emailing them? And by security breaches I include availability (loss) and integrity (corruption) as well as confidentiality (disclosure).

People still lose laptops (or have them stolen) and leave data-laden USB keys all over the place, so I don’t see that this is much different from before – it may well be better. Security researchers need statistics though, so it would be useful if vendors were more transparent with information about breaches.

It would be great to see some more objective analysis and reporting comparing cloud computing with legacy systems – I’d say the fearmongers would be surprised by the results.

Here are some tips that cloud vendors should ideally follow:

  • Work with researchers to resolve reported issues
  • Always be transparent about security issues (even if you think nobody noticed)
  • Limited liability is not an excuse to be negligent – always write secure code and test thoroughly
  • Remember that at least until cloud computing is widely accepted (and even thereafter) you are in the business of trust, which is hard to gain and easy to lose.

That’s all for today – back to cloud standards…

Windows 7: Windows Vista Lite?

There’s no denying that Vista was a failure. A complete and utter disappointment. An unmitigated disaster. Microsoft have even essentially admitted it themselves, finally accepting what users, reviewers and wary businesses have been saying since before it even hit the shelves. It just didn’t bring enough benefit for its significant cost (early estimates talked about $5k per seat to upgrade by the time you deliver new hardware, support it and train users), users hated it, and some have even called it the most serious technical misstep in computing history. The fluff (transparent windows et al) exacted a heavy toll on the hardware, and the delicate minimum-requirements ‘balance’ was way off – set it too high and nobody can afford your software; too low and those who do buy it complain about inadequate performance. Plus the long overdue security system was invasive and yet still largely ineffective.

The reality is that while XP has been ‘good enough’ for most users, Google and friends have been quietly shifting the playing field from the corpse-littered battlefields of operating systems and file formats to (now) mostly standardised browsers. It simply doesn’t matter now what your operating system is, and between Firefox’s rise to fame and so many heterogeneous mobile devices converging on the Internet it’s long since been impossible for webmasters to deny admittance to non-IE (and therefore non-Windows) clients.

In arriving at this point Free & Open Source Software (FOSS) has proven itself a truly disruptive force. Without it there would be no Google and no Amazon Web Services (and quite possibly no Amazon!). While Linux on the desktop may be a pipe dream, it’s carved a large slice out of the server market (powering the vast majority of cloud computing infrastructure) and its adoption is steadily rising on connected devices from mobiles and netbooks to television sets. There are multiple open source browsers, multiple open source scripting engines (to power web based applications), a new breed of client architecture emerging (thanks in no small part to Google Chrome) and even Microsoft are now talking about unleashing IE on the open source community (for better or worse).

So how did we get to Windows 7 (and back onto a sensible version numbering scheme) anyway? Here’s a look from an architecture point of view:

  • Windows 1/2: Rudimentary text based environment, didn’t introduce mouse/arrow keys until 2.x. Something like XTree Gold (which was my preferred environment at the time).
  • Windows 3: A revolutionary step and the first version of Windows that didn’t suck and that most people are likely to remember.
  • Windows 95/98/ME: Evolution of 3.x and the first real mainstream version of Windows.
  • Windows NT 3.5x/4.0: Another revolutionary step with the introduction of the vastly superior NT (‘New Technologies’) kernel.
  • Windows 2000/XP: Refinement of NT and the result of recombining separate development streams for business and home users.
  • Windows Vista: Bloat, bloat and more bloat. Available in at least half a dozen different (expensive and equally annoying) versions, but many (most?) of its sales were for downgrade rights to XP.
  • Windows 7: Tomorrow’s Windows. Vista revisited.

Before I explain why Windows 7 is to Vista what Windows Millennium Edition (WinMe) was to Windows 98 (and why that isn’t necessarily such a bad thing), let’s talk quickly about Microsoft’s MinWin project. Giving credit where credit is due, the NT kernel is really quite elegant and was far ahead of its time when unleashed on the world over a dozen years ago. It’s stable, extensible, performant and secure (when implemented properly). It’s also been steadily improved through the 3.51, 4.0, 2000, XP and Vista releases. It must be quite annoying for the bearded boffins to see their baby struggling under the load heaped on it by their fellow developers, and therein lies the problem.

That’s why the MinWin project (which seeks to deliver the minimum set of dependencies for a running system, albeit without even a graphics interface) is interesting both from a client, and especially from a cloud computing point of view. While MinWin weighs in at forty-something megabytes, Vista is well over a thousand (and usually a few gigabytes), but the point is that Microsoft now know how to be slim when they need to be.

Now that the market has voted with its feet, Microsoft are paying attention, and Windows 7 lies somewhere on the Vista side of the MinWin-to-Vista bloat scale. The interface is a significant departure from Vista, borrowing much from other wildly successful operating systems like OS X, and like OS X it will be simpler, faster and easier to use. This is very similar to Windows ME’s notoriously unsuccessful bolting of the Windows 2000 interface onto Windows 98, only this time, rather than putting a silk shirt on a pig, we should end up with a product actually worth having. This is good news, especially for business users, who by this time will have already been waiting too long to move on from XP.

Conversely, Azure (their forthcoming cloud computing OS) is on the MinWin side of the bloat scale. It is almost certainly heavily based on the Windows 2008 Server Core (which follows Novell’s example by evicting the unwanted GUI from the server), needing to do little more than migrate the management functions to a service oriented architecture. If (and only if) they get the management functions right will they have a serious contender in the cloud computing space. That means sensible, scalable protocols which follow Amazon and Google’s examples (where machines are largely independent, talking to their peers for state information) rather than simply a layer on top of the existing APIs. Unfortunately Microsoft Online Services (MOS) currently feels more like the latter (even falling back to the old school web management tools for some products), but with any luck this will improve with time.

Provided they find the right balance for both products, this is good for IT architects (like myself), good for Microsoft, and most importantly, good for users. Perhaps the delay was their strategy all along, and why not when you can extract another year or two of revenue from the golden goose of proprietary software? In any case we’re at the dawn of a new era, and it looks like Microsoft will be coming to the party after all.