Amazon VPC trojan horse finds its mark: Private Cloud

Now that we’ve all had a chance to digest the Amazon Virtual Private Cloud announcement and the dust has settled, I’m joining the fray with a “scoop of interpretation”. Positioned as “a secure and seamless bridge between a company’s existing IT infrastructure and the AWS cloud”, the product is (like Google’s Secure Data Connector for App Engine, which preceded Amazon VPC by almost 6 months) quite simply a secure connection back to legacy infrastructure from the cloud – nothing more, nothing less. Here’s a diagram for those who prefer to visualise (Virtual Private Cloud.svg on Wikimedia Commons):

Notice that “private cloud” (at least in the sense that it is most often [ab]used today) is conspicuously absent. What Amazon and Google are clearly telling customers is that they don’t need their own “private cloud”. Rather, they can safely extend their existing legacy infrastructure into the [inter]cloud using VPN-like connections and all they need to do to get up and running is install the software provided or configure a new VPN connection (Amazon uses IPsec).
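
For the curious, here’s roughly what “configure a new VPN connection” looks like in code. This is a minimal illustrative sketch only, using today’s boto3 SDK (which postdates both this post and Amazon’s original API); the ASN and IP address are placeholders, attaching the gateway to the VPC is omitted, and running it with real credentials would create real (billable) resources:

# Sketch only: wiring an on-premises gateway to a VPC over IPsec with boto3.
# The ASN and IP address below are placeholders, not real infrastructure.
import boto3

ec2 = boto3.client("ec2")

# The on-premises end of the tunnel (your office/datacentre router).
cgw = ec2.create_customer_gateway(
    BgpAsn=65000,                 # placeholder private ASN
    PublicIp="203.0.113.10",      # documentation-range address, replace with your own
    Type="ipsec.1",
)["CustomerGateway"]

# The AWS-side VPN gateway (attaching it to the VPC is omitted here).
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]

# The IPsec connection itself; AWS returns the configuration to load into
# the on-premises device.
vpn = ec2.create_vpn_connection(
    CustomerGatewayId=cgw["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGatewayId"],
    Type="ipsec.1",
    Options={"StaticRoutesOnly": True},
)
print(vpn["VpnConnection"]["VpnConnectionId"])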

Remember, a VPN is the network you have when you’re not having a network – it behaves just like a “private network” only it’s virtual. Similarly a VPC is exactly that: a virtual “private cloud” – it behaves like a “private cloud” (in that it has a [virtual] perimeter) but users still get all the benefits of cloud computing – including trading capex for opex and leaving the details to someone else.

Also recall that the origin of the cloud was network diagrams where it was used to denote sections of the infrastructure that were somebody else’s concern (e.g. a telco). You just needed to poke your packets in one side and [hopefully] they would reappear at the other (much like the Internet). Cloud computing is like that too – everything within the cloud is somebody else’s concern, but if you install your own physical “private cloud” then that no longer holds true.

Of course the “private cloud” parade (unsurprisingly consisting almost entirely of vendors who peddle “private cloud” or their agents, often having some or all of their existing revenue streams under direct threat from cloud computing) were quick to jump on this and claim that Amazon’s announcement legitimised “private cloud”. Au contraire mes amis – from my [front row] seat the message was exactly the opposite. Rather than “legitimis[ing] private cloud” or “substantiating the value proposition” they completely undermined the “private cloud” position by providing a compelling “public cloud” based alternative. This is the mother of all trojan horses and even the most critical of commentators wheeled it right on in to the town square and paraded it to the world.

Upon hearing the announcement Christofer Hoff immediately claimed that Amazon had “peed on [our] fire hydrant” and Appistry’s Sam Charrington chimed in, raising him by claiming they had also “peed in the pool” ([ab]using one of my favourite analogies). Sam went on to say that despite having effectively defined the term, Amazon’s product was not, in fact, “virtual private cloud” at all, calling into question the level of “logical isolation”. Reuven Cohen (another private cloud vendor) was more positive, having already talked about it a while back, but his definition of VPC as “a method for partitioning a public computing utility such as EC2 into quarantined virtual infrastructure” is a little off the mark – services like EC2 are quarantined by default, but at a granular (per-instance) level rather than enforcing the “strong perimeter” characteristic of VPCs.

Accordingly I would (provisionally) define Virtual Private Cloud (VPC) as follows:

Virtual Private Cloud (VPC) is any private cloud existing within a shared or public cloud (i.e. the Intercloud).

This is derived from the best definition I could find for “Virtual Private Network (VPN)”.

An open letter to the CAcert.org board and members

This is an open letter to the CAcert.org board and membership (including my fellow 20-30 official “Association Members” (copied) as well as the 150,000 or so account holders we effectively represent) concerning recent events that could affect the ongoing viability of the organisation. Bearing in mind that this is an organisation built on trust, I implore you to follow my example in exercising extreme caution when we are necessarily called upon to intervene in resolving the deadlock. Despite claims to the contrary there is no urgency, and the last thing we need now is an Iran style election (whether or not legitimate, perception is everything).

The Problem

It appears (from my perspective as an outsider, albeit with the benefit of various insider accounts) that the board has split into two factions. On one hand we have the “old school” who have been on the board for a while (some would say too long) and on the other the “reformist(s)” who seek change, yesterday. They are now on a crash course that will invariably result in the loss of committed contributors, or worse, loss of trust from the community. In any case a confrontation poses a serious risk to the organisation’s future, and with it the community’s access to an alternative to commercial certification authorities.

When they requested and received the official member list, as well as proposing a number of new members (who are presumably sympathetic to their position and will vote for any motion they submit), it was already clear that plans were afoot for a “coup d’état”. Now that an SGM has been proposed to “get this over with”, complete with a clear agenda, there is absolutely no doubt about it:

  1. Acceptance of new members. (E.Schwob, A.Bürki, I.Grigg)
  2. Vote that the committee of management no longer enjoys the confidence of the members.
  3. Vote that the committee is hereby removed from office and election of a committee shall immediately follow adoption of this resolution.
  4. Election of a new committee of management.

It is no wonder that the existing board feel they are under attack – they effectively are – and given the “soonest this could be done is in 7 days” they are no doubt starting to feel the pressure. I don’t buy it. Yes, the auditor recently resigned and yes we will eventually need to get the audit back on track, but right now the number one issue is restoring stability to an unstable structure and minimising collateral damage. This needs to be done slowly and carefully and those promoting panic are perhaps deserving of the suspicion they have raised.

It is not my intent to start (yet another) discussion, rather to propose a safe and sensible way forward that will ensure CAcert’s ongoing viability while protecting our most valuable asset: the trust of the community. Should the SGM proceed as planned (whether or not it is successful) I will be the first to admit that the trust is lost.

The Solution

The very first thing we need to do is expand the membership base by one or two orders of magnitude, as Patrick explains:

Increasing the number of members, will increase the stability of your organization. It is more difficult to try a Coup d’Etat or a revolution when you have to convince 200 voting members than 20. On the other hand, major changes will be slower for the same reason.

Any structure with a broad base is far more stable than the top heavy structure we have today (the subversion of which requires a mere THREE new members to be proposed at SGM!).

The two main obstacles to becoming a member today are:

  • A convoluted process requiring a “personally known” proposer and seconder as well as an explicit vote from the committee
  • A token USD10 annual fee, the proceeds of which (around €200) are a drop in the ocean

Fortunately the committee has the power to require “some other amount” (including zero) at least until such time as the organisation’s rules can be updated accordingly (see CAcertIncorporated and the Associations Incorporation Act for more details). Accordingly the membership fees for 2009/2010 should be immediately suspended as members are far more important than money right now.

The process for becoming a member should also be streamlined, if not completely overhauled. Surely I’m not the only one who considers it ironic that an open, community driven organisation should in fact be closed. Building the broadest possible membership base offers the best protection against attacks like this (and yes, I consider this an attack and urge the attackers to back off while the structure is stabilised). Associations are typically limited by guarantee – which means that becoming a member involves a commitment to pay a certain (usually token) amount in the event that the organisation should be wound up (as opposed to companies limited by shares, where the liability is limited to the value of the shares themselves). People are far more likely to agree to this than reach into their own pockets (even if only due to laziness) so this change alone should make a huge difference.

The invitation to become a member should then be extended to some (e.g. assurers, assured, active cert holders, etc.) or all of the existing users, whose membership applications should be processed as efficiently as possible. Ideally this could be done online as [an optional] part of the signup process (perhaps relying on Australia’s Electronic Transactions Act to capture electronic signatures) but for now the rules require writing or digitally signed email. A temporary “pipeline” consisting of one or more dedicated proposers and seconders could be set up, processing digitally signed applications from members as they arrive. The proposer and seconder requirement (who must be “personally known” to the applicant) should eventually be dropped and the “default deny” committee vote dropped or replaced with a “default accept” [after 7 days?] veto. In any case only those with an existing interest in CAcert (e.g. a user account) will be eligible at this time so there is little risk of outsider influence.

Once we have a significantly larger membership base (at least 100 members but ideally more like 200-2000) we can proceed to an orderly election of a new board with each candidate providing a concise explanation of their experience and why they (individually) should be selected as representatives. The resulting board would likely be a mix of the two factions (who would hopefully have agreed to work together) as well as some “new blood”.

I hope that you will agree that this is the best way forward and that those of you who have offered support to the revolutionary(s) reconsider in the presence of this far safer alternative. Should they press on with the SGM I for one will be voting against the motions (and encourage you to do the same), not because I don’t agree “it’s time for change” but because of the way it has been effected.

On the Google Docs sharing security incident

I was just trying to respond to ZDnet’s hot-off-the-press article (The cloud bites back: Google bug shared private Google Docs data) about the recent Google Docs sharing vulnerability, but ZDnet’s servers are throwing errors. Now that Google have announced that they “believe the issue affected less than 0.05% of all documents” (rather than just emailing the affected users) I was considering writing a post anyway, so I’m killing two birds with one stone:

It’s convenient that they should prefer to use a percentage of an unknown number rather than a meaningful statistic, but given that sharing even a single document inappropriately could destroy a business or someone’s life it is still very serious. Fortunately I’ve not heard of any such incidents resulting from this breach (then again, often you won’t).
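
To see why “a percentage of an unknown number” tells us so little, here’s a trivial back-of-envelope calculation; the document totals are invented purely for illustration, since Google never published the real figure:

# Hypothetical totals only - Google never published the document count.
rate = 0.0005  # "less than 0.05%"
for total_docs in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{total_docs:>13,} documents -> up to {int(total_docs * rate):>9,} affected")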

Putting it in perspective though, for the same sample of documents over the same period how many do you think would have suffered security breaches under the “old way” of storing them locally and emailing them? And by security breaches I include availability (loss) and integrity (corruption) as well as confidentiality (disclosure).

People still lose laptops (or have them stolen) and leave data-laden USB keys all over the place, so I don’t see that this is much different from before and may well be better. Security researchers need statistics though, so it would be useful if vendors were more transparent with information about breaches.

It would be great to see some more objective analysis and reporting comparing cloud computing with legacy systems – I’d say the fearmongers would be surprised by the results.

Here are some tips that cloud vendors should ideally try to follow:

  • Work with researchers to resolve reported issues
  • Always be transparent about security issues (even if you think nobody noticed)
  • Limited liability is not an excuse to be negligent – always write secure code and test thoroughly
  • Remember that at least until cloud computing is widely accepted (and even thereafter) you are in the business of trust, which is hard to gain and easy to lose.

That’s all for today – back to cloud standards…

Windows 7: Windows Vista Lite?

There’s no denying that Vista was a failure. A complete and utter disappointment. An unmitigated disaster. Microsoft have even essentially admitted it themselves, finally accepting what users, reviewers and wary businesses have been saying since before it even hit the shelves. It just didn’t bring enough benefit for its significant cost (early estimates were talking about $5k per seat to upgrade by the time you deliver new hardware, support it and train users), users hated it and some have even called it the most serious technical misstep in computing history. The fluff (transparent windows et al) exacted a heavy toll on the hardware and the delicate minimum requirements ‘balance’ was way off – set it too high and nobody can afford your software; too low and those who do buy it complain about inadequate performance. Plus the long overdue security system was invasive and yet still largely ineffective.

The reality is that while XP has been ‘good enough’ for most users, Google and friends have been quietly shifting the playing field from the corpse-littered battlefields of operating systems and file formats to (now) mostly standardised browsers. It simply doesn’t matter now what your operating system is, and between Firefox’s rise to fame and so many heterogeneous mobile devices converging on the Internet it’s long since been impossible for webmasters to deny admittance to non-IE (and therefore non-Windows) clients.

In arriving at this point Free & Open Source Software (FOSS) has proven itself a truly disruptive force. Without it there would be no Google and no Amazon Web Services (and quite possibly no Amazon!). While Linux on the desktop may be a pipe dream, it’s carved a large slice out of the server market (powering the vast majority of cloud computing infrastructure) and its adoption is steadily rising on connected devices from mobiles and netbooks to television sets. There are multiple open source browsers, multiple open source scripting engines (to power web based applications), a new breed of client architecture emerging (thanks in no small part to Google Chrome) and even Microsoft are now talking about unleashing IE on the open source community (for better or worse).

So how did we get to Windows 7 (and back onto a sensible version numbering scheme) anyway? Here’s a look from an architecture point of view:

  • Windows 1/2: Rudimentary text based environment, didn’t introduce mouse/arrow keys until 2.x. Something like XTree Gold (which was my preferred environment at the time).
  • Windows 3: A revolutionary step and the first version of Windows that didn’t suck and that most people are likely to remember.
  • Windows 95/98/ME: Evolution of 3.x and the first real mainstream version of Windows.
  • Windows NT 3.5x/4.0: Another revolutionary step with the introduction of the vastly superior NT (‘New Technology’) kernel.
  • Windows 2000/XP: Refinement of NT and the result of recombining separate development streams for business and home users.
  • Windows Vista: Bloat, bloat and more bloat. Available in at least half a dozen different (expensive and equally annoying) versions, but many (most?) of its sales were for downgrade rights to XP.
  • Windows 7: Tomorrow’s Windows. Vista revisited.

Before I explain why Windows 7 is to Vista what Windows Millennium Edition (WinMe) was to Windows 98 (and why that isn’t necessarily such a bad thing), let’s talk quickly about Microsoft’s MinWin project. Giving credit where credit is due, the NT kernel is really quite elegant and was far ahead of its time when unleashed on the world over a dozen years ago. It’s stable, extensible, performant and secure (when implemented properly). It’s also been steadily improved through the 3.51, 4.0, 2000, XP and Vista releases. It must be quite annoying for the bearded boffins to see their baby struggling under the load heaped on it by their fellow developers, and therein lies the problem.

That’s why the MinWin project (which seeks to deliver the minimum set of dependencies for a running system, albeit without even a graphics interface) is interesting both from a client, and especially from a cloud computing point of view. While MinWin weighs in at forty-something megabytes, Vista is well over a thousand (and usually a few gigabytes), but the point is that Microsoft now know how to be slim when they need to be.

Now that the market has spoken with its feet Microsoft are paying attention and Windows 7 lies somewhere on the Vista side of the MinWin to Vista bloat scale. The interface is a significant departure from Vista, borrowing much from other wildly successful operating systems like OS X, and like OS X it will be simpler, faster and easier to use. This is very similar to Windows ME’s notoriously unsuccessful bolting of the Windows 2000 interface onto Windows 98, only this time rather than putting a silk shirt on a pig we should end up with a product actually worth having. This is good news, especially for business users who by this time will have already been waiting too long to move on from XP.

Conversely, Azure (their forthcoming cloud computing OS) is on the MinWin side of the bloat scale. It is almost certainly heavily based on the Windows 2008 Server Core (which follows Novell’s example by evicting the unwanted GUI from the server), needing to do little more than migrate the management functions to a service oriented architecture. If (and only if) they get the management functions right then they will have a serious contender in the cloud computing space. That means sensible, scalable protocols which follow Amazon and Google’s examples (where machines are largely independent, talking to their peers for state information) rather than simply a layer on top of the existing APIs. Unfortunately Microsoft Online Services (MOS) currently feels more like the latter (even falling back to the old school web management tools for some products), but with any luck this will improve with time.

Provided they find the right balance for both products, this is good for IT architects (like myself), good for Microsoft, and most importantly, good for users. Perhaps the delay was their strategy all along, and why not when you can extract another year or two of revenue from the golden goose of proprietary software? In any case we’re at the dawn of a new era, and it looks like Microsoft will be coming to the party after all.

HOWTO: Reverse engineer the iPhone protocols

A few months back (‘Apple iPhone 2.0: The real story behind MobileMe Push Mail and Jabber/XMPP Chat‘) I analysed how the iPhone interacted with the new MobileMe service with a view to offering the same features to Google Apps customers. Unfortunately this is not yet possible (the APIs don’t exist on both sides of the fence) but we learnt a lot in the process.

For those of you who have been living under a rock, MobileMe (previously known as .Mac) is Apple’s foray into cloud computing. It offers some impressive synchronisation and push services, but for a relatively steep annual subscription. One of the most coveted features is push mail, which makes e-mail behave more like instant messaging; as soon as the mail hits Apple’s servers they notify the clients, which then retrieve the item. Technically that’s ‘pull’ with notifications rather than ‘push’ per se, but the result is the same; the user experience for email improves dramatically. They do similar things with contacts and calendar items. Due to popular demand (and making good on my promise to elaborate), here’s a brief explanation of how I got ‘under the hood’ of the iPhone’s encrypted communications with the MobileMe servers.

The first problem was to see what it was talking to. We’ve got a Mac household and a bunch of VMs (Windows, Linux and some other strange stuff) so I set up internet sharing on one of them and installed Wireshark. This allowed me to capture the (encrypted) traffic, which was terminating at aosnotify.mac.com:5223. Although we couldn’t decipher the traffic itself we already knew a fair bit from the server name, port number and traffic patterns; whenever a test mail arrived there was a small amount of traffic on this connection followed immediately by an IMAP poll and message retrieval. ‘AOS’ presumably stands for ‘Apple Online Services‘ (a division that’s at least 15 years old assuming it still exists) rather than Australian Online Solutions (which is what I translate ‘AOS’ to) and ‘notify’ tells us that they have a specific notification service, which reconciles with what we observed. Most importantly though, network port tcp/5223 was traditionally used by Jabber (XMPP) chat servers for encrypted (SSL) traffic; that reconciled too because Wireshark dissectors were able to peer into the SSL handshakes (but obviously not the data itself, at least not without the private keys stored safely on Apple’s servers and/or SSL accelerators).

The next problem was to tell the iPhone to talk to us rather than Apple. There are a number of ways to skin this particular cat but if I remember correctly I went with dnsmasq running in a Linux VM and simply configured the DHCP server to use this DNS server rather than those of the ISP. Then it was just a case of overriding aosnotify.mac.com’s IP with the address of the VM by adding a line to the /etc/hosts file. That worked nicely and I could not only see the iPhone hitting the server when it started up, but could see that it still worked when I wired port 5223 up to the real Apple servers using a rudimentary Perl/Python proxy script. At this point I could capture the data, but it was still encrypted.
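
For reference, the “rudimentary Perl/Python proxy script” amounts to little more than the following sketch (Python flavour shown); the upstream address is a placeholder that you’d resolve out-of-band so the doctored DNS doesn’t loop the proxy back onto itself:

#!/usr/bin/env python3
# Rudimentary TCP forwarder: listen where the hijacked aosnotify.mac.com now
# points and blindly shovel bytes to and from the real Apple server. The
# payload is still SSL-encrypted at this stage; this only proves the plumbing.
import socket
import threading

LISTEN_ADDR = ("0.0.0.0", 5223)
UPSTREAM = ("198.51.100.23", 5223)  # placeholder - the real server's address

def pump(src, dst):
    """Copy bytes one way until either side closes."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    finally:
        src.close()
        dst.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(LISTEN_ADDR)
server.listen(5)

while True:
    client, _ = server.accept()
    upstream = socket.create_connection(UPSTREAM)
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    threading.Thread(target=pump, args=(upstream, client), daemon=True).start()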

I now needed to convince the iPhone that it was talking to Apple when in fact it was talking to me; without a valid certificate for aosnotify.mac.com this wasn’t going to work unless Apple programmers had foolishly ignored handshake failures (that said, stranger things have happened). Assuming this wasn’t the case and knowing that this would be easily fixed with a patch (thereby blowing my third-party integration service out of the water) I started looking at the options. My iPhone was jailbroken so I could have hacked OS X or the iPhone software (eg by binary patching aosnotify.apple.com with aosnotify.<mydomain>.com) but I wanted a solution that would work OOTB.

Conveniently Apple had provided the solution in the iPhone 2.0 software by way of the new enterprise deployment features. All I needed to do was create my own certification authority and inject the root certificate into the iPhone by creating an enterprise configuration which contained it. I’d already played with both the iPhone Configuration Utility for OS X and the official Ruby on Rails iPhone Configuration Web Utility so this was a walk in the park. Deploying it was just a case of attaching the .mobileconfig file to an email and sending it to the iPhone (you can also deploy over HTTP I believe). Now I just needed to create a fake certificate, point the proxy script at it and start up the iPhone.
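
The fake certificate turns the forwarder above into a proper man-in-the-middle: terminate the phone’s SSL session with a certificate for aosnotify.mac.com signed by the home-grown CA (now trusted thanks to the .mobileconfig), then open a fresh SSL session to the real server. A sketch of the idea, with hypothetical filenames for the certificate and key:

# Sketch of the MitM step; everything passing through pump() is now cleartext.
import socket
import ssl

server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.load_cert_chain("fake-aosnotify.pem", "fake-aosnotify.key")  # hypothetical files

client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
client_ctx.check_hostname = False        # upstream address resolved out-of-band
client_ctx.verify_mode = ssl.CERT_NONE   # lab setup only

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", 5223))
listener.listen(1)

phone_raw, _ = listener.accept()
phone = server_ctx.wrap_socket(phone_raw, server_side=True)  # phone sees "Apple"

apple = client_ctx.wrap_socket(
    socket.create_connection(("198.51.100.23", 5223)))       # placeholder upstream

# From here the same two pump() threads as before shuttle the now-visible
# XMPP stanzas between phone and Apple, logging as they go.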

Sure enough this worked on the first attempt and I was even able to peer into the live, cleartext connection with Wireshark. Although I was surprised by the choice of XMPP’s publish/subscribe functionality, it makes a good deal of sense to use an existing protocol and what better choice than this one (though I wonder if Cisco’s acquisition of Jabber will have any effect). This will have come as a disappointment to those who touted the discovery of the private XMPP libraries (and naturally assumed that this translated to an Instant Messaging client like iChat), but it is interesting for us cloud architects.

Privacy and cloud computing

There has been a good deal of talk of late on the important topic of security and privacy in relation to cloud computing. Indeed there are some legitimate concerns and some work that needs to be done in this area in general, but I’m going to focus today on the latter term (indeed they are distinct – as a CISSP, security is my forte, but I will talk more on that separately):

Privacy is the ability of an individual or group to seclude themselves or information about themselves and thereby reveal themselves selectively.

Traditionally privacy has been maintained by physically controlling access to sensitive data, be it by hiding one’s diary under one’s mattress or by installing elaborate security systems. Access is then selectively restricted to trusted associates as required, often without surrendering physical control over the object. In a world of 1’s and 0’s it’s a similar story, only involving passwords, encryption, access control lists, etc.

Occasionally however we do need to surrender information to others in order to transact and as part of everyday life; be it to apply for a driver’s license or passport, or subscribe to a commercial service. In doing so we hope that they (‘the controller’ in European Union parlance) will take care of it as if it were their own, but this is rarely the case unless economics and/or regulations dictate:

Externalisation leaves the true cost of most breaches to be borne by the data subject rather than the controller; the victim rather than the perpetrator.

Currently even the largest breaches go relatively unpunished, in that corporations typically only face limited reputational damage and (depending on the jurisdiction) the cost of notifying victims, while the affected individuals themselves can face permanent financial ruin and associated problems. According to the Data Loss Database, only days ago arrests were made over 11,000,000 records copied by a call center worker, and the hall of shame is topped by TJX with almost 100m customer records (including credit card numbers). Often though the data is simply ‘lost’, on a device or backup media which has been stolen, misplaced or sold on eBay.

Personal information has similar properties to nuclear waste; few attributes are transient (account balance), most have long half-lives (address, telephone), many can outlive the owner (SSN) and some are by definition immutable (DoB, eye colour).

In an environment of rampant consumer credit being foisted on us by credit providers who have little in the way of authentication beyond name, address and date of birth, these losses can be devastating. This imbalance will need to be leveled by lawmakers (for example by imposing a per-record penalty for losses that would transform minor annoyances into serious financial disincentives), but this is tangential to the special case of cloud computing, rather serving to give background on the prevalent issues.

Cloud computing is relatively immune to traditional privacy breaches; there is no backup media to lose, laptop based databases to steal, unencrypted or unauthenticated connections to sniff or hijack, etc.

The fact is that many (likely most) of these breaches could have been avoided in a cloud computing environment. Data is stored ‘in the cloud’ and accessed by well authenticated users over well secured connections. Authentication is typically via passwords and/or tokens (we even have a prototype smart card authentication product) and encryption is usually via Transport Layer Security (TLS), centrally enforced by the cloud applications and cloud services. A well configured cloud computing architecture (with a secure client supporting strong authentication and encryption) is a hacker’s worst nightmare. Granted we still have some tweaking to do (eg the extended validation certificates farce) but the attack surface area can be reduced to a single port (tcp/443) which is extremely antisocial until it is satisfied that you are who you say you are (and vice versa).
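
To illustrate the “single, antisocial port” point: a client written this way simply refuses to talk to anything that cannot prove its identity. A minimal sketch (the hostname is just an example):

# Only talk to tcp/443, and abort unless the server presents a certificate
# that chains to a trusted CA and matches the hostname.
import socket
import ssl

def fetch_over_tls(host: str) -> bytes:
    ctx = ssl.create_default_context()   # CA validation + hostname checks on by default
    with socket.create_connection((host, 443), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
            return tls.recv(4096)

print(fetch_over_tls("www.example.com")[:64])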

A well configured cloud computing architecture is a hacker’s worst nightmare. Conversely, a poorly configured cloud computing architecture is a hacker’s best dream.

On the other hand, one of the best ways to keep information safe is not to collect it in the first place; by consolidating the data the reward for a successful attack increases significantly. Fortunately the defenses typically improve at least proportionally, with vendors (whose businesses are built on trust) deploying armies of security boffins that an individual entity could only dream of. The risk is similar to that of a monoculture, the same term that has been used to describe the Windows monopoly (and we have seen the effects of this in the form of massive distributed botnets); the Irish can tell you why putting all your eggs in one basket is a particularly bad idea.

In summary the potential for enhanced privacy protection in a cloud computing environment is clear, provided the risks are properly and carefully mitigated. We are making good progress in this area and overall the news is good, but we need to tread carefully and keep a very close eye on the spectre of ubiquitous surveillance (Big Brother), large scale privacy breaches and targeted attacks.

Cloud computing has the technology and many of the systems in place already; now it is up to the lawmakers to step up to the plate.

The Cloud Computing Doghouse: Nirvanix (aka Streamload aka MediaMax aka The Linkup)

Although Dell have been denied the ill-fated cloud computing trademark (that’s lowercase please, hold the ™) and moved on to more interesting things, they’re yet to concede defeat and withdraw their application. Even though the double decker bus has disappeared from the moon, that leaves us with 6 months of uncertainty before USPTO consider it abandoned, during which time they can appeal the decision. Although it is generally accepted that they would have a snowball’s chance in hell of succeeding, I would have preferred they take it out the back and put it out of its misery, and they can stay in the doghouse until they do (or it expires).

On the other hand there’s a backlog of crass acts of stupidity in the cloud computing space so they’re going to have to shove over and make room in the doghouse for someone (or something) new; the inaugural member can’t monopolise it forever. And who better than a ‘new’ company associated with “the meltdown of an online storage service that will leave about 20,000 paying subscribers without their digital music, video, and photo files”: Nirvanix.

First and foremost (given they have apparently threatened to sue one of their own founders) this is an opinion piece based on what little information I have been able to scratch together from various online sources – draw your own conclusions and do your own research before you rely on anything here. It is more a commentary on one of the inherent but easily mitigated risks of cloud computing – unreliable providers – than on Nirvanix itself.

Let’s start with some background and basic maths:

Today you can buy a terabyte (1Tb) hard drive with a 5 year (60 month) warranty for $150 retail in single-unit quantities. Meanwhile the going rate for cloud storage is about $0.15/Gb/month. Ignoring complications like formatting losses, servers (which are cheap and can host many drives), bandwidth, etc., simply by wiring these up to the cloud one could get a return on investment in a month ($0.15 x 1000Gb/m = $150) and over the life of the $150 drive you can make a whopping $9,000.
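
For those who like to check the working, here is the same arithmetic in a few lines, using only the figures quoted above:

drive_cost = 150.0          # 1Tb retail drive, 5 year (60 month) warranty
drive_capacity_gb = 1000
cloud_rate = 0.15           # $/Gb/month going rate for cloud storage
warranty_months = 60

monthly_revenue = cloud_rate * drive_capacity_gb      # $150/month
payback_months = drive_cost / monthly_revenue         # ~1 month
lifetime_revenue = monthly_revenue * warranty_months  # $9,000 over the warranty
print(payback_months, lifetime_revenue)               # 1.0 9000.0

# For comparison, enterprise SAN storage at ~$20/Gb/year versus
# cloud storage at $0.15 x 12 = $1.80/Gb/year - over 10x more expensive.
san_cost_per_year = 20 * drive_capacity_gb            # $20,000/year for the same 1Tb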

Admittedly a gross simplification, but to remote users looking down relatively narrow pipes it can be very difficult to tell the difference between a cheap desktop hard drive and an expensive enterprise SAN (which runs at about $20/Gb/year, over an order of magnitude more expensive than cloud storage). At least it is until the thing loses their precious 1’s and 0’s, in which case you hope it was run (or at least backed) by a large storage vendor from redundant datacenters rather than a long haired 16 year old from his basement. Herein lies the problem; presumably Nirvanix/Streamload/MediaMax/The Linkup (or whatever they’re calling themselves today) fall somewhere between the two extremes (hopefully the former rather than the latter), but it’s hard to tell where.

If the various articles (especially this one) are to be believed, the whole sorry saga goes something like this:

  • Steve Iverson (a uni student at the time) develops “adaptive data compression algorithms” for his thesis in 1998
  • Shortly after graduation he founded Streamload to let users “easily and securely send, store, move, receive and access their digital files”
  • By 2005 Streamload was hosting about half a petabyte (425Tb) of data for “well over 20,000 users”
  • Streamload was rebadged (after receiving some investment) to Streamload MediaMax™ (as distinct from MediaMax, Inc. which did not exist at the time) on the DEMOfall 05 stage as “a suite of ultra-high capacity online services that helps you manage, share, and access all the files and digital media in your life.”
  • However by December 2006 it was losing money and Patrick Harr (current Nirvanix CEO) replaced Steve (with his blessing) as CEO and Steve became CTO. After 60 days’ assessment the new CEO “advocated letting it ‘gracefully die’ and creating a new company selling ‘cloud’ storage to paying enterprise customers”.
  • Disaster struck on June 15 2007 when “a Streamload system administrator’s script accidently misidentified and deleted ‘good data’ along with the ‘dead data’ of some 3.5 million former user accounts and files”
  • Two weeks later Streamload’s board of directors pressed on with Harr’s strategy and “split the company into two independent businesses. Streamload changed its name to Nirvanix. It kept many of the former company’s physical assets [including all the servers and data] and employees, and secured $12–$18 million in initial venture funding.”
  • Meanwhile “The MediaMax consumer product and its disgruntled customers went to Iverson as CEO of a ‘new’ business” along with “only about $500,000 in working capital” while Nirvanix managed to scratch together a cool $18m from the likes of Intel.
  • After a botched upgrade to MediaMax v5 (which by Steve’s own admission introduced a bunch of features users didn’t want) they changed their name again to The Linkup, which was marketed as “a social networking site based around storage”, only to also botch the migration to 20% more expensive (at $5.95/$11.95 per month) paid-only services.
  • Users of the free service were given three weeks (which was extended due to problems with the ‘mover’ script) to upgrade or permanently lose their data. Curiously, the data was stored on Nirvanix servers the whole time and was being migrated to their new enterprise Storage Delivery Network.
  • In late July, Nirvanix “Clarifies False Information in Blogosphere” in a blog post buried in their developer site.
  • MediaMax/The Linkup closed its doors on 8/8/08, having given users 30 days’ notice to retrieve their (remaining) data.

As of today the various angry masses are waiting for Nirvanix to give them access to (what remains, apparently about half of) their data, which Nirvanix assures us “remain[s] secure in the old Streamload/MediaMax storage system” (although it is not clear whether the files migrated to The Linkup were deleted 8 days after the 8/8/08 closure). They also claim “access to those files requires the MediaMax application front-end and database” (roping SAVVIS, who apparently maintained the frontend, into the fray) but MediaMax claim to have offered it to them, noting that “if they could have got the files back, they would have”. Steve goes on to say:

Fundamentally, MediaMax is responsible because you are our customer, and the biggest mistake we made was to trust Nirvanix to manage our customer data – yes, it was on the “old Streamload system”, and not their new Nirvanix SDN, but I believe the care and attention that was required was not there and was beyond unprofessional.

Here’s where it gets really interesting. In Nirvanix’s own words:

Are Nirvanix Inc. and MediaMax Inc. the same company?

No. Nirvanix and MediaMax split out of the same company, Streamload, Inc. in July 2007. Each company would be independently formed with separate ownership, oversight and investors. The companies were subsequently split off in July 2007 and have been separate and distinct entities since that time.

Did Nirvanix delete user data?

No, Nirvanix has not deleted any customer data.

Did a storage problem occur at Streamload?

As documented on the MediaMax blog in July 2007, a storage problem did occur at Streamload on the Streamload/MediaMax storage system in June 2007. This occurred prior to the formation of Nirvanix Inc. and was completely independent of the Nirvanix Storage Delivery Network which was not launched until October 2007.

The problem with these denials, and in particular the claim that the mass deletions at the start of the death spiral “occurred prior to the formation of Nirvanix Inc.”, is that it conflicts not only with what investors, ex-partners, users, etc. say but also with the California Secretary of State, who list Nirvanix, Inc. as a “merged out” California corporation (C2111900) filed on 15 June 1998 (conveniently the exact same month Streamload was founded; almost a decade before they claim it came into existence) and as a Delaware corporation (C3051094) filed on 16 October 2007. Incidentally MediaMax, Inc. (C2998020) was filed earlier, on 16 May 2007. In case you’re wondering what “merged out” means (despite having to learn all this as CAcert‘s Organisation Assurance Officer I had to look it up too), here’s the definition:

The limited partnership or limited liability company has merged out of existence in California into another business entity. The name of the surviving entity can be obtained by requesting a status report.

Thus it appears that Streamload, Inc. changed its name to Nirvanix, Inc. which then “merged out” of existence in California, “into” Nirvanix, Inc. (Delaware)… the corporate equivalent of moving house (it would be good if someone in the US could get a status report to confirm).

A murderer changing her name after the crime and then claiming immunity on the grounds that it happened before she existed would spend the rest of her life in jail.

Even if they were a different legal entity as claimed they still apparently have the same staff, same 525 B Street, San Diego address, even the same CEO (which I’ll bet a judge would find interesting). If they are one and the same then is it not actually Nirvanix, Inc. who still has a binding contract with all those customers (at the very least the ones who didn’t migrate to The Linkup)? Did the original Streamload terms allow for a transfer from Streamload/Nirvanix to MediaMax? Did the customers agree? Indeed, was it not then a Streamload/Nirvanix system administrator who ordered the deletion of the data? (Update: According to a comment MediaMax claim it was, which reconciles with the dates above.)

So why have Nirvanix thus far managed to escape culpability in the form of public (PR) execution and class action lawsuits? This appears to be no accident, rather the result of a sustained [dis]information campaign. For example, most of this information is from the Nirvanix article in Wikipedia which was recently nominated for deletion, apparently by Matthew Harvey at JPR Communications (Nirvanix’s PR firm) who already blanked it twice before being blocked for doing it a third time as a sock puppet. Jonathan Buckley (Nirvanix’s Chief Marketing Officer) also weighed in with a Strong Delete vote (that was largely ignored as a conflict of interest) and the article was unsurprisingly kept and remains to give a voice to the disenfranchised masses. They have also apparently been fairly active with the bloggers, calling their posts “inaccurate and libelous”, a post by an investor “suspect and untrue”, again claiming “Nirvanix was not even incorporated in June of 2007”, and you can bet there’s plenty more going on that we don’t hear about (Update: including press censorship, astroturfing and blaming the victims, claiming they “are all software pirates and porn addicts”).

The more cynical reader could be forgiven for believing that this was planned (but I think it was more a case of incompetence and gross negligence):

  • Develop interesting technology
  • Build reputation by servicing users for free
  • Get millions in investment
  • Float said users off on a leaky liferaft with $1 in $37 ($500k for MediaMax vs $18m for Nirvanix), and the inventor himself
  • $$$Profit$$$

Why do I care? I don’t particularly (at least not about this specific situation) but like the rest of the fledgling cloud computing industry I do find articles that could have been easily avoided (like “Storms in the cloud leave users up creek without a paddle”) difficult to swallow. I’ve never used their services and I don’t compete with them; if anything I may end up recommending them to my consulting clients if they are the best fit for a problem. I do however feel for the 20,000 or so people who lost irreplaceable photographs, video, music and other data through acts that can only be described as gross negligence; as a long time professional system administrator I find occurrences like the June 2007 accidental deletion extremely hard to accept. The story of a disenfranchised inventor having been parted from his invention is oh-so-common too. Finally, I just don’t like coverups:

Trust is (for now) an essential component in cloud computing infrastructure and victims of outages, data loss, privacy breaches, breakins, etc. have every right to full transparency.

Were this another storage provider (eg Amazon S3) there would have been a clear demarcation point (the APIs) and it would have been possible to demonstrate that the client either called for the destruction of data or did not. Accordingly, immutable audit logs should be maintained and made available to cloud computing users (this is not always the case today – often they are kept but not accessible). There should also be protection against accidental deletions (in that they should not be immediately committed unless purging is required and requested, eg to satisfy a privacy policy or other legal requirement). Nirvanix notes that (for the SDN at least) “at any point during this eight-day [deletion] process, the file can be fully recovered” and other providers have similar checks and balances (this is almost certainly why you can’t recreate a Google Apps user for 5 days, for example).
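
For illustration only (this is not any vendor’s actual implementation), the “deletions are not committed immediately” safeguard boils down to something like the following: a delete merely tombstones the object, and a separate purge pass makes it permanent once the recovery window has elapsed:

import time

RECOVERY_WINDOW = 8 * 24 * 3600  # e.g. Nirvanix's stated eight days, in seconds

store = {}        # object_id -> data
tombstones = {}   # object_id -> deletion timestamp

def delete(object_id):
    """Soft delete: recoverable until the window expires."""
    tombstones[object_id] = time.time()

def recover(object_id):
    """Undo a delete that has not yet been purged."""
    return tombstones.pop(object_id, None) is not None

def purge(now=None):
    """Permanently remove objects whose recovery window has elapsed."""
    now = now or time.time()
    for object_id, deleted_at in list(tombstones.items()):
        if now - deleted_at >= RECOVERY_WINDOW:
            store.pop(object_id, None)
            del tombstones[object_id]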

So where to from here? If Nirvanix do have the data as they claim, then they should stop the ‘internal’ bickering and do everything within their power to get as much of the property (data) as possible back to its rightful owners, or give a full and transparent explanation for why this is impossible. If they are in fact the same legal entity the users contracted with initially (Streamload, Inc., as appears to be the case) then they should take responsibility for their [in]actions, apologise and offer a refund. That being the case, customers should hold them to this, both directly (info@nirvanix.com or 619.764.5650) and with the help of organisations like GetSatisfaction.com, Better Business Bureau or if necessary, the courts.

In the mean time they can stay in the doghouse, with Dell…

Google Chrome: Cloud Operating Environment

Google Chrome is a lot more than a next generation browser; it’s a prototype Cloud Operating Environment.

Rather than blathering on to the blogosphere about the superficial features of Google’s new Chrome browser I’ve spent the best part of my day studying the available material and [re]writing a comprehensive Wikipedia article on the subject, which I intend for anyone to be free to reuse under a Creative Commons Attribution 3.0 license (at least this version anyway) rather than Wikipedia’s usual strong copyleft GNU Free Documentation License (GFDL). This unusual freedom is extended in order to foster learning and critical analysis, particularly in terms of security.

My prognosis is that this is without doubt big news for cloud computing, and, as a CISSP who has watched the poor state of web browser security with disdain, big news for the security community too. Here’s why:

Surfing the Internet today is like unprotected sex with strangers; Chrome is the condom of the cloud.

The traditional model of a monolithic browser is fundamentally and fatally flawed (particularly with the addition of tabs). Current generation browsers lump together a myriad of trusted and untrusted software (yes, many web sites these days are more software than content) running in the same memory address space. Even with the best of intentions this is intolerable, as performance problems in one area can cause problems (and even data loss) in others. It’s the web equivalent of the bad old days where one rogue process would take down the whole system. Add nefarious characters to the mix and it’s like living in a bad neighbourhood with no locks.

Current generation browsers are like jails without cells.

Chrome introduces a revolutionary new software architecture, based on components from other open source software, including WebKit and Mozilla, and is aimed at improving stability, speed and security, with a simple and efficient user interface.

The first intelligent thing Chrome does is split each task into a separate process (‘sandbox’), thus delegating isolation to the operating system, which has been very good at it since we introduced things like pre-emptive multitasking and memory protection. This exacts a fixed per-process resource cost but avoids the memory fragmentation issues that plague long-running browsers. Every web site gets its own tab complete with its own process and WebKit rendering engine, which (following the principle of least privilege) runs with very low privileges. If anything goes wrong the process is quietly killed and you get a Sad Mac style ‘sad tab’ icon rather than an error reporting dialog for the entire browser.
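
A toy sketch of the idea (nothing like Chrome’s real code, of course): each “tab” runs as its own OS process, so one misbehaving site is reported as a sad tab while the rest carry on:

import multiprocessing as mp

def render_tab(url):
    """Stand-in for a renderer process; a 'crash' URL simulates a misbehaving site."""
    if "crash" in url:
        raise RuntimeError("renderer died")
    return f"rendered {url}"

if __name__ == "__main__":
    urls = ["https://example.com", "https://crash.example", "https://mail.example"]
    tabs = {url: mp.Process(target=render_tab, args=(url,)) for url in urls}
    for proc in tabs.values():
        proc.start()
    for url, proc in tabs.items():
        proc.join()
        # A non-zero exit code means only that tab's process died;
        # the other tabs and the parent "browser" keep running.
        print(url, "-> sad tab :(" if proc.exitcode != 0 else "-> ok")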

Chrome enforces a simple computer security model whereby there are two levels of multilevel security (user and sandbox) and the sandbox can only respond to communication requests initiated by the user. Plugins like Flash which often need to run at or above the security level of the browser itself are also sandboxed in their own relatively privileged processes. This simple, elegant combination of compartments and multilevel security is a huge improvement over the status quo, and it promises to further improve as plugins are replaced by standards (eg HTML 5 which promises to displace some plugins by introducing browser-native video) and/or modified to work with restricted permissions. There are also (publicly accessible) blacklists for warning users about phishing and malware and an “Incognito” private browsing mode.

Tabs displace windows as first class citizens and can migrate between them like an archipelago of islands.

The user interface follows the simplification trend, and much of the frame or “browser chrome” (hence the name) can be hidden altogether so as to seamlessly blend web applications (eg Gmail) with the underlying operating system. Popups are confined to their source tab unless explicitly dragged to freedom, the “Omnibox” simplifies (and remembers) browsing habits and searches and the “New Tab Page” replaces the home page with an Opera style speed dial interface along with automatically integrated search boxes (eg Google, Wikipedia). Gears remains as a breeding ground for web standards and the new V8 JavaScript engine promises to improve performance of increasingly demanding web applications with some clever new features (most notably dynamic compilation to native code).

Just add Linux and cloud storage and you’ve got a full blown Cloud Operating System (“CloudOS”)

What is perhaps most interesting though (at least from a cloud computing point of view) is the full-frontal assault on traditional operating system functions like process management (with a task manager that allows users to “see what sites are using the most memory, downloading the most bytes and abusing (their) CPU”). Chrome is effectively a Cloud Operating Environment for any (supported) operating system in the same way that early releases of Windows were GUIs for DOS. All we need to do now is load it onto a (free) operating system like Linux and wire it up to cloud storage (à la Mozilla Weave) for preferences (eg bookmarks, history) and user files (eg uploads, downloads) and we have a full blown Cloud Operating System!

Update: Fixed URLs.

DNS is dead… long live DNS!

Most of us rely heavily (more heavily than we realise, and indeed should) on this rickety old thing called DNS (the Domain Name System), which was never intended to scale as it did, nor to defend against the kinds of attacks it is subjected to today.

The latest DNS related debacle is (as per usual) related to cache poisoning, which is where your adversary manages to convince your resolver (or more specifically, one of the caches between your resolver and the site/service you are intending to connect to) that they are in fact the one you want to be talking to. Note that these are not man-in-the-middle (MitM) attacks; if someone can see your DNS queries you’re already toast – these are effective, remote attacks that can be devastating:

Consider for example your average company using POP3 to retrieve mail from their mail server every few minutes, in conjunction with single sign on; convince their cache that you are their mail server and you will have everyone’s universal cleartext password in under 5 minutes.

The root of the problem(s) is that the main security offered in a DNS transaction is the query ID (QID), for which there are only 16 bits (i.e. 65,536 combinations). Even when properly randomised (as was already the case for sensible implementations like djbdns, but not for earlier attempts which foolishly used sequential numbering), fast computers and links can make a meal of this in no time (read, seconds), given enough queries. Fortunately you typically only get one shot for a given name (for any given TTL period – usually 86,400 seconds; 1 day), and even then you have to beat the authoritative nameserver with the (correct) answer. Unfortunately, if you can convince your victim to resolve a bunch of different domains (a.example.com, b.example.com … aa.example.com and so on) then you’ll eventually (read, seconds) manage to slip one in.
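
Some rough numbers for the attack just described; the forged-replies-per-race figure is an assumption for illustration, but the shape of the curve is the point:

# Each forced query gives the attacker a window to spray k forged replies,
# each a 1-in-65,536 guess at the QID; repeating with fresh random names
# (a.example.com, b.example.com, ...) drives the success probability up fast.
QID_SPACE = 2 ** 16

def p_success(queries_triggered, spoofs_per_query):
    p_miss_one_race = 1 - spoofs_per_query / QID_SPACE
    return 1 - p_miss_one_race ** queries_triggered

# e.g. 100 forged replies squeezed into each race, across 500 forced lookups:
print(round(p_success(500, 100), 3))   # ~0.53, and the attacker just keeps going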

So what, you say? You’ve managed to convince a caching server that azgsewd.victim.com points at your IP – big deal. But what happens if you slipped in extra resource records (RRs) for, say, www.victim.com or mail.victim.com? A long time ago you might have been able to get away with this attack simply by smuggling unsolicited answers for victim.com queries along with legitimate answers to legitimate queries, but we’ve been discarding unsolicited answers (at least those that were not ‘in-bailiwick’; eg from the same domain) for ages. However here you’ve got a seemingly legitimate answer to a seemingly legitimate question and extra RRs from the same ‘in-bailiwick’ domain, which can be accepted by the cache as legitimate and served up to all the clients of that cache for the duration specified by the attacker.

This is a great example of multiple seemingly benign vulnerabilities being [ab]used together such that the result is greater than the sum of its parts, and is exactly why you should be very, very sure about discounting vulnerabilities (for example, a local privilege escalation vulnerability on a machine with only trusted users can be turned into a nightmare if coupled with a buffer overrun in an unprivileged daemon).

Those who still think they’re safe because an attacker needs to be able to trigger queries are sadly mistaken too. Are your caching DNS servers secure (bearing in mind UDP queries can be trivially forged)? Are your users’ machines properly secured? What about the users themselves? Will they open an email offering free holidays (containing images which trigger resolutions) or follow a URL on a flyer handed to them at the local metro station, café or indeed, right outside your front door? What about your servers – is there any mechanism to generate emails automatically? Do you have a wireless network? VPN clients?

OK, so if you’re still reading you’ve either patched already or you were secure beforehand, as we were at Australian Online Solutions given our DNS hosting platform doesn’t cache; we separate authoritative from caching nameservers, and our caches have used random source ports from the outset. This increases the namespace from 16 bits (65k combinations) to (just shy of, since some ports are out of bounds) 32 bits (4+ billion combinations). If you’re not secure, or indeed not sure if you are, then contact us to see how we can help you.
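
The keyspace comparison in the paragraph above, spelled out (the usable-port count is an approximation, hence “just shy of” 32 bits):

qid_space = 2 ** 16                    # 65,536 query IDs
usable_ports = 64_000                  # assumption: most, but not all, of 2^16 ports
combined = qid_space * usable_ports    # ~4.2 billion combinations

print(f"QID only:       {qid_space:>13,}")
print(f"QID + src port: {combined:>13,}")
# An attack that took seconds against 65k combinations takes, all else being
# equal, tens of thousands of times longer against ~4 billion.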

Apple iPhone 2.0: The real story behind MobileMe Push Mail and Jabber/XMPP Chat

So those of you who anticipated a Jabber/XMPP chat client on the iPhone (and iPod Touch) after TUAW rumoured that ‘a new XMPP framework has been spotten(sic) in the latest iPhone firmware‘ back in April were close… but no cigar. Same applies for those who hypothesised about P-IMAP or IMAP IDLE being used by MobileMe for push mail.

The real story, as it turns out, is that Jabber (the same open protocol behind many instant messaging networks including Google Talk) is actually being used for delivering push mail notifications to the iPhone. That’s right, you heard it here first. This would explain not only why the libraries were curiously private (in that they are not exposed to developers) but also why IMAP IDLE support only works while Mail.app is open (it’s a shame because Google Apps/Gmail supports IMAP IDLE already).

While it’s in line with Apple’s arguments about background tasks hurting user experience (eg performance and battery life), cluey developers have noted that the OS X (Unix) based iPhone has many options to safely enable this functionality (eg via resource limiting) and that the push notification service for developers is only a partial solution. It’s no wonder though with the exclusive carrier deals, which are built on cellular voice calls and SMS traffic, both of which could be eroded away entirely if products like Skype and Google Talk were given free rein (presumably this is also why Apple literally hangs onto the keys for the platform). If you want more freedom you’re going to have to wait for Google Android, or for ultimate flexibility one of the various Linux based offerings. We digress…

So without further ado, here’s the moment we’ve all been waiting for: a MobileMe push mail notification (using XMPP’s pubsub protocol) from aosnotify.mac.com:5223 over SSL:

<message from="pubsub.aosnotify.mac.com" to="sam@aosnotify.mac.com/5e60ad2e47da9fca36de59244f25c9b1cd8e0cb8" id="/protected/com/apple/mobileme/sam/mail/Inbox__sam@aosnotify.mac.com__3gK4m">
<event xmlns="http://jabber.org/protocol/pubsub#event">
<items node="/protected/com/apple/mobileme/sam/mail/Inbox">
<item id="5WE7I82L5bdNGm2">
<plistfrag xmlns="plist-apple">
<key>maild</key>
<string>E1B537</string>
</plistfrag>
</item>
</items>
</event>
<x xmlns="jabber:x:delay" stamp="2008-07-18T01:11:11.447Z"/>
</message>

<message from="pubsub.aosnotify.mac.com" to="sam@aosnotify.mac.com/5e60ad2e47da9fca36de59244f25c9b1cd8e0cb8" id="/protected/com/apple/mobileme/sam/mail/Inbox__sam@aosnotify.mac.com__NterM">
<event xmlns="http://jabber.org/protocol/pubsub#event">
<items node="/protected/com/apple/mobileme/sam/mail/Inbox">
<item id="8ATABX9e6satO6Y">
<plistfrag xmlns="plist-apple">
<key>maild</key>
<string>544FE17</string>
</plistfrag>
</item>
</items>
</event>
<headers xmlns="http://jabber.org/protocol/shim">
<header name="pubsub#subid">3DEpJ055dXgB2gLRTQYvW4qGh91E36y2n9e27G3X</header>
</headers>
</message>

I’ll explain more about the setup I used to get my hands on this in another post later on. So what’s the bet that this same mechanism will be used for the push notification service to be released later in the year?
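
For completeness, here’s how little code it takes to pull the interesting fields out of one of the stanzas above, using nothing but the standard library (the namespaces are copied straight from the dump; the stanza is abridged):

import xml.etree.ElementTree as ET

stanza = """<message from="pubsub.aosnotify.mac.com" to="sam@aosnotify.mac.com/5e60..."
  id="/protected/com/apple/mobileme/sam/mail/Inbox__sam@aosnotify.mac.com__3gK4m">
  <event xmlns="http://jabber.org/protocol/pubsub#event">
    <items node="/protected/com/apple/mobileme/sam/mail/Inbox">
      <item id="5WE7I82L5bdNGm2">
        <plistfrag xmlns="plist-apple">
          <key>maild</key>
          <string>E1B537</string>
        </plistfrag>
      </item>
    </items>
  </event>
</message>"""

ns = {"pubsub": "http://jabber.org/protocol/pubsub#event", "plist": "plist-apple"}
msg = ET.fromstring(stanza)
items = msg.find("pubsub:event/pubsub:items", ns)
print("node: ", items.get("node"))                    # the mailbox being watched
print("key:  ", msg.find(".//plist:key", ns).text)    # 'maild'
print("value:", msg.find(".//plist:string", ns).text) # opaque mail identifier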