08 March 2012

Simplifying cloud: Reliability

The original Google server rack

Reliability in cloud computing is a very simple concept which I've explained in many presentations but never actually documented:

Traditional legacy IT systems consist of relatively unreliable software (Microsoft Exchange, Lotus Notes, Oracle, etc.) running on relatively reliable hardware (Dell, HP, IBM servers, Cisco networking, etc.). Unreliable software is not designed for failure and thus any fluctuations in the underlying hardware platform (including power and cooling) typically result in partial or system-wide outages. In order to deliver reliable service using unreliable software you need to use reliable hardware, typically employing lots of redundancy (dual power supplies, dual NICs, RAID arrays, etc.). In summary:

unreliable software
reliable hardware

Cloud computing platforms typically prefer to build reliability into the software such that it can run on cheap commodity hardware. The software is designed for failure and assumes that components will misbehave or go away from time to time (which will always be the case, regardless of how much you spend on reliability - the more you spend the lower the chance but it will never be zero). Reliability is typically delivered by replication, often in the background (so as not to impair performance). Multiple copies of data are maintained such that if you lose any individual machine the system continues to function (in the same way that if you lose a disk in a RAID array the service is uninterrupted). Large scale services will ideally also replicate data in multiple locations, such that if a rack, row of racks or even an entire datacenter were to fail then the service would still be uninterrupted. In summary:

reliable software
unreliable hardware
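
To put a number on "the more you spend the lower the chance but it will never be zero", here is a back-of-the-envelope sketch in Python. The failure probability is invented and independent failures are assumed, which is exactly the approximation that replicating across racks and datacenters is meant to justify:

```python
# Illustrative only: assumes each replica fails independently with probability
# p_node over some period, and that a data item is lost only if every one of
# its replicas fails before the system can re-replicate it.

def p_data_loss(p_node: float, replicas: int) -> float:
    """Rough probability that all replicas of an item fail (independence assumed)."""
    return p_node ** replicas

for r in (1, 2, 3):
    print(f"replicas={r}: p(loss) ~ {p_data_loss(0.05, r):.6f}")
# replicas=1: p(loss) ~ 0.050000
# replicas=2: p(loss) ~ 0.002500
# replicas=3: p(loss) ~ 0.000125
```

Each extra replica is just another cheap machine, yet it cuts the loss probability by more than an order of magnitude; correlated failures (a rack, a row, a datacenter) are why the copies also need to be spread across locations.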

Asked for a quote for Joe Weinman's upcoming book Cloudonomics: The Business Value of Cloud Computing, I said:

"The marginal cost of reliable hardware is linear while the marginal cost of reliable software is zero."

That is to say, once you've written reliability into your software you can scale out with cheap hardware without spending more on reliability per unit, while if you're using reliable hardware then each unit needs to include reliability (typically in the form of redundant components), which quickly gets very expensive.
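
To illustrate the quote, here is a minimal sketch with purely hypothetical prices, chosen only to show the shape of the two cost curves rather than taken from any real bill of materials:

```python
# Hypothetical unit costs: a commodity server, a "reliable" server with
# redundant PSUs/NICs/RAID, and a one-off cost of engineering reliability
# into the software layer.
COMMODITY_UNIT = 2_000
REDUNDANT_UNIT = 10_000
SOFTWARE_RELIABILITY = 500_000

def cost_reliable_hardware(n: int) -> int:
    # every unit carries the redundancy premium, so cost grows linearly with n
    return n * REDUNDANT_UNIT

def cost_reliable_software(n: int) -> int:
    # reliability is paid for once, then each additional unit is commodity kit
    return SOFTWARE_RELIABILITY + n * COMMODITY_UNIT

for n in (10, 100, 1_000):
    print(f"{n:>5} servers: hardware ${cost_reliable_hardware(n):,} vs software ${cost_reliable_software(n):,}")
#    10 servers: hardware $100,000 vs software $520,000
#   100 servers: hardware $1,000,000 vs software $700,000
#  1000 servers: hardware $10,000,000 vs software $2,500,000
```

Below some scale the redundant hardware wins, but past the crossover point the one-off software investment is amortised and every additional server is cheap, which is exactly the economics that favours web-scale operators.
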
The other two permutations are ineffective:

Unreliable software on unreliable hardware gives an unreliable system. That's why you should never try to install unreliable software like Microsoft Exchange, Lotus Notes, Oracle etc. onto unreliable hardware like Amazon EC2:

unreliable software
unreliable hardware

Finally, reliable software on reliable hardware gives a reliable but inefficient and expensive system. That's why you're unlikely to see reliable software like Cassandra running on reliable platforms like VMware with brand name hardware:

reliable software
reliable hardware

Google enjoyed a significant competitive advantage for many years by using commodity components with a revolutionary proprietary software stack, including components like the distributed Google File System (GFS). You can still see Google's original hand-made racks, built with motherboards laid on cork board, at their Mountain View campus and at the Computer History Museum (per the image above), but today's machines are custom-made by ODMs and are a lot more advanced. Meanwhile Facebook have decided to focus on their core competency (social networking) and are actively commoditising "unreliable" web-scale hardware (by way of the Open Compute Project) and software (by way of software releases, most notably the Cassandra distributed database, which is now used by services like Netflix).

The challenge for enterprises today is to adopt cheap reliable software so as to enable the transition away from expensive reliable hardware. That's easier said than done, but my advice to them is to treat this new technology as another tool in the toolbox and use the right tool for the job. Set up cloud computing platforms like Cassandra and OpenStack and look for "low-hanging fruit" to migrate first, then deal with the more recalcitrant applications once the "center of gravity" of your information technology systems has moved to cloud computing architectures.

P.S. Before the server huggers get all pissy about my using the term "relatively unreliable software", this is a perfectly valid way of achieving a reliable system, just not a cost-effective one now that "relatively reliable software" is here.

9 comments:

  1. Sam, agreed 100%. And I am happy to have more than 140 characters to ask you a question that I have had in the pipe since I heard this moniker years ago.

    Mileage may vary, but I am one of those who believes in commodity hardware and value in the software. You seem, however, to jump from reliable hardware all the way to reliable software (software = middleware/applications).

    What if there was/is/will be a way to take that commodity hardware at scale (cheap x86, DAS, no shared storage, etc.), put a piece of infrastructure software on it that does the magic you describe, and yet still run "legacy" middleware/applications the way they would run on a "reliable hardware" type of infrastructure?

    It would have the benefit of using commodity hardware while still running legacy applications reliably.

    Thoughts?

    Massimo (yeah ... I work for VMware)

  2. While the reliable/unreliable framing reads well, I think that as far as the software is concerned, it is not that it is unreliable, but rather that it is not fault tolerant. Assuming that Exchange, Oracle etc. are reliable on their own (Oracle seldom falls over because of its own defects), the 'unreliability' is their inability to handle faults in the underlying hardware (and OS, drivers, network etc.). So it becomes more of Fault Intolerant / Reliable vs Fault Tolerant / Unreliable.

  3. @Massimo: it's certainly an interesting idea, but when you get down to the details you typically end up looking at high-bandwidth/low-latency links to keep application state in sync, which is fine when you're on the same blade backplane, or even in the same rack/datacenter, but when you get sufficient geographical diversity to tolerate disasters you start running into trouble (including, at some point, the physical constraints of the speed of light). Don't get me wrong, I think this would be a great product that would add significant value to cloud services like Amazon EC2, but distributed, fault tolerant software architectures will still hold the upper hand.

    @Simon: I agree it's not really fair to call legacy software "unreliable", but a product like SQL Server is *relatively* unreliable compared to something like Cassandra. Of course there are applications for which you can't ignore ACID constraints and therefore need to go with a more traditional architecture, but these are SFAICT the exception. In any case the point of the article is to simplify the concept and fault [in]tolerant vs [un]reliable isn't as clear (even if technically more correct).

    Replies
    1. @Sam, and that is where I think we kind of lose each other in the details. I totally agree (once again) about the latency / distance limitations you are referring to. The two models you and I are referring to just solve the problem in different ways but achieve similar results; in fact, latency and bandwidth constraints are limitations in both approaches. Let me clarify with an example.

      In your model you might set up a distributed (Cassandra?) database that has nodes in Zurich and New York. All nodes are active at the same time and keep copies of each other's partitions. Due to bandwidth and latency limitations you can't handle this consistently, and hence you have to introduce the notion of "eventually consistent" databases. That to me means you trade consistency for flexibility, which is to say you accept that you may lose data should one of the active nodes fail before it has replicated committed data from its active partition.

      In my model I'd use a legacy monolithic database running in Zurich. I have, however, set up a replica of the database (in the form of a virtual disk software replication mechanism) that pairs the Zurich node with a (passive) node in New York. For the same latency and bandwidth reasons the replica isn't synchronous, so a failure of the Zurich node will trigger a failover to the node in New York, where the legacy database will restart with a specific RPO driven by the network constraints.

      The "RPO" in my model looks pretty similar to the "eventually consistent" concept in your model (apart from the fact that "eventually consistent" sounds way cooler than RPO). :)

      So what am I missing?

      Massimo.

    2. So that's the way you'd do it today, but if you don't keep everything consistent (over expensive high bandwidth, low latency links) then you have difficult failure modes (lost records, partitioning, etc). I think the biggest risk with your approach (rather than using a system designed from the ground up for this mode of operation) is data corruption (e.g. duplicated or lost orders), which translates to financial loss. If it looks like "eventually consistent" then why not just use "eventually consistent"? You're still just trading off your ACID constraints and the extent to which this is possible is really dictated by the application.

    3. > If it looks like "eventually consistent" then why not just use "eventually consistent"?

      For many reasons, I would say. The first is that you don't have to rewrite the millions of legacy applications out there. Unless you are Google / Facebook or a startup that was born in the last couple of years, there is a tremendous advantage (IMO) in taking what you have (we politely call it "legacy"... we could also say "sh*t") and applying the same principles (commodity hardware, etc.) that characterise the new "application model" you described (which is good, but requires application re-engineering).

      There was an internal discussion yesterday at VMware re this very same topic (old vs new applications). There is no question the world is going in the direction you are describing; the timescale is the problem. I hear people excited about AWS breaking the $1B revenue mark... I believe that is (at least) how much IBM makes today on the AS/400 platform alone. Legacy has a long (very long) tail.

      > I think the biggest risk with your approach ... is data corruption (e.g. duplicated or lost orders), which translates to financial loss

      I am with you on data corruption (more likely to happen in a deployment where the records are replicated at the infrastructure level rather than by the application itself), albeit many latest-generation databases have reached a good level of maturity around recovery from such situations (which wouldn't be very different from a local catastrophic failure, without necessarily having to discuss complex DR scenarios).

      However, I am not sure I follow you re the lost orders / financial loss. Isn't that the same problem you have with an "eventually consistent" solution based on asynchronous replication? I.e. they will be eventually consistent, yes... but if a problem occurs between the "now" and the "eventual moment" you lose those orders and you do have a financial loss. No? Perhaps it's a good time for me to go home and start reading seriously about Cassandra, because there may be something I am missing :)

      Massimo.

      4. A cloud-based platform is only going to work if the hardware supports it. The whole point of storing in the cloud is greater accessibility and fewer limitations. It's important to deliver a cloud-based solution that is compatible.

  4. Thank you for posting this article in this forum I will bookmark this page and tell my friends about this ...
    server racks

  5. "The challenge for enterprises today is to adopt cheap reliable software so as to enable the transition away from expensive reliable hardware." – I couldn’t agree more. But with the many cloud service providers today, I don’t think companies would have a hard time looking for the right software.
