Leaving Equinix


It’s been three years to the day since my last post — a side effect of my being completely immersed in my job at Equinix (where I was, until last week, Director of Cloud & IT Services). I’ve been based in Zürich and working ostensibly in London for the past 5 years (having spent the last decade in Europe, and probably a year of it in Silicon Valley), though in reality I’ve spent most of my time on the road — according to TripIt I’ve traveled almost a million kilometres to almost 200 locations, be it to visit partners, customers, attend & present at events, or work with colleagues in other offices, as well as the occasional holiday. Here’s hoping I’ll be able to be more grounded for the coming years (though if the last week is any indication I’m not so sure)!

When non-technical people ask me who Equinix is (Americans often confuse it with Equinox, the gyms — maybe they’ll tie up one day so your treadmill will power the data centres?), I tell them they’re essentially the “landlord of the Internet”. That’s not entirely true — there are a number of carrier-neutral, multi-tenant data center providers in the market — but it’s understandable, and few can hold a candle to Equinix’s quality, scale, global reach, and (arguably most importantly), business ecosystems. Another analogy I use is the “Hilton of the Internet”, where companies wishing to participate can rent a room, meet each other at the “lobby bar” (regional events and Marketplace), and communicate over the “phone system” (Internet exchanges). Chairman Peter Van Camp refers to the data centres (“International Business Exchanges” or “IBXs”) as “international airports where passengers from many different airlines make connections to get to their final destinations”. You get the idea.

As the Internet developed, Equinix founders Jay Adelson and the late Al Avery identified a need for a neutral location for carriers to connect together — the Switzerland of the Internet if you like. Over the 15+ years since it was founded in 1998, Equinix has grown from its first location in the USA to a global footprint of 105 data centres in 33 metros (cities) in 15 countries spanning 5 continents (by the time you read this they may have many more thanks to the acquisition of Telecity which will basically double the size of the EMEA region). The company usually expands through acquisition or by building new data centres, typically following a “metro” model whereby an accessible (but not necessarily central) location is chosen for a “campus” of data centres (London for example now has 6 data centres, half of which are on the same road in Slough). Recent builds look something like these:

Amsterdam AM3

Melbourne ME1

Having established a critical mass of network service providers, Equinix IBXs became attractive to early content providers like Yahoo! They needed to reach the eyeballs which were connected to the carriers (at the time, typically by dialup or ADSL services), and rather than having to arrange to connect to those carriers wherever they were, they could tap into many carriers in one location. Furthermore, the carriers themselves needed to connect to each other (that’s the “inter” in “internet”), and they found it easier to do so in a neutral location rather than on their own turf.

Equinix went on to establish similar ecosystems around the financial industry, where trading exchanges (like Internet exchanges) would act as magnets for high frequency traders, news providers, etc. — there are now thriving financial hubs in 16 Equinix metros. More recent ecosystems include advertising, whereby a content provider could ask — in the milliseconds it takes to render a page — for advertisers to bid on ad placements. While light travels quickly, over long distances it can significantly impair the performance of an application (plus it travels slower inside glass fibres), and for these applications there’s no prize for second! The most interesting ecosystem though (in my somewhat biased opinion anyway) is the cloud ecosystem. By chance many of the content providers of yesterday (Amazon, Google, Microsoft) transformed into the cloud providers of today, and I think it’s safe to say now that Equinix is the “home of the cloud” (a term I introduced in 2011, albeit somewhat aspirational at the time).
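To put rough numbers on the latency point above, here is a minimal Python sketch of the best-case round trip over fibre. The refractive index of 1.47 and the route lengths are assumptions for illustration, not measured Equinix paths, and real links add routing, queuing and equipment delay on top.

```python
# Speed of light in a vacuum, in kilometres per second.
C_VACUUM_KM_S = 299_792.458
# Assumed refractive index of the glass in optical fibre (roughly 1.47).
FIBRE_REFRACTIVE_INDEX = 1.47


def round_trip_ms(distance_km, refractive_index=FIBRE_REFRACTIVE_INDEX):
    """Best-case round-trip time in milliseconds over a fibre of this length."""
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return 2 * distance_km / speed_km_s * 1000


# Illustrative route lengths (assumptions, not real fibre paths).
for name, km in [("same campus", 1), ("same metro", 50), ("transatlantic", 5_600)]:
    print(f"{name:>14}: {round_trip_ms(km):8.3f} ms")
```

Even this lower bound puts a transatlantic round trip in the tens of milliseconds, which is why bidders and traders want to sit in the same building as the exchange.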

When I joined Equinix the only way to access cloud providers was over the Internet, or by special arrangement (typically only available to the largest customers like Netflix). This was a problem for most enterprise consumers, and indeed 8 of the top 10 blockers for cloud adoption according to analysts are partially or fully addressed by bypassing the Internet. We first launched AWS Direct Connect with Amazon that year, and I proposed that the process should be more automated (at the time it required filling out paperwork and waiting for someone in the data centre to run a fibre from your infrastructure to a port you had to rent in Amazon’s). The solution proposed by product was a box of robots, and while I was no stranger to boxes of robots from my time at Google, I was convinced we could do better. Here’s the back-of-an-envelope blueprint I submitted in my first month in the company, which (following years of research and development by the CTO office and product teams) essentially went on to become the Equinix Cloud Exchange (I called it CloudConnect at the time, but there were trademark issues):

Cloud Exchange

This hybrid- and multi-cloud architecture allows customers to seamlessly integrate legacy/on-premises, hosted private, and public cloud infrastructure, and I believe it (or something like it) will be the “default” reference architecture for most enterprises in future. Anyone can automate a switch fabric though — indeed a number of competitors have (we even had something like it at UNSW ~20 years ago which would allow you to put any port anywhere on campus onto any network, via a web interface, using the same standards no less!). What differentiates Equinix’s is the presence of hundreds of cloud providers, including all of the top providers in the market today (thanks in no small part to the tireless efforts of the GAM and CAT teams).

The enterprise CIO should look at the data centre as an operating system, only rather than installing best-of-breed applications like Office and Photoshop, they simply connect to services like Office 365 and AWS (after all, cloud is simply the migration from product to service). Alternatively I often use the shopping mall analogy, only rather than visiting to buy products from a store (like Apple), you’re buying from a service provider (like Apple).

Anyway, having spent the past decade on the provision of services at Citrix, Google, and Equinix, I’m hanging up my Equinix hat and getting to work on the consumption and application of information technology to solving business problems (among other things). Watch this space.

HTTP2 Expression of Interest

Here’s my (rather rushed) personal submission to the Internet Engineering Task Force (IETF) in response to their Call for Expressions of Interest in new work around HTTP; specifically, a new wire-level protocol for the semantics of HTTP (i.e., what will become HTTP/2.0), and new HTTP authentication schemes. You can also review the submissions of Facebook, Firefox, Google, Microsoft, Twitter and others.

[The views expressed in this submission are mine alone and not (necessarily) those of Citrix, Google, Equinix or any other current, former or future client or employer.]

My primary interest is in the consistent application of HTTP to (“cloud”) service interfaces, with a view to furthering the goals of the Open Cloud Initiative (OCI); namely widespread and ideally consistent interoperability through the use of open standard formats and interfaces.

In particular, I strongly support the use of the existing metadata channel (headers) over envelope overlays (SOAP) and alternative/ancillary representations (typically in JSON/XML) as this should greatly simplify interfaces while ensuring consistency between services. The current approach to cloud “standards” calls on vendors to define their own formats and interfaces and to maintain client libraries for the myriad languages du jour. In an application calling on multiple services this can result in a small amount of business logic calling on various bulky, often poorly written and/or unmaintained libraries. The usual counter to the interoperability problems this creates is to write “adapters” (à la ODBC) which expose a lowest-common-denominator interface, thus hiding functionality and creating an “impedance mismatch”. Ultimately this gives rise to performance, security, cost and other issues.

By using HTTP as intended it is possible to construct (cloud) services that can be consumed using nothing more than the built-in, standards compliant HTTP client. I’m not writing to discuss whether this is a good idea, but to expose a use case that I would like to see considered, and one that we have already applied with an amount of success in the Open Cloud Computing Interface (OCCI).

To illustrate the original scope, early versions of HTTP (RFC 2068) included not only the Link header (recently revived by Mark Nottingham in RFC 5988) but also LINK and UNLINK verbs to manipulate it (recently proposed for revival by James Snell). Unfortunately hypertext, and in particular HTML (which includes linking in-band rather than out-of-band) arguably stole HTTP’s thunder, leaving the overwhelming majority of formats that lack in-band linking (images, documents, virtual machines, etc.) high and dry and resulting in inconsistent linking styles (HTML vs XML vs PDF vs DOC etc.). This limited the extent of web linking as well as the utility of HTTP for innovative applications including APIs. Indeed HTTP could easily and simply meet the needs of many “Semantic Web” applications, but that is beyond the scope of this particular discussion.

To illustrate by way of example, consider the following synthetic request/response for an image hosting site which incorporates Web Linking (RFC 5988), Web Categories (draft-johnston-http-category-header) and Web Attributes (yet to be published):

GET /1.jpg HTTP/1.0

HTTP/1.0 200 OK
Content-Length: 69730
Content-Type: image/jpeg
Link: <http://creativecommons.org/licenses/by-sa/3.0/>; rel="license"
Link: </2.jpg>; rel="next"
Category: dog; label="Dog"; scheme="http://example.org/animals"
Attribute: name="Spot"
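A minimal Python sketch of how a client might read that metadata using nothing but string handling. The headers are written here in RFC 5988 syntax (link target in angle brackets, straight quotes). The helper names are hypothetical, and the parser assumes one link per header line and ignores quoted-string escapes and RFC 2231 encoding.

```python
import re

# Response headers from the synthetic image example.
RAW_HEADERS = """\
Content-Length: 69730
Content-Type: image/jpeg
Link: <http://creativecommons.org/licenses/by-sa/3.0/>; rel="license"
Link: </2.jpg>; rel="next"
Category: dog; label="Dog"; scheme="http://example.org/animals"
Attribute: name="Spot"
"""


def parse_headers(raw):
    """Collect header values by lower-cased name, keeping repeated headers."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


def parse_params(text):
    """Turn '; key="value"; ...' into a dict of unquoted values."""
    params = {}
    for part in text.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip().strip('"')
    return params


def parse_links(values):
    """Return (target, params) for each '<target>; params' header value."""
    links = []
    for value in values:
        match = re.match(r"\s*<([^>]*)>(.*)", value)
        if match:
            links.append((match.group(1), parse_params(match.group(2))))
    return links


headers = parse_headers(RAW_HEADERS)
links = parse_links(headers["link"])
term, _, rest = headers["category"][0].partition(";")
category = (term.strip(), parse_params(rest))
attributes = parse_params(headers["attribute"][0])

for target, params in links:
    print(params["rel"], "->", target)
```

Because the target is extracted from between the angle brackets before the parameters are split, a semicolon inside the URI does not confuse the parser.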

In order to “animate” resources, consider the use of the Link header to start a virtual machine in the Open Cloud Computing Interface (OCCI):

Link: </compute/123;action=start>; rel="http://schemas.ogf.org/occi/infrastructure/compute/action#start"

The main objection to the use of the metadata channel in this fashion (beyond the application of common sense in determining what constitutes data vs metadata) is implementation issues (e.g. arbitrary limitations, i18n, handling of multiple headers, etc.) which could be largely resolved through specification. For example, the (necessary) use of e.g. RFC 2231 encoding for header values (but not keys) in e.g. RFC 5988 Web Linking gives rise to unnecessary complexity that may lead to interoperability, security and other issues which could be resolved through the specification of Unicode for keys and/or values. Another concern is the absence of features such as a standardised ability to return a collection (e.g. multiple responses). I originally suggested that HTTP 2.0 incorporate such ideas in 2009.
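To give a feel for the complexity in question, here is a small Python sketch of what RFC 2231 encoding does to a non-ASCII value. It uses the standard library's `email.utils` helpers. The `name*` parameter is only an illustration in the style of RFC 5988's `title*`, not part of any published Attribute syntax.

```python
from email.utils import decode_rfc2231, encode_rfc2231
from urllib.parse import unquote

# A value with one non-ASCII character (o with a diaeresis).
plain = "Sp\u00f6t"

# Encode as charset'language'percent-encoded-octets (language left empty).
encoded = encode_rfc2231(plain, charset="utf-8")
header = f"Attribute: name*={encoded}"
print(header)

# Decoding takes two steps: split off the charset and language,
# then undo the percent-encoding using that charset.
charset, language, octets = decode_rfc2231(encoded)
decoded = unquote(octets, encoding=charset)
```

A client that forgets either step, or that receives the plain and starred forms of the same parameter, has to decide what to do, which is the kind of ambiguity a Unicode-native header specification would avoid.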

I’ll leave the determination of what would ultimately be required for such applications to the working group (should this use case be considered interesting by others), and while better support for performance, scalability and mobility are obviously required this has already been discussed at length. I strongly support Poul-Henning Kamp’s statement that “I think it would be far better to start from scratch, look at what HTTP/2.0 should actually do, and then design a simple, efficient and future proof protocol to do just that, and leave behind all the aggregations of badly thought out hacks of HTTP/1.1.” (and agree that we should incorporate the concept of a “HTTP Router”) as well as Tim Bray’s statement that: “I’m inclined to err on the side of protecting user privacy at the expense of almost all else” (and believe that we should prevent eavesdroppers from learning anything about an encrypted transaction; something we failed to do with DNSSEC even given alternatives like dnscurve that ensure confidentiality as well as integrity).

Leaving Google+


Ironically many Google employees have even given up on Google+
(though plenty still post annoying “Moved to Google+” profile pics on other social networks)

One of those sneaky tweets that links to Google+ just tricked me into wading back into the swamp that it’s become, hopefully for the last time (I say “hopefully” because in all likelihood I’ll be forced back onto it at some point — it’s already apparently impossible to create a Google Account for any Google services without also landing yourself a Google+ profile and Gmail account and it’s very likely that the constant prompting for me to “upgrade” to Google+ will be more annoying than the infamous red notification box). Here’s what I saw in my stream:

  • 20 x quotes/quotepics/comics
  • 8 x irrelevant news articles & opeds
  • 1 x PHP code snippet
  • 3 x blatant ads
  • 2 x Google+ fanboi posts (including this little chestnut: “Saying nobody uses Google+ is like a virgin saying sex is boring. They’ve never actually tried it.” — you just failed at life by comparing Google+ to sex my friend).
  • 2 x random photos

That’s pretty much 0% signal and 100% noise, and before you jump down my throat about who I’m following, it’s a few hundred generally intelligent people (though I note it is convenient that the prevalent defense for Google+ being a ghost town, or worse, a cesspool, is that your experience depends not only on who you’re following, but what they choose to share with you — reminds me of the kind of argument you regularly hear from religious apologists).

Google+ Hangouts

My main gripe with Google+ this week though was the complete failure of Google+ Hangouts (which should arguably be an entirely separate product) for Rishidot Research’s Open Conversations: Cloud Transparency on Monday. The irony of holding an open/transparency discussion on a closed platform aside, we were plagued with technical problems from the outset. First it couldn’t find my MacBook Air’s camera so I had to move from my laptop to my iMac (which called for heavy furniture to be moved to get a clean background). When I joined we started immediately (albeit late, and sans 2-3 of the half dozen attendees), but it wasn’t long before one of the missing attendees joined and repeatedly interrupted the first half of the meeting with audio problems. The final attendee never managed to join, though their name and a blank screen appeared each of the 5-10 times they tried. We then inexplicably lost two attendees, and by the time they managed to re-join I too got a “Network failure for media packets” error:

Then there was “trouble connecting with the plugin”, which called for me to refresh the page and then reinstall the plugin:

Eventually I made it back in, only to discover that we had now lost the host(!?!) and before long it was down to just me and one other attendee. We struggled through the last half of the hour but it was only afterwards that we discovered we were talking to ourselves because the live YouTube stream and recording stopped when the host was kicked out. Needless to say, Google+ Hangouts are not ready for prime time, and if you invite me to join one then don’t be surprised if I refer you to this article.

Hotel California

To leave Google+ head over to Google Takeout and download your Circles (I grabbed data for other services too for good measure, and exported this blog separately since my profile is now Google+ integrated). You might want to see who’s following you, Actions->Select All and dump them into a circle first, otherwise you’ll probably lose that information when you close your account.

When you go to the Google+ “downgrade” page and select “Delete your entire Google profile” you’ll get a sufficiently complicated warning as to scare most people back into submission, but the most concerning part for me was this unhelpful help advising “Other Google products which require a profile will be impacted”:

Fortunately for YouTube and Blogger at least you can check and revert your decision to use a Google+ profile respectively, but you’ll immediately be told to “Connect to Google+” once you unplug:

After that it’s just a case of checking “I understand that deleting this service can’t be undone and the data I delete can’t be restored.” and clicking “Remove selected services” (what “selected services”? I just want to be rid of Google+!). I’ll let you know how that goes once my friends on Google+ have had a chance to read this.

Getting started with OpenStack in your lab

Having recently finished building my new home lab I wanted to put the second server to good use by installing OpenStack (the first is running VMware ESXi 5.0 with Windows 7, Windows 8, Windows 8 Server and Ubuntu 12.04 LTS virtual machines). I figured many of you would benefit from a detailed walkthrough so here it is (without warranty, liability, support, etc).

The two black boxes on the left are HP Proliant MicroServer N36L’s with modest AMD Athlon(tm) II Neo 1.3GHz dual-core processors and 8GB RAM and the one on the right is an iomega ix4-200d NAS box providing 8TB of networked storage (including over iSCSI for ESXi should I run low on direct attached storage). There’s a 5 port gigabit switch stringing it all together and a 500Mbps CPL device connecting it back up the house. You should be able to set all this up inside 2 grand. Before you try to work out where I live, the safe is empty as I don’t trust electronic locks.


Download Ubuntu Server (12.04 LTS or the latest long term support release) and write it to a USB key (if you’re a Mac OS X only shop then you’ll want to follow these instructions). Boot your server with the USB key inserted and it should drop you straight into the installer (if not you might need to tell the BIOS to boot from USB by pressing the appropriate key, usually F2 or F10, at the appropriate time). Most of the defaults are OK but you’ll probably want to select the “OpenSSH Server” option in tasksel unless you want to do everything from the console; if you do, be sure to tighten up the default configuration if you care about security. Unless you like mundane admin tasks you might want to enable automatic updates too. Even so, let’s ensure any updates since release have been applied:

sudo apt-get update
sudo apt-get -u upgrade

Next you’ll want to install DevStack (“a documented shell script to build complete OpenStack development environments from RackSpace Cloud Builders”), but first you’ll need to get git:

sudo apt-get install git

Now grab the latest version of DevStack from GitHub:

git clone git://github.com/openstack-dev/devstack.git

And run the script:

cd devstack/; ./stack.sh

The first thing it will do is ask you for passwords for MySQL, Rabbit, a SERVICE_TOKEN and SERVICE_PASSWORD and finally a password for Horizon & Keystone. I used the (excellent) 1Password to generate passwords like “sEdvEuHNNeA7mYJ8Cjou” (the script doesn’t like special characters) and stored them in a secure note.

The script will then go and download dozens of dependencies, which are conveniently packaged by Ubuntu and/or the upstream Debian distribution, run setup.py for a few python packages, clone some repositories, etc. While you wait you may as well go read the script to understand what’s going on. At this point the script failed because /opt/stack/nova didn’t exist. I filed bug 995078 but the script succeeded when I ran it for a second time — looks like it may have been a glitch with GitHub.

You should end up with something like this:

Horizon is now available at http://10.0.1.10/
Keystone is serving at http://10.0.1.10:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: qqG6YTChVLzEHfTDzm8k
This is your host ip: 10.0.1.10
stack.sh completed in 431 seconds.

If you browse to that address you’ll be able to log in to the console:

Openstack login

That will drop you into the Admin section of the OpenStack Dashboard (Horizon) where you can get an overview and administer instances, services, flavours, images, projects, users and quotas. You can also download OpenStack and EC2 credentials from the “Settings” pages.

Openstack console

Switch over to the “Project” tab and “Create Keypair” under “Access & Security” (so you can access any instances you create):

Openstack keygen

The key pair will be created and downloaded as a .pem file (e.g. admin.pem).

Now select “Images & Snapshots” under “Manage Compute” and you’ll be able to launch the cirros-0.3.0-x86_64-uec image which is included for testing. Simply click “Launch” under “Actions”:

Openstack project

Give it a name like “Test”, select the key pair you created above and click “Launch Instance”:

Openstack launch

You’ll see a few tasks executed and your instance should be up and running (Status: Active) in a few seconds:

Openstack spawning

Now what? First, try to ping the running instance from within the SSH session on the server (you won’t be able to ping it from your workstation):

$ ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_req=1 ttl=64 time=0.734 ms
64 bytes from 10.0.0.2: icmp_req=2 ttl=64 time=0.585 ms
64 bytes from 10.0.0.2: icmp_req=3 ttl=64 time=0.588 ms

Next let’s copy some EC2 credentials over to our user account on the server so we can use the command line euca-* tools. Go to “Settings” in the top right and then the “EC2 Credentials” tab. Now “Download EC2 Credentials”, which come in the form of a ZIP archive containing an X.509 certificate (cert.pem) and key (pk.pem) pair as well as a CA certificate (cacert.pem) and an rc script (ec2rc.sh) to set various environment variables which tell the command line tools where to find these files:

Openstack ec2

While you’re at it you may as well grab your OpenStack Credentials which come in the form of an rc script (openrc.sh) only. It too sets environment variables which can be seen by tools running under that shell.

Openstack rc

Let’s copy them (and the key pair from above) over from our workstation to the server:

scp b34166e97765499b9a75f59eaff48b98-x509.zip openrc.sh admin.pem samj@10.0.1.10:~

Stash the EC2 credentials in ~/.euca:

mkdir ~/.euca; chmod 0700 ~/.euca; cd ~/.euca
cp ~/b34166e97765499b9a75f59eaff48b98-x509.zip ~/.euca; unzip *.zip
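The same two steps can be scripted. This is a minimal Python sketch, assuming the archive has already been copied to the server; `stash_credentials` is a hypothetical helper, not part of DevStack or the euca tools.

```python
import os
import zipfile


def stash_credentials(zip_path, dest_dir):
    """Extract the credentials archive into a directory only the owner can access."""
    os.makedirs(dest_dir, exist_ok=True)
    # makedirs is subject to the umask and leaves an existing directory's
    # mode alone, so set the permissions explicitly (the chmod 0700 step).
    os.chmod(dest_dir, 0o700)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())
```

Called with the path to the downloaded ZIP and `os.path.expanduser("~/.euca")`, it returns the names of the extracted files so you can confirm that `ec2rc.sh` and the certificates are present.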

Finally let’s source the rc scripts:

source ~/.euca/ec2rc.sh
source ~/openrc.sh

You’ll see the openrc.sh script asks you for a password. Given this is a dev/test environment and we’ve used a complex password, let’s modify the script and hard code the password by commenting out the last 3 lines and adding a new one to export OS_PASSWORD:

# With Keystone you pass the keystone password.
#echo "Please enter your OpenStack Password: "
#read -s OS_PASSWORD_INPUT
#export OS_PASSWORD=$OS_PASSWORD_INPUT
export OS_PASSWORD=qqG6YTChVLzEHfTDzm8k

You probably don’t want anyone seeing your password or key pair so let’s lock down those files:

chmod 0600 ~/openrc.sh ~/admin.pem
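A quick way to confirm the lock-down took effect, sketched in Python with only the standard library. `too_open` is a hypothetical helper name, and it only looks at the classic permission bits (no ACLs).

```python
import os
import stat


def too_open(paths):
    """Return the paths that group or other users can read, write or execute."""
    exposed = []
    for path in paths:
        mode = os.stat(os.path.expanduser(path)).st_mode
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            exposed.append(path)
    return exposed
```

After the `chmod 0600` above, `too_open(["~/openrc.sh", "~/admin.pem"])` should come back empty; anything it lists still needs tightening.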

Just make sure the environment variables are set correctly:

$ echo $EC2_USER_ID
42
$ echo $OS_USERNAME
admin
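If you would rather check everything in one go, here is a minimal Python sketch that reports which variables are missing or empty. The list covers only the variables named in this walkthrough; the rc scripts set others, and `missing_variables` is a hypothetical helper.

```python
import os

# Variables mentioned in this walkthrough; the rc scripts export more.
REQUIRED = ("EC2_USER_ID", "OS_USERNAME", "OS_PASSWORD")


def missing_variables(names=REQUIRED, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    print("missing: " + ", ".join(missing) if missing else "all set")
```

Run it from the same shell in which you sourced the rc scripts, since the variables only exist in that shell and its children.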

Finally we should be able to use the EC2 command line tools:

$ euca-describe-instances
RESERVATION r-8wvdh1c7 b34166e97765499b9a75f59eaff48b98 default
INSTANCE i-00000001 ami-00000001 test test running None (b34166e97765499b9a75f59eaff48b98, ubuntu) 0 m1.tiny 2012-05-05T13:59:47.000Z nova aki-00000002 ari-00000003 monitoring-disabled 10.0.0.2 10.0.0.2 instance-store

As well as the openstack command:

$ openstack list server
+--------------------------------------+------+--------+------------------+
| ID | Name | Status | Networks |
+--------------------------------------+------+--------+------------------+
| 44a43355-7f95-4621-be61-d34fe53e50a8 | Test | ACTIVE | private=10.0.0.2 |
+--------------------------------------+------+--------+------------------+

You should be able to ssh to the running instance using the IP address and key pair from above:

ssh -i admin.pem -l cirros 10.0.0.2
$ uname -a
Linux cirros 3.0.0-12-virtual #20-Ubuntu SMP Fri Oct 7 18:19:02 UTC 2011 x86_64 GNU/Linux

That’s all for today — I hope you find the process as straightforward as I did and if you do follow these instructions then please leave a comment below (especially if you have any tips or solutions to problems you run into along the way).

Is carrying an iPhone worth the risk?

Update: It appears that Apple have resolved the issue with the September launch of iOS 7, essentially by implementing what I suggested below (highlighted):

Find my iphone
Yesterday I was robbed of my brand new iPhone (S/N: DNPGQ4RDDTDM IMEI: 013032008785006 ) for the second time, in public, in Paris. While I’m still a little shaken, angry and disappointed, I’m glad everyone survived unscathed… this time (last time I was assaulted in the process).

These less fortunate victims of crime lost their lives over iPhones, in the course of a robbery, in trying to retrieve the stolen device and as an innocent bystander respectively:

The latter story (around this time last year), in which a 68 year old woman was pushed down a flight of stairs in a Chicago subway station by the fleeing thief only to die later of head injuries, is almost identical to a robbery in Paris in which a young woman also died of head injuries only weeks prior:

Paris police data from that period showed that 53 percent of 1,071 violent thefts on Paris public transport involved smartphones, and the last two models of iPhones accounted for almost 28 percent of items stolen on public transport. The Interior Minister was at the time seeking faster efforts to allow smartphone owners to “block” stolen phones, disabling calling functions to make them worthless in the resale market as a deterrent to theft. “It will be naturally much less attractive” to steal a phone that can be de-activated remotely, he noted, adding that “we have the technical means to deter thieves”. And yet the grey market for iPhones is obviously still alive and well some 18 months later, in no small part because the parties with the capability to solve the problem (carriers, manufacturers, etc.) lack the interest (stolen phones drive new sales).

This brings me to the point of this post — finding a technical solution to solve the problem once and for all. Indeed, if a smartphone can be “bricked” then its resale value is severely limited. Most efforts today involve blacklisting the IMEI number such that the phone cannot be used on the networks in that country, but this usually takes time as it has to be done securely (typically by the operator from which it was purchased, and only after receiving a police report — too bad for those of us who purchase outright from a retailer!). A few days is long enough for the thief to sell the phone, only to have the buyer find it stop working some time later, thus creating another victim of crime (albeit someone guilty of receiving stolen goods, and in doing so driving demand!). Unless the database is global (which gives rise to other problems including distributed trust, denial of service, duplicated IMEIs, equipment limitations, etc.) then the thief can just sell it into another market, especially here in Europe, or swap it.

Enter Apple, who already have (and heavily advertise) the capability to securely locate, message and wipe the device (should it be able to reach the Internet — too bad if you’re roaming and have data disabled, and care about security and have auto join networks disabled, as I did!). Their trivial restore process (which makes iPhones extremely, and I would argue unnecessarily, transferable) also apparently involves a handshake with Apple servers, so who better to “brick” stolen devices by preventing them from being restored until returned? This would make it essentially impossible for anyone but the legitimate owner of the device to make use of it, thereby destroying the market and going from the most attractive to least attractive smartphone for thieves overnight. Sure you could argue that it’s not their problem, but unlike the police they have the capability (and I would argue the interest) to put an end to it once and for all.
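The idea can be sketched in a few lines of Python. Everything here is hypothetical: the class, its method names and the notion of an owner account are illustrations of the proposal, not a description of how Apple's restore handshake actually works.

```python
class RestoreRegistry:
    """Hypothetical server-side record of devices reported stolen."""

    def __init__(self):
        self._stolen = {}  # device identifier -> owner account

    def report_stolen(self, device_id, owner):
        """Flag a device so that only its owner can restore it."""
        self._stolen[device_id] = owner

    def mark_recovered(self, device_id, owner):
        """Clear the flag, but only for the account that reported the theft."""
        if self._stolen.get(device_id) == owner:
            del self._stolen[device_id]

    def may_restore(self, device_id, requester):
        """Allow a restore unless the device is flagged and the requester is not the owner."""
        owner = self._stolen.get(device_id)
        return owner is None or owner == requester


registry = RestoreRegistry()
registry.report_stolen("example-device", owner="victim@example.org")
print(registry.may_restore("example-device", "thief@example.org"))
```

The check happens at restore time, so it does not depend on the stolen phone ever reaching the Internet on its own.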

I for one will be seriously reconsidering the cost vs benefit of carrying a device that others value more than my own life, and I’m sure that the competitive advantage of a “Remote Disable” function would outstrip the profit from replacing stolen devices, so it’s not just about doing the right thing.

Update: Brian Katz points out that the thief need only enter the wrong PIN 10 times and then the iPhone will factory reset itself (depending on settings), no need for iTunes restore!

P.S. Here’s some advice on protecting your iPhone as well as some tips for avoiding pickpockets in Paris from TripAdvisor and the US Embassy.

Simplifying cloud: Reliability

The original Google server rack

Reliability in cloud computing is a very simple concept which I’ve explained in many presentations but never actually documented:

Traditional legacy IT systems consist of relatively unreliable software (Microsoft Exchange, Lotus Notes, Oracle, etc.) running on relatively reliable hardware (Dell, HP, IBM servers, Cisco networking, etc.). Unreliable software is not designed for failure and thus any fluctuations in the underlying hardware platform (including power and cooling) typically result in partial or system-wide outages. In order to deliver reliable service using unreliable software you need to use reliable hardware, typically employing lots of redundancy (dual power supplies, dual NICs, RAID arrays, etc.). In summary:

unreliable software
reliable hardware

Cloud computing platforms typically prefer to build reliability into the software such that it can run on cheap commodity hardware. The software is designed for failure and assumes that components will misbehave or go away from time to time (which will always be the case, regardless of how much you spend on reliability – the more you spend the lower the chance but it will never be zero). Reliability is typically delivered by replication, often in the background (so as not to impair performance). Multiple copies of data are maintained such that if you lose any individual machine the system continues to function (in the same way that if you lose a disk in a RAID array the service is uninterrupted). Large scale services will ideally also replicate data in multiple locations, such that if a rack, row of racks or even an entire datacenter were to fail then the service would still be uninterrupted. In summary:
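As a rough worked example of why replication works, assume each machine fails independently with the same probability over some period. That is a simplifying assumption: real failures are often correlated, which is exactly why large services also spread copies across racks and data centres. The 5% figure is illustrative only.

```python
def survival_probability(machine_failure_probability, replicas):
    """Chance that at least one copy survives, assuming independent failures."""
    return 1 - machine_failure_probability ** replicas


# Illustrative failure probability for one cheap machine over the period.
p = 0.05
for replicas in (1, 2, 3):
    print(f"{replicas} copies: {survival_probability(p, replicas):.6f}")
```

Each extra copy multiplies the chance of losing everything by the single-machine failure probability, so a handful of unreliable machines can beat one very reliable one.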

reliable software
unreliable hardware

Asked for a quote for Joe Weinman’s upcoming Cloudonomics: The Business Value of Cloud Computing book, I said:

The marginal cost of reliable hardware is linear while the marginal cost of reliable software is zero.

That is to say, once you’ve written reliability into your software you can scale out with cheap hardware without spending more on reliability per unit, while if you’re using reliable hardware then each unit needs to include reliability (typically in the form of redundant components), which quickly gets very expensive.
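A toy cost model makes the difference concrete. All figures below are made up for illustration, and the model deliberately ignores the extra machines that replication itself consumes; it is a sketch of the argument, not a pricing tool.

```python
def reliable_hardware_cost(units, commodity_unit_cost, redundancy_premium):
    """Every unit carries its own redundant components."""
    return units * (commodity_unit_cost + redundancy_premium)


def reliable_software_cost(units, commodity_unit_cost, software_investment):
    """Reliability is paid for once in software; units are added at commodity cost."""
    return software_investment + units * commodity_unit_cost


# Hypothetical figures: a 2,000 commodity server, a 3,000 premium for
# redundant parts, and a one-off 500,000 spent on making the software reliable.
for units in (10, 100, 1000):
    hw = reliable_hardware_cost(units, 2_000, 3_000)
    sw = reliable_software_cost(units, 2_000, 500_000)
    print(f"{units:>5} units: hardware {hw:>9,} vs software {sw:>9,}")
```

At small scale the up-front software investment dominates and reliable hardware is cheaper, but the premium is paid again on every unit, so past the break-even point the gap only widens.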
The other two permutations are ineffective:

Unreliable software on unreliable hardware gives an unreliable system. That’s why you should never try to install unreliable software like Microsoft Exchange, Lotus Notes, Oracle etc. onto unreliable hardware like Amazon EC2:

unreliable software
unreliable hardware

Finally, reliable software on reliable hardware gives a reliable but inefficient and expensive system. That’s why you’re unlikely to see reliable software like Cassandra running on reliable platforms like VMware with brand name hardware:

reliable software
reliable hardware

Google enjoyed a significant competitive advantage for many years by using commodity components with a revolutionary proprietary software stack including components like the distributed Google File System (GFS). You can still see Google’s original hand-made racks built with motherboards laid on cork board at their Mountain View campus and the computer museum (per image above), but today’s machines are custom made by ODMs and are a lot more advanced. Meanwhile Facebook have decided to focus on their core competency (social networking) and are actively commoditising “unreliable” web scale hardware (by way of the Open Compute Project) and software (by way of software releases, most notably the Cassandra distributed database which is now used by services like Netflix).

The challenge for enterprises today is to adopt cheap reliable software so as to enable the transition away from expensive reliable hardware. That’s easier said than done, but my advice to them is to treat this new technology as another tool in the toolbox and use the right tool for the job. Set up cloud computing platforms like Cassandra and OpenStack and look for “low-hanging fruit” to migrate first, then deal with the reticent applications once the “center of gravity” of your information technology systems has moved to cloud computing architectures.

P.S. Before the server huggers get all pissy about my using the term “relatively unreliable software”: running it on reliable hardware is a perfectly valid way of achieving a reliable system, just not a cost-effective one now that “relatively reliable software” is here.

Cloud computing’s concealed complexity

Cloud gears cropped

James Urquhart claims Cloud is complex—deal with it, adding that “If you are looking to cloud computing to simplify your IT environment, I’m afraid I have bad news for you” and citing his earlier CNET post drawing analogies to a recent flash crash.

Cloud computing systems are complex, in the same way that nuclear power stations are complex — they also have catastrophic failure modes, but given cloud providers rely heavily on their reputations they go to great lengths to ensure continuity of service (I was previously the technical program manager for Google’s global tape backup program so I appreciate this first hand). The best analogies to flash crashes are autoscaling systems making too many (or too few) resources available and spot price spikes, but these are isolated and there are simple ways to mitigate the risk (DDoS protection, market limits, etc.)

Fortunately this complexity is concealed behind well defined interfaces — indeed the term “cloud” itself comes from network diagrams in which complex interconnecting networks became the responsibility of service providers and were concealed by a cloud outline. Cloud computing is, simply, the delivery of information technology as a service rather than a product, and like other utility services there is a clear demarcation point (the first socket for telephones, the meter for electricity and the user or machine interface for computing).

Everything on the far side of the demarcation point is the responsibility of the provider, and users often don’t even know (nor do they need to know) how the services actually work — it could be an army of monkeys at typewriters for all they care. Granted it’s often beneficial to have some visibility into how the services are provided (in the same way that we want to know our phone lines are secure and power is clean), but we’ve developed specifications like CloudAudit to improve transparency.

Making simple topics complex is easy — what’s hard is making complex topics simple. We should be working to make cloud computing as approachable as possible, and drawing attention to its complexity does not further that aim. Sure there are communities of practitioners who need to know how it all works (and James is addressing that community via GigaOm), but consumers of cloud services should finally be enabled to apply information technology to business problems, without unnecessary complexity.

If you find yourself using complex terminology or unnecessary acronyms (e.g. anything ending with *aaS) then ask yourself if you’re not part of the problem rather than part of the solution.

Flash/Silverlight: How much business can you afford to turn away?

Tim Anderson was asking about the future of Silverlight on Twitter today so here are my thoughts on the subject, in the context of earlier posts on the future of Flash:

  • 2009: Why Adobe Flash penetration is more like 50% than 99%
  • 2010: Face it Flash, your days are numbered.
  • 2011: RIP Adobe Flash (1996-2011) – now let’s bury the dead

In the early days of the Internet, a lack of native browser support for “advanced” functionality (particularly video) created a vacuum that propelled Flash to near ubiquity. It was the only plugin to achieve such deep penetration, though I would argue never as high as 99% (which Adobe laughably advertise to this day). As a result, developers were able to convince clients to adopt the platform for all manner of interactive sites (including, infamously, many/most restaurants).

The impossible challenge for proprietary browser plugins is staying up-to-date and secure across a myriad hardware and software platforms — it was hard enough trying to support multiple browsers on multiple versions of Windows on one hardware platform (x86), but with operating systems like Linux and Mac OS X now commanding non-negligible shares of the market it’s virtually impossible. Enter mobile devices, which by Adobe’s own reckoning outnumber PCs by 3 to 1. Plugin vendors now have an extremely diverse ecosystem of hardware (AMD, Intel, etc.) and software (Android, iOS, Symbian, Windows Phone 7, etc.) and an impossibly large number of permutations to support. Meanwhile browser engines (e.g. WebKit, which is the basis for Safari and Chrome on the desktop and iOS, Android and webOS on mobile devices) have added native support for the advanced features whose absence created a demand for Flash.

Unsurprisingly, not only is Flash in rapid decline — as evidenced by Adobe recently pulling out of the mobile market (and thus 3 in 4 devices) — but it would be virtually impossible for any competitor to reach its level of penetration. As such, Silverlight had (from the outset) a snowflake’s chance in hell of achieving an “acceptable” level of penetration.

What’s an “acceptable level of penetration” you ask? That’s quite simple — it’s the ratio of customers that businesses are prepared to turn away in order to access “advanced” functionality that is now natively supported in most browsers. At Adobe’s claimed 99% penetration you’re turning away 1 in 100 customers. At 90% you’re turning away 1 in 10. According to http://riastats.com, if you’re deploying a Flash site down under then you’re going to be turning away 13%, or a bit more than 1 in 8. For Silverlight it’s even worse — almost half of your customers won’t even get to see your site without having to install a plugin (which they are increasingly less likely to do).

How much revenue can your business tolerate losing? 1%? 10%? 50%? And for what benefit?
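
The "1 in N" figures above are simple arithmetic on plugin penetration. A minimal sketch follows; the function name is my own, and the 87% figure is simply the complement of the 13% quoted above.

```python
def turned_away(penetration_pct: float):
    """Return (percent of visitors lost, N in "1 in N") for a plugin.

    `penetration_pct` is the share of visitors who already have the
    plugin installed, expressed as a percentage (0-100).
    """
    if not 0.0 <= penetration_pct < 100.0:
        raise ValueError("penetration must be at least 0 and below 100")
    lost_pct = 100.0 - penetration_pct
    return lost_pct, 100.0 / lost_pct


if __name__ == "__main__":
    for label, pct in [("Adobe's claim", 99.0), ("90%", 90.0), ("Flash, AU", 87.0)]:
        lost, one_in = turned_away(pct)
        print(f"{label}: lose {lost:.0f}% of visitors, about 1 in {one_in:.1f}")
```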

A word on the future of Europe (without the United Kingdom)

It’s rare that I rant about politics but given the train wreck that we’ve woken up to here in Europe I thought I’d make the exception, as this is important for all of us, both here in the 27 member European Union (technically, while part of Europe, Switzerland is part of neither the European Union nor the 17 member Eurozone as it has its own currency, but we’re surrounded by it and affected by its instability) and abroad, including the United States.

I’m no expert on European politics, but having been a resident of the region for almost a decade now and lived and/or worked in three member states (in addition to Switzerland) I have the unusual advantage of having seen it from many angles:

  • From Ireland, which has been (and is to this day) a beneficiary of the union by way of support for its relatively small economy and its inexplicably generous 12.5% corporate tax rate.
  • From France, which along with Germany is one of the powerhouses of the European economy with the most to lose if things go awry.
  • From Switzerland, which is an independent, neutral country that happens to be in the center of Europe and only recently joined the Schengen Agreement (relaxing its borders with France, Germany, Austria and Italy).
  • From the United Kingdom, which is a member state outside of the Eurozone with its own currency (British Pounds) that is isolated from the mainland by sea and apparently sees this as a reason to get special treatment.

The United Kingdom is a large and important economy in the zone, but even down to the grassroots level they see themselves as independent and assess every single decision solely on the basis of what it will do for them — there are regularly mini scandals in the papers about their relationship with their fellow Europeans (who are typically seen to be somehow benefiting at their expense). This shortsighted tweet captures the sentiment nicely:

As a prime example, the Common Agricultural Policy, which is designed “to provide farmers with a reasonable standard of living, consumers with quality food at fair prices and to preserve rural heritage”, tends to redistribute funds from more urbanised countries like the Netherlands and the United Kingdom to those where agriculture actually takes place. It’s an important (albeit changing) function and it commands almost half of the EU’s budget.

Another example of unnecessary friction is their [self-]exclusion from the Schengen Agreement, which creates a borderless area within Europe, thus facilitating transport and commerce. You still have to pass border control when you enter or leave the Schengen area, including when traveling to/from the Common Travel Area (consisting only of the United Kingdom and Ireland, which are connected on the island of Ireland by the border between the Republic of Ireland and Northern Ireland), but you can travel freely within it once you’re there and there are visas which cover the entire region.

Cutting to the chase, it is no surprise then that the Brits would be stubborn when it came to changing the treaty by unanimous vote; indeed I’ve been predicting that for a while and was certain it would happen a few days ago. What is a surprise though is just how belligerent and childish they’ve been about it, as a Frenchman said in reference to the following video from The Telegraph’s excellent article EU suffers worst split in history as David Cameron blocks treaty change:

Another user tweeted:

Others agreed:

And:

I think Simon Wardley sums it up nicely though:

From my point of view the Brits are [allowing their representatives to get away with] acting like petulant children, benefiting from the European Union when it suits them, and taking their toys home when it doesn’t. Their argument that the very establishment that got us into this mess must absolutely be protected above all else is weak, and the claim that this is in the interests of the City, let alone the entire country, is deceptive.

They “very doggedly” (their words) sought “a ‘protocol’ giving the City of London protection from a wave of EU financial service regulations related to the eurozone crisis”. That’s right, they didn’t want to play by the same rules as everyone else, and exercised their veto when it became apparent that was the only option.

To add insult to injury, they “warned the new bloc that it would not be able to use the resources of the EU, raising real doubts as to whether the eurozone would be able to enforce fiscal rules in order to calm the markets”. So not only are they going to not participate in cleaning up the mess they played a key role in creating, but they’re going to do their best to make sure nobody else can either.

Fortunately there’s light at the end of the tunnel: “Cameron was clumsy in his manoeuvring,” a senior EU diplomat said. “It may be possible that Britain will shift its position in the days ahead if it discovers that isolation really is not a viable course of action.” Please take a moment today to express your discontent with this decision as sometimes in order to serve your own interests you also need to consider those of others — in much the same way as the tragedy of the commons (where in this case the commons is the European and global markets).

Update: Another great [opinion] piece from The Telegraph: Cameron: the bulldog has no teeth:

Cameron (and Britain) are now in a no-win situation. If the eurozone countries start to rally, then we shall be isolated from the new bloc and stuck in the slow lane of Europe. Should the euro problems deepen, then we shall bear the consequences in full. As George Osborne has indicated, a disorderly collapse of the euro would drag a voiceless Britain into depression.

In France and Germany, Cameron will be blamed for exacerbating a crisis by leaders who will brand him the pariah of Europe. Overnight, Britain has changed from a major player to an isolated outpost which, if this goes on, will become about as significant on the global stage as the Isle of Mull. Churchill would be turning in his grave.

Infographic: Diffusion of Social Networks — Facebook, Twitter, LinkedIn and Google+

Social networking market

They say a picture’s worth a thousand words and much digital ink has been spilled recently on impressive sounding (yet relatively unimpressive) user counts, so here’s an infographic showing the diffusion of social networks as at last month to put things in perspective.

There are 7 billion people on the planet, of which 2 billion are on the Internet. Given Facebook are now starting to make inroads into the laggards (e.g. parents/grandparents) with 800 million active users already under their belt, I’ve assumed that the total addressable market (TAM) for social media (that is, those likely to use it in the short-medium term) is around a billion Internet users (i.e. half) and growing, both with the growth of the Internet and as a growing fraction of Internet users. That gives social media market shares of 80% for Facebook, 20% for Twitter and <5% for Google+. In other words, Twitter is 5x the size of Google+ and Facebook is 4x the size of Twitter (i.e. 20x the size of Google+).

It’s important to note that while some report active users, Google report total (i.e. best case) users, and only a percentage of the total users are active at any one time. I’m also hesitant to make direct comparisons with LinkedIn as while everyone is potentially interested in Facebook, Twitter and Google+, the total addressable market for a professional network is limited, by definition, to professionals. I would say around 200 million and growing fast given the penetration I see in my own professional network. This puts them in a similar position to Facebook in this space: up in the top right chasing after the laggards rather than the bottom left facing the chasm.
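
The share arithmetic can be reproduced in a few lines. This is only a sketch of the sums: the user counts are the ones quoted in this post, the one-billion TAM is my assumption as stated above, and the variable names are illustrative.

```python
# User counts in millions, as quoted in this post.
USERS_M = {"Facebook": 800, "Twitter": 200, "Google+": 40}

# Assumed total addressable market for social media, in millions.
TAM_M = 1000


def market_share_pct(users_m: int, tam_m: int = TAM_M) -> float:
    """Share of the assumed TAM held by a network, as a percentage."""
    return 100 * users_m / tam_m


shares = {name: market_share_pct(n) for name, n in USERS_M.items()}

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {pct:.0f}% of assumed TAM")
    print("Facebook vs Twitter:", USERS_M["Facebook"] / USERS_M["Twitter"])
    print("Twitter vs Google+:", USERS_M["Twitter"] / USERS_M["Google+"])
    print("Facebook vs Google+:", USERS_M["Facebook"] / USERS_M["Google+"])
```

Note that the Google+ figure is total rather than active users, so its share here is a best case.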

Diffusion of innovations

The graph shows Rogers’ theory on the diffusion of innovations, documented in his book Diffusion of Innovations, where diffusion is the process by which an innovation is communicated through certain channels over time among the members of a social system.

There are 5 stages:

  1. Knowledge is when people are aware of the innovation but don’t know (and don’t care) about it.
  2. Persuasion is when people are interested in learning more.
  3. Decision is when people decide to accept or reject it.
  4. Implementation is when people employ it to some degree for testing (e.g. create an account).
  5. Confirmation is when people finally decide to use it, possibly to its full potential.

I would suggest that the majority of the total addressable market are at stage 1 or 2 for Google+ and Twitter, and stage 4 or 5 for Facebook and LinkedIn (with its smaller TAM). Of note, users’ decisions to reject an innovation at the decision or implementation phase may be semi-permanent — to quote Slate magazine’s Google+ is Dead article, “by failing to offer people a reason to keep coming back to the site every day, Google+ made a bad first impression. And in the social-networking business, a bad first impression spells death.” The same could be said for many users of Twitter, who sign up but fail to engage sufficiently to realise its true value. Facebook, on the other hand, often exhibits users who leave only to subsequently return due to network effects.

Social networking is also arguably a natural monopoly given, among other things, dramatically higher acquisition costs once users’ changing needs have been satisfied by the first mover (e.g. Facebook). Humans have been using social networking forever, only until recently it’s been manual and physiologically limited to around 150 connections (Dunbar’s number, named after British anthropologist Robin Dunbar). With the advent of technology that could displace traditional systems like business cards and rolodexes came a new demand for pushing the limits for personal and professional reasons — I use Facebook and LinkedIn extensively to push Dunbar’s number out an order of magnitude to ~1,500 contacts for example, and Twitter to make new contacts and communicate with thousands of people. I don’t want to maintain 4 different social networks any more than I want to have to search 4 different directories to find a phone number — I already have 3 which is 2 too many!

Rogers’ 5 factors

How far an innovation ultimately progresses depends on 5 factors:

  1. Relative Advantage — Does it improve substantially on the status quo (e.g. Facebook)?
  2. Compatibility — Can it be easily assimilated into an individual’s life?
  3. Simplicity or Complexity — Is it too complex for your average user?
  4. Trialability — How easy is it to experiment?
  5. Observability — To what extent is it visible to others (e.g. for viral adoption)?

Facebook, which started as a closed community at Harvard and other colleges and grew from there, obviously offered significant relative advantage over MySpace. I was in California at the time and it seemed like everyone had a MySpace page while only students (and a few of us in local/company networks) had Facebook. It took off like wildfire when they solved the trialability problem by opening the floodgates and a critical mass of users was quickly drawn in due to the observability of viral email notifications, the simplicity of getting up and running and the compatibility with users’ lives (features incompatible with the unwashed masses — such as the egregiously abused “how we met” form — are long gone and complex lists/groups are there for those who need them but invisible to those who don’t). Twitter is also trivial to get started but can be difficult to extract value from initially.

Network models

Conversely, the complexity of getting started on Google+ presents a huge barrier to entry and as a result we may see the circles interface buried in favour of a flat “follower” default like that of Twitter (the “suggested user list” has already appeared), or automated. Just because our real-life social networks are complex and dynamic does not imply that your average user is willing to invest time and energy in maintaining a complex and dynamic digital model. The process of sifting through and categorising friends into circles has been likened to the arduous process of arranging tables for a wedding and for the overwhelming majority of users it simply does not offer a return on investment:

In reality we’re most comfortable with concentric rings, which Facebook’s hybrid model recently introduced by way of “Close Friends”, “Acquaintances” and “Restricted” lists (as well as automatically maintained lists for locations and workplaces — a feature I hope gets extended to other attributes). By default Facebook is simple/flat — mutual/confirmed/2-way connections are “Friends” (though they now also support 1-way follower/subscriber relationships ala Twitter). Concentric rings then offer a greater degree of flexibility for more advanced users and the most demanding users can still model arbitrarily complex networks using lists:

In any case, if you give users the ability to restrict sharing you run the risk of their actually using it, which is a sure-fire way to kill off your social network; after all, much of the value derived from networks like Facebook is from “harmless voyeurism”. That’s why Google+ is worse than a ghost town for many users (including myself, though as a Google Apps user I was excluded from the landrush phase) while being too noisy for others. Furthermore, while Facebook and Twitter have a subscribe/follow (“pull”) model which allows users to be selective of what they hear, when a publisher shares content with circles on Google+ other users are explicitly notified (“push”). This is important for “observability” but can be annoying for users.

Nymwars

The requirement to provide and/or share your real name, sex, date of birth and a photo also presents a compatibility problem with many users’ expectations of privacy and security, as evidenced by the resulting protests over valid use cases for anonymity and pseudonymity. For something that was accepted largely without question with Facebook, the nymwars appear to have caused irreparable harm to Google+ in the critically important innovator and early adopter segments, for reasons that are not entirely clear to me. I presume that there is a greater expectation of privacy for Google (to whom people entrust private emails, documents, etc.) than for Facebook (which people use specifically and solely for controlled sharing).

Adopter categories

Finally, there are 5 classes of adopters (along the X axis) varying over time as the innovation attains deeper penetration:

  1. Innovators (the first 2.5%) are generally young, social, wealthy, risk tolerant individuals who adopt first.
  2. Early Adopters (the next 13.5%) are opinion leaders who adopt early enough (but not too early) to maintain a central communication position.
  3. Early Majority (the next 34%, to 50% of the population) take significantly longer to adopt innovations.
  4. Late Majority (the next 34%) adopt innovations after the average member of society and tend to be highly sceptical.
  5. Laggards (the last 16%) show little to no opinion leadership and tend to be older, more reclusive and have an aversion to change-agents.

I’ve ruled out wealth because while buying an iPhone is expensive (and thus a barrier to entry), signing up for a social network is free.

The peak of the bell curve is the point at which the average user (i.e. 50% of the market) has adopted the technology, and it is very difficult both to climb the curve as a new technology and to displace an existing technology that is over the hump.

The Chasm

The chasm (which exists between Early Adopters and Early Majority, i.e. at 16% penetration) refers to Moore’s argument in Crossing the Chasm that there is a gap between early adopters and the mass market which must be crossed by any innovation which is to be successful. Furthermore, thanks to accelerating technological change they must do so within an increasingly limited time for fear of being equaled by an incumbent or disrupted by another innovation. The needs of the mass market differ (often wildly) from the needs of early adopters and innovations typically need to adapt quickly to make the transition. I would argue that MySpace, having achieved ~75 million users at peak, failed to cross the chasm by finding appeal in the mass market (ironically due in no small part to their unfettered flexibility in customising profiles) and was disrupted by Facebook. Twitter on the other hand (with some 200 million active users) has crossed the chasm, as evidenced by the presence of mainstream icons like Bieber, Spears and Obama as well as their fans. LinkedIn (for reasons explained above) belongs at the top right rather than the bottom left.
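
The adopter categories and the 16% chasm threshold can be expressed as a small lookup. This is a sketch only: the cut-offs are the cumulative percentages listed above (2.5, 16, 50, 84, 100), the function names are my own, and treating a network's share of an assumed TAM as its position on the curve is a simplification.

```python
from bisect import bisect_left

# Cumulative upper bounds (percent of the market) for each adopter class.
UPPER_BOUNDS = [2.5, 16.0, 50.0, 84.0, 100.0]
CATEGORIES = [
    "Innovators",
    "Early Adopters",
    "Early Majority",
    "Late Majority",
    "Laggards",
]
CHASM_PCT = 16.0  # boundary between Early Adopters and Early Majority


def adopter_category(penetration_pct: float) -> str:
    """Name the adopter class currently being recruited at this penetration."""
    if not 0.0 <= penetration_pct <= 100.0:
        raise ValueError("penetration must be between 0 and 100")
    # bisect_left finds the first bound that is >= the penetration.
    return CATEGORIES[bisect_left(UPPER_BOUNDS, penetration_pct)]


def has_crossed_chasm(penetration_pct: float) -> bool:
    """True once penetration is past the Early Adopter segment."""
    return penetration_pct > CHASM_PCT


if __name__ == "__main__":
    # Shares of the assumed one-billion TAM used earlier in this post.
    for name, pct in [("Facebook", 80.0), ("Twitter", 20.0), ("Google+", 4.0)]:
        print(name, adopter_category(pct), has_crossed_chasm(pct))
```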

Disruptive innovations

The big question today is whether Google+ can cross the chasm too and give Facebook a run for its money. Facebook, having achieved “new-market disruption” with almost a decade head start in refining the service with a largely captive audience, now exhibits extremely strong network effects. It would almost certainly take another disruptive innovation to displace them (that is, according to Clayton Christensen, one that develops in an emerging market and creates a new market and value network before going on to disrupt existing markets and value networks), in the same way that Google previously disrupted the existing search market a decade ago.

In observing that creating a link to a site is essentially a vote for that site (“PageRank”), Google implemented a higher quality search engine that was more efficient, more scalable and less susceptible to spam. In the beginning Google (then known as Backrub) was nothing special and the incumbents (remember Altavista?) were continuously evolving; they had little to fear from Google and Google had little to fear from them as it simply wasn’t worth their while chasing after potentially disruptive innovations like Backrub. They were so uninterested in fact that Yahoo! missed an opportunity to acquire Google for $3bn in the early days. Like most disruptive technologies, PageRank was technologically straightforward and far simpler than trying to determine relevance from the content itself. It was also built on a revolutionary hardware and software platform that scaled out rather than up, distributing work between many commodity PCs, thus reducing costs and causing “low-end disruption”. Its initial applications were trivial, but it quickly outpaced the sustaining innovation of the incumbents and took the lead, which it has held ever since:

Today Facebook is looking increasingly disruptive too, only in their world it’s no longer about links between pages, but links between people (which are arguably far more valuable). Last year while working at Google I actively advocated the development of a “PageRank for people” (which I referred to as “PeopleRank” or “SocialRank”), whereby a connection to a person was effectively a vote for that person and the weight of that vote would depend on the person’s influence in the community, in the same way that a link from microsoft.com is worth more than one from viagra.tld (which could actually have negative value in the same way that hanging out with the wrong crowd negatively affects reputation). I’d previously built what I’d call a “social metanetwork” named “meshed” (which never saw the light of day due to cloud-related commitments) and the idea stemmed from that, but I was busy running tape backups for Google, not building social networks on the Emerald Sea team.
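
For readers unfamiliar with the "link as a vote" idea, here is a minimal sketch of the textbook PageRank power iteration applied to people instead of pages. It is not Google's production algorithm and not the "PeopleRank" I proposed internally; the names, the 0.85 damping factor and the three-person graph are illustrative assumptions, and negative-value connections are not modelled.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank nodes by inbound "votes", weighted by the voter's own rank.

    `links` maps each node to the list of nodes it points at (for people:
    whom they follow or connect to). Simplified textbook version.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Everyone starts each round with a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's vote evenly among those it points at.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outbound links spreads its vote evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: alice and bob both point at carol,
    # and carol points back at alice only.
    graph = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}
    for person, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{person}: {score:.3f}")
```

In this toy graph carol ranks highest (two inbound votes), and alice outranks bob because her single vote comes from the highly ranked carol, which is the "weight of the vote depends on the voter's influence" property described above.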

With the wealth of information Google has at its fingertips — including what amounts to a pen trace of users’ e-mail and (courtesy Android and Google Voice) phone calls and text messages — it should have been possible for them to completely automate the process of circle creation, in the same way that LinkedIn Maps can identify clusters of contacts. But they didn’t (perhaps because they got it badly wrong with Buzz), and they’re now on the sustaining innovation treadmill with otherwise revolutionary differentiating features being quickly co-opted by Facebook (circles vs lists, hangouts vs Skype, etc).

Another factor to consider is that Google have a massive base of existing users in a number of markets that they can push Google+ to, and they’re not afraid to do so (as evidenced by its appearance in other products and services including Android, AdWords, Blogger, Chrome, Picasa, Maps, News, Reader, Talk, YouTube and of course the ubiquitous sandbar and gratuitous blue arrow which appeared on Google Search). This strategy is not without risk though as if successful it will almost certainly attract further antitrust scrutiny, in the same way that Microsoft found itself in hot water for what was essentially putting an IE icon on the desktop. Indeed I had advocated the deployment of Google+ as a “social layer” rather than isolated product (ala the defunct Google Buzz), but stopped short of promoting an integrated product to rival Facebook, if only to maintain a separation of duties between content production/hosting and discovery.

The Solution

While I’m happy to see some healthy competition in the space, I’d rather not see any of the social networks “win” as if any one of them were able to cement a monopoly then we users would ultimately suffer. At the end of the day we need to remember that for any commercial social network we’re not the customer, we’re the product being sold:

As such, I strongly advocate the adoption of open standards for social networking, whereby users select a service or host a product that is most suitable for their specific needs (e.g. personal, professional, branding, etc) which is interoperable with other, similar products.

What we’re seeing today is similar to the early days of Internet email, where the Simple Mail Transfer Protocol (SMTP) broke down the barriers between different silos — what we need is an SMTP for social networking.

Sources:

  • Facebook: 800 million users (active) [source]
  • Twitter: 200 million users (active) [source]
  • LinkedIn: 135 million users (total) [source]
  • MySpace: 75.9 million users (peak) [source]
  • Google+: 40 million users (total) [source]