Leaving Google+

Ironically, many Google employees have even given up on Google+ (though plenty still post annoying “Moved to Google+” profile pics on other social networks).

One of those sneaky tweets that links to Google+ just tricked me into wading back into the swamp that it’s become, hopefully for the last time. I say “hopefully” because in all likelihood I’ll be forced back onto it at some point: it’s already apparently impossible to create a Google Account for any Google service without also landing yourself a Google+ profile and Gmail account, and the constant prompting for me to “upgrade” to Google+ will very likely prove more annoying than the infamous red notification box. Here’s what I saw in my stream:

  • 20 x quotes/quotepics/comics
  • 8 x irrelevant news articles & opeds
  • 1 x PHP code snippet
  • 3 x blatant ads
  • 2 x Google+ fanboi posts (including this little chestnut: “Saying nobody uses Google+ is like a virgin saying sex is boring. They’ve never actually tried it.” — you just failed at life by comparing Google+ to sex, my friend).
  • 2 x random photos

That’s pretty much 0% signal and 100% noise, and before you jump down my throat about who I’m following, it’s a few hundred generally intelligent people (though I note it is convenient that the prevalent defense for Google+ being a ghost town, or worse, a cesspool, is that your experience depends not only on who you’re following, but what they choose to share with you — reminds me of the kind of argument you regularly hear from religious apologists).

Google+ Hangouts

My main gripe with Google+ this week though was the complete failure of Google+ Hangouts (which should arguably be an entirely separate product) for Rishidot Research’s Open Conversations: Cloud Transparency on Monday. The irony of holding an open/transparency discussion on a closed platform aside, we were plagued with technical problems from the outset. First it couldn’t find my MacBook Air’s camera, so I had to move from my laptop to my iMac (which called for heavy furniture to be moved to get a clean background). When I joined we started immediately (albeit late, and sans 2-3 of the half dozen attendees), but it wasn’t long before one of the missing attendees joined and repeatedly interrupted the first half of the meeting with audio problems. The final attendee never managed to join, though their name and a blank screen appeared each of the 5-10 times they tried. We then inexplicably lost two attendees, and by the time they managed to re-join I too got a “Network failure for media packets” error:

Then there was “trouble connecting with the plugin”, which called for me to refresh the page and then reinstall the plugin:

Eventually I made it back in, only to discover that we had now lost the host(!?!) and before long it was down to just me and one other attendee. We struggled through the last half of the hour, but it was only afterwards that we discovered we had been talking to ourselves: the live YouTube stream and recording stopped when the host was kicked out. Needless to say, Google+ Hangouts are not ready for prime time, and if you invite me to join one then don’t be surprised if I refer you to this article.

Hotel California

To leave Google+ head over to Google Takeout and download your Circles (I grabbed data for other services too for good measure, and exported this blog separately since my profile is now Google+ integrated). You might want to see who’s following you, Actions->Select All and dump them into a circle first, otherwise you’ll probably lose that information when you close your account.

When you go to the Google+ “downgrade” page and select “Delete your entire Google profile” you’ll get a warning sufficiently complicated as to scare most people back into submission, but the most concerning part for me was this unhelpful help advising that “Other Google products which require a profile will be impacted”:

Fortunately, for YouTube and Blogger at least, you can check and revert (respectively) your decision to use a Google+ profile, but you’ll immediately be told to “Connect to Google+” once you unplug:

After that it’s just a case of checking “I understand that deleting this service can’t be undone and the data I delete can’t be restored.” and clicking “Remove selected services” (what “selected services”? I just want to be rid of Google+!). I’ll let you know how that goes once my friends on Google+ have had a chance to read this.

Simplifying cloud: Reliability

The original Google server rack

Reliability in cloud computing is a very simple concept which I’ve explained in many presentations but never actually documented:

Traditional legacy IT systems consist of relatively unreliable software (Microsoft Exchange, Lotus Notes, Oracle, etc.) running on relatively reliable hardware (Dell, HP, IBM servers, Cisco networking, etc.). Unreliable software is not designed for failure and thus any fluctuations in the underlying hardware platform (including power and cooling) typically result in partial or system-wide outages. In order to deliver reliable service using unreliable software you need to use reliable hardware, typically employing lots of redundancy (dual power supplies, dual NICs, RAID arrays, etc.). In summary:

unreliable software
reliable hardware

Cloud computing platforms typically prefer to build reliability into the software such that it can run on cheap commodity hardware. The software is designed for failure and assumes that components will misbehave or go away from time to time (which will always be the case, regardless of how much you spend on reliability – the more you spend the lower the chance but it will never be zero). Reliability is typically delivered by replication, often in the background (so as not to impair performance). Multiple copies of data are maintained such that if you lose any individual machine the system continues to function (in the same way that if you lose a disk in a RAID array the service is uninterrupted). Large scale services will ideally also replicate data in multiple locations, such that if a rack, row of racks or even an entire datacenter were to fail then the service would still be uninterrupted. In summary:

reliable software
unreliable hardware
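The “designed for failure” pattern above can be sketched in a few lines. The `Node` class and quorum rule below are invented purely for illustration (real systems like GFS and Cassandra are vastly more sophisticated), but the principle is the same: write to every replica and declare success if a majority accept, so any single box can die without interrupting service.

```python
import random

class Node:
    """One commodity box: cheap, and expected to fail now and then."""
    def __init__(self):
        self.data = {}

    def write(self, key, value, fail_rate=0.0):
        if random.random() < fail_rate:
            raise IOError("node unavailable")  # simulated hardware failure
        self.data[key] = value

def replicated_write(nodes, key, value, fail_rate=0.0):
    """Reliability in software: the write succeeds if a majority
    (a quorum) of replicas accept it."""
    acks = 0
    for node in nodes:
        try:
            node.write(key, value, fail_rate)
            acks += 1
        except IOError:
            pass  # expected from time to time; skip the failed node
    return acks > len(nodes) // 2
```

With three replicas the write survives the loss of any one node, in the same way a RAID array survives the loss of a disk.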

Asked for a quote for Joe Weinman’s upcoming Cloudonomics: The Business Value of Cloud Computing book, I said:

The marginal cost of reliable hardware is linear while the marginal cost of reliable software is zero.

That is to say, once you’ve written reliability into your software you can scale out with cheap hardware without spending more on reliability per unit, while if you’re using reliable hardware then each unit needs to include reliability (typically in the form of redundant components), which quickly gets very expensive.
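A toy cost model makes the shape of the two curves obvious; every dollar figure below is invented for illustration only.

```python
def reliable_hardware_cost(n, unit=2000, redundancy_premium=3000):
    # Each unit must carry its own redundancy (dual PSUs, RAID, ...):
    # the reliability spend grows linearly with the fleet.
    return n * (unit + redundancy_premium)

def reliable_software_cost(n, unit=2000, reliability_dev_cost=500_000):
    # Reliability is engineered once, in software; each additional
    # commodity box adds only its own (cheap) cost.
    return reliability_dev_cost + n * unit
```

With these made-up numbers the software approach overtakes somewhere under 200 servers, and the gap only widens from there.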
The other two permutations are ineffective:

Unreliable software on unreliable hardware gives an unreliable system. That’s why you should never try to install unreliable software like Microsoft Exchange, Lotus Notes, Oracle etc. onto unreliable hardware like Amazon EC2:

unreliable software
unreliable hardware

Finally, reliable software on reliable hardware gives a reliable but inefficient and expensive system. That’s why you’re unlikely to see reliable software like Cassandra running on reliable platforms like VMware with brand name hardware:

reliable software
reliable hardware

Google enjoyed a significant competitive advantage for many years by using commodity components with a revolutionary proprietary software stack including components like the distributed Google File System (GFS). You can still see Google’s original hand-made racks built with motherboards laid on cork board at their Mountain View campus and the computer museum (per image above), but today’s machines are custom made by ODMs and are a lot more advanced. Meanwhile Facebook have decided to focus on their core competency (social networking) and are actively commoditising “unreliable” web scale hardware (by way of the Open Compute Project) and software (by way of software releases, most notably the Cassandra distributed database which is now used by services like Netflix).

The challenge for enterprises today is to adopt cheap reliable software so as to enable the transition away from expensive reliable hardware. That’s easier said than done, but my advice to them is to treat this new technology as another tool in the toolbox and use the right tool for the job. Set up cloud computing platforms like Cassandra and OpenStack and look for “low-hanging fruit” to migrate first, then deal with the recalcitrant applications once the “center of gravity” of your information technology systems has moved to cloud computing architectures.

P.S. Before the server huggers get all pissy about my use of the term “relatively unreliable software”: pairing it with reliable hardware is a perfectly valid way of achieving a reliable system — just not a cost effective one now that “relatively reliable software” is here.

Infographic: Diffusion of Social Networks — Facebook, Twitter, LinkedIn and Google+

Social networking market

They say a picture’s worth a thousand words and much digital ink has been spilled recently on impressive sounding (yet relatively unimpressive) user counts, so here’s an infographic showing the diffusion of social networks as at last month to put things in perspective.

There are 7 billion people on the planet, of whom 2 billion are on the Internet. Given Facebook are now starting to make inroads into the laggards (e.g. parents/grandparents) with 800 million active users already under their belt, I’ve assumed that the total addressable market (TAM) for social media (that is, those likely to use it in the short-medium term) is around a billion Internet users (i.e. half) and growing — both with the growth of the Internet and as a growing fraction of Internet users. That gives social media market shares of 80% for Facebook, 20% for Twitter and <5% for Google+. In other words, Twitter is 5x the size of Google+ and Facebook is 4x the size of Twitter (i.e. 20x the size of Google+).

It’s important to note that while some report active users, Google report total (i.e. best case) users — only a percentage of the total users are active at any one time. I’m also hesitant to make direct comparisons with LinkedIn as while everyone is potentially interested in Facebook, Twitter and Google+, the total addressable market for a professional network is limited, by definition, to professionals — I would say around 200 million and growing fast given the penetration I see in my own professional network. This puts them in a similar position to Facebook in this space — up in the top right chasing after the laggards rather than the bottom left facing the chasm.

Diffusion of innovations

The graph shows Rogers’ theory on the diffusion of innovations, documented in his book Diffusion of Innovations, where diffusion is the process by which an innovation is communicated through certain channels over time among the members of a social system.

There are 5 stages:

  1. Knowledge is when people are aware of the innovation but lack information about (and interest in) it.
  2. Persuasion is when people are interested in learning more.
  3. Decision is when people decide to accept or reject it.
  4. Implementation is when people employ it to some degree for testing (e.g. create an account).
  5. Confirmation is when people finally decide to use it, possibly to its full potential.

I would suggest that the majority of the total addressable market are at stage 1 or 2 for Google+ and Twitter, and stage 4 or 5 for Facebook and LinkedIn (with its smaller TAM). Of note, users’ decisions to reject an innovation at the decision or implementation phase may be semi-permanent — to quote Slate magazine’s Google+ is Dead article, “by failing to offer people a reason to keep coming back to the site every day, Google+ made a bad first impression. And in the social-networking business, a bad first impression spells death.” The same could be said for many users of Twitter, who sign up but fail to engage sufficiently to realise its true value. Facebook, on the other hand, often exhibits users who leave only to subsequently return due to network effects.

Social networking is also arguably a natural monopoly given, among other things, dramatically higher acquisition costs once users’ changing needs have been satisfied by the first mover (e.g. Facebook). Humans have been using social networking forever, only until recently it’s been manual and cognitively limited to around 150 connections (Dunbar’s number, named after British anthropologist Robin Dunbar). With the advent of technology that could displace traditional systems like business cards and rolodexes came a new demand for pushing the limits for personal and professional reasons — I use Facebook and LinkedIn extensively to push Dunbar’s number out an order of magnitude to ~1,500 contacts for example, and Twitter to make new contacts and communicate with thousands of people. I don’t want to maintain 4 different social networks any more than I want to have to search 4 different directories to find a phone number — I already have 3, which is 2 too many!

Rogers’ 5 factors

How far an innovation ultimately progresses depends on 5 factors:

  1. Relative Advantage — Does it improve substantially on the status quo (e.g. Facebook)?
  2. Compatibility — Can it be easily assimilated into an individual’s life?
  3. Simplicity or Complexity — Is it too complex for your average user?
  4. Trialability — How easy is it to experiment?
  5. Observability — To what extent is it visible to others (e.g. for viral adoption)?

Facebook, which started as a closed community at Harvard and other colleges and grew from there, obviously offered significant relative advantage over MySpace. I was in California at the time and it seemed like everyone had a MySpace page while only students (and a few of us in local/company networks) had Facebook. It took off like wildfire when they solved the trialability problem by opening the floodgates and a critical mass of users was quickly drawn in due to the observability of viral email notifications, the simplicity of getting up and running and the compatibility with users’ lives (features incompatible with the unwashed masses — such as the egregiously abused “how we met” form — are long gone and complex lists/groups are there for those who need them but invisible to those who don’t). Twitter is also trivial to get started but can be difficult to extract value from initially.

Network models

Conversely, the complexity of getting started on Google+ presents a huge barrier to entry and as a result we may see the circles interface buried in favour of a flat “follower” default like that of Twitter (the “suggested user list” has already appeared), or automated. Just because our real-life social networks are complex and dynamic does not imply that your average user is willing to invest time and energy in maintaining a complex and dynamic digital model. The process of sifting through and categorising friends into circles has been likened to the arduous process of arranging tables for a wedding and for the overwhelming majority of users it simply does not offer a return on investment:

In reality we’re most comfortable with concentric rings, which Facebook’s hybrid model recently introduced by way of “Close Friends”, “Acquaintances” and “Restricted” lists (as well as automatically maintained lists for locations and workplaces — a feature I hope gets extended to other attributes). By default Facebook is simple/flat — mutual/confirmed/2-way connections are “Friends” (though they now also support 1-way follower/subscriber relationships ala Twitter). Concentric rings then offer a greater degree of flexibility for more advanced users and the most demanding users can still model arbitrarily complex networks using lists:

In any case, if you give users the ability to restrict sharing you run the risk of their actually using it, which is a sure-fire way to kill off your social network — after all, much of the value derived from networks like Facebook is from “harmless voyeurism”. That’s why Google+ is worse than a ghost town for many users (including myself, though as a Google Apps user I was excluded from the landrush phase) while being too noisy for others. Furthermore, while Facebook and Twitter have a subscribe/follow (“pull”) model which allows users to be selective about what they hear, when a publisher shares content with circles on Google+ other users are explicitly notified (“push”) — this is important for “observability” but can be annoying for users.


The requirement to provide and/or share your real name, sex, date of birth and a photo also presents a compatibility problem with many users’ expectations of privacy and security, as evidenced by the resulting protests over valid use cases for anonymity and pseudonymity. For something that was accepted largely without question with Facebook, the nymwars appear to have caused irreparable harm to Google+ in the critically important innovator and early adopter segments, for reasons that are not entirely clear to me. I presume that there is a greater expectation of privacy for Google (to whom people entrust private emails, documents, etc.) than for Facebook (which people use specifically and solely for controlled sharing).

Adopter categories

Finally, there are 5 classes of adopters (along the X axis) varying over time as the innovation attains deeper penetration:

  1. Innovators (the first 2.5%) are generally young, social, wealthy, risk tolerant individuals who adopt first.
  2. Early Adopters (the next 13.5%) are opinion leaders who adopt early enough (but not too early) to maintain a central communication position.
  3. Early Majority (the next 34%, to 50% of the population) take significantly longer to adopt innovations.
  4. Late Majority (the next 34%) adopt innovations after the average member of society and tend to be highly sceptical.
  5. Laggards (the last 16%) show little to no opinion leadership and tend to be older, more reclusive and have an aversion to change-agents.

I’ve ruled out wealth because while buying an iPhone is expensive (and thus a barrier to entry), signing up for a social network is free.

The peak of the bell curve is the point at which the average user (i.e. 50% of the market) has adopted the technology, and it is very difficult both to climb the curve as a new technology and to displace an existing technology that is over the hump.

The Chasm

The chasm (which exists between Early Adopters and Early Majority, i.e. at 16% penetration) refers to Moore’s argument in Crossing the Chasm that there is a gap between early adopters and the mass market which must be crossed by any innovation that is to be successful. Furthermore, thanks to accelerating technological change they must do so within an increasingly limited time for fear of being equaled by an incumbent or disrupted by another innovation. The needs of the mass market differ — often wildly — from the needs of early adopters, and innovations typically need to adapt quickly to make the transition. I would argue that MySpace, having achieved ~75 million users at peak, failed to cross the chasm by finding appeal in the mass market (ironically due in no small part to their unfettered flexibility in customising profiles) and was disrupted by Facebook. Twitter on the other hand (with some 200 million active users) has crossed the chasm, as evidenced by the presence of mainstream icons like Bieber, Spears and Obama as well as their fans. LinkedIn (for reasons explained above) belongs at the top right rather than the bottom left.

Disruptive innovations

The big question today is whether Google+ can cross the chasm too and give Facebook a run for its money. Facebook, having achieved “new-market disruption” with almost a decade head start in refining the service with a largely captive audience, now exhibits extremely strong network effects. It would almost certainly take another disruptive innovation to displace them (that is, according to Clayton Christensen, one that develops in an emerging market and creates a new market and value network before going on to disrupt existing markets and value networks), in the same way that Google previously disrupted the existing search market a decade ago.

In observing that creating a link to a site is essentially a vote for that site (“PageRank”), Google implemented a higher quality search engine that was more efficient, more scalable and less susceptible to spam. In the beginning Google (née BackRub) was nothing special and the incumbents (remember AltaVista?) were continuously evolving — they had little to fear from Google and Google had little to fear from them, as it simply wasn’t worth their while chasing after potentially disruptive innovations like BackRub. They were so uninterested, in fact, that Yahoo! missed an opportunity to acquire Google for $3bn in the early days. Like most disruptive technologies, PageRank was technologically straightforward and far simpler than trying to determine relevance from the content itself. It was also built on a revolutionary hardware and software platform that scaled out rather than up, distributing work between many commodity PCs, thus reducing costs and causing “low-end disruption”. Its initial applications were trivial, but it quickly outpaced the sustaining innovation of the incumbents and took the lead, which it has held ever since:
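The voting idea is simple enough to sketch in a few lines of Python. This is the textbook power-iteration form of PageRank with a made-up toy graph, not Google’s production implementation:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Minimal PageRank by power iteration.

    links maps each page to the pages it links to; a link is
    effectively a vote for the target page.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Toy web: 'b' and 'c' both vote for 'a', so 'a' ranks highest.
ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
```

On the toy graph the page everyone links to ends up with the highest rank, and the ranks always sum to 1.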

Today Facebook is looking increasingly disruptive too, only in their world it’s no longer about links between pages, but links between people (which are arguably far more valuable). Last year while working at Google I actively advocated the development of a “PageRank for people” (which I referred to as “PeopleRank” or “SocialRank”), whereby a connection to a person was effectively a vote for that person and the weight of that vote would depend on the person’s influence in the community, in the same way that a link from microsoft.com is worth more than one from viagra.tld (which could actually have negative value in the same way that hanging out with the wrong crowd negatively affects reputation). I’d previously built what I’d call a “social metanetwork” named “meshed” (which never saw the light of day due to cloud-related commitments) and the idea stemmed from that, but I was busy running tape backups for Google, not building social networks on the Emerald Sea team.

With the wealth of information Google has at its fingertips — including what amounts to a pen trace of users’ e-mail and (courtesy of Android and Google Voice) phone calls and text messages — it should have been possible for them to completely automate the process of circle creation, in the same way that LinkedIn Maps can identify clusters of contacts. But they didn’t (perhaps because they got it badly wrong with Buzz), and they’re now on the sustaining innovation treadmill with otherwise revolutionary differentiating features being quickly co-opted by Facebook (circles vs lists, hangouts vs Skype, etc.).

Another factor to consider is that Google have a massive base of existing users in a number of markets that they can push Google+ to, and they’re not afraid to do so (as evidenced by its appearance in other products and services including Android, AdWords, Blogger, Chrome, Picasa, Maps, News, Reader, Talk, YouTube and of course the ubiquitous sandbar and gratuitous blue arrow which appeared on Google Search). This strategy is not without risk though as, if successful, it will almost certainly attract further antitrust scrutiny, in the same way that Microsoft found itself in hot water for what was essentially putting an IE icon on the desktop. Indeed I had advocated the deployment of Google+ as a “social layer” rather than an isolated product (ala the defunct Google Buzz), but stopped short of promoting an integrated product to rival Facebook — if only to maintain a separation of duties between content production/hosting and discovery.

The Solution

While I’m happy to see some healthy competition in the space, I’d rather not see any of the social networks “win”: if any one of them were able to cement a monopoly then we users would ultimately suffer. At the end of the day we need to remember that for any commercial social network we’re not the customer, we’re the product being sold:

As such, I strongly advocate the adoption of open standards for social networking, whereby users select a service or host a product that is most suitable for their specific needs (e.g. personal, professional, branding, etc) which is interoperable with other, similar products.

What we’re seeing today is similar to the early days of Internet email, where the Simple Mail Transfer Protocol (SMTP) broke down the barriers between different silos — what we need is an SMTP for social networking.



  • Facebook: 800 million users (active) [source]
  • Twitter: 200 million users (active) [source]
  • LinkedIn: 135 million users (total) [source]
  • MySpace: 75.9 million users (peak) [source]
  • Google+: 40 million users (total) [source]
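For what it’s worth, the relative sizes quoted earlier fall straight out of these counts and the assumed 1-billion-user TAM:

```python
tam = 1_000_000_000  # assumed social media total addressable market

users = {
    "Facebook": 800_000_000,  # active
    "Twitter": 200_000_000,   # active
    "LinkedIn": 135_000_000,  # total, against a smaller professional TAM
    "MySpace": 75_900_000,    # peak
    "Google+": 40_000_000,    # total
}

# Share of the assumed TAM for each network.
shares = {name: count / tam for name, count in users.items()}
```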

Bragging rights: Valeo’s 30,000 Google Apps users announced

It’s been a long time in coming but I can finally tell you all about what originally brought me to France. Back in 2007 as a strategic consultant I designed, delivered and demonstrated a proof of concept of a complete cloud computing user environment (before it was even called cloud computing) to Valeo in a competitive tender, before handing over to CapGemini for deployment later that year.

What’s particularly noteworthy (aside from the sheer scale) is that while many cloud computing deployments are tactical, with a view to reaching a specific goal (e.g. mail scanning, web security, shared calendaring, video hosting, etc.), this one was a high level strategy to replace as much of the existing infrastructure as possible. I also installed three Google Search Appliances as part of the solution and integrated them with a complex Active Directory and Lotus Notes infrastructure.

Granted this hasn’t been a big secret for a while now but it’s the first time the full details have officially emerged. Sergey Brin first bragged about it on Google’s Q2 earnings call last year while talking about Google Apps’ successes:

Just to give you color on what some of these businesses include, in this past quarter Valeo, one of the world’s leading automotive suppliers now has 32,000 users using Google Apps, including of course Gmail, Calendar, Docs and so forth.

Congratulations to everyone at CapGemini, Google and of course Valeo for making this a success.

Valeo launches an innovative initiative with Google to reduce administrative expenses

Wednesday, 13 May 2009

Valeo today announced that the Group’s 30,000 Internet-connected employees now have access to a new communication and collaborative working platform based on Google Apps Premier Edition and supported by Capgemini.

The progressive roll-out of the new system is giving employees access to a suite of online products which will increase administrative efficiency and improve collaboration between the 193 Valeo entities in 27 countries.

“We were searching for an innovative way to reduce significantly our office infrastructure costs while simultaneously improving user collaboration and productivity,” said André Gold, Valeo’s Technical Senior Vice-President. “Our pilot projects demonstrate that this target is achievable.”

Valeo is deploying Google Apps, supported by Google’s partner Capgemini, in a phased approach throughout 2009. As a first step, users are being given access to Google sites, on-line documents, video management and instant messaging, including voice and video chat, in order to improve teamwork. The new system will then offer applications to further enhance the company’s efficiency, such as an Enterprise directory and workflow tools to automate administrative processes. In the final stage, users will benefit from Google mail, calendar, search and on-line translation solutions to reinforce personal efficiency. They will be able to access the applications from a desktop, laptop or other mobile device.

“The cost savings and innovation made possible by cloud computing help businesses better respond to a global and mobile workforce – especially in today’s difficult economic environment,” said Dave Girouard, President, Google Enterprise. “We’re thrilled Valeo has selected Google.”

Valeo is an independent industrial Group fully focused on the design, production and sale of components, integrated systems and modules for cars and trucks. Valeo ranks among the world’s top automotive suppliers. The Group has 122 plants, 61 R&D centers, 10 distribution platforms and employs around 49,000 people in 27 countries worldwide.

For additional information, please contact:
Antoine Balas, Valeo Corporate Communications, Tel.: +
Malgosia Rigoli, Corporate Communications, Google Enterprise EMEA, Tel.: +44207881 4537, malgosia@google.com

For more information about the Group and its activities, please visit our web site http://www.valeo.com

Update 1: Google France’s Laurent Guiraud (who worked closely with me on the proof of concept and who I have to thank for most of what I know about the Google Search Appliance) has written about it on the Official Google Blog: 30,000 new Google Apps business users at Valeo.

Update 2: The story is now featured on the Official Google Enterprise Blog as well.

Update 3: Said to be Google’s “biggest enterprise deal yet”.

Update 4: ReadWriteWeb have picked up the story: Google Apps Continues Push Into Enterprise: 30,000 New Users at Valeo.

Update 5: So have TechCrunchIT: Google Cloud: 1. MS Office: 0.

Update 6: ComputerWeekly report Google Apps gets first global customer

Update 7: BusinessWeek talk about it while asking What’s Holding Back Google Apps?

Update 8: InfoWeek report that Google’s Cloud Evangelism Converts Enterprise Customers

Update 9: The Register (incorrectly) report that “Cap Gemini has sold what it believes is the largest ever contract for Google’s online suite of software products”, only the deal was as good as done by the time they got it.

Update 10: Computer Business Review writes Google secures biggest ever apps contract

Update 11: CNET states With Valeo deal, Google Apps gains business cred

On the Google Docs sharing security incident

I was just trying to respond to ZDNet’s hot-off-the-press article (The cloud bites back: Google bug shared private Google Docs data) about the recent Google Docs sharing vulnerability, but ZDNet’s servers are throwing errors. In any case, now that Google have announced that they “believe the issue affected less than 0.05% of all documents” (rather than just emailing the affected users), I was considering writing a post anyway, so I’m killing two birds with one stone:

It’s convenient that they should prefer to use a percentage of an unknown number rather than a meaningful statistic, but given that sharing even a single document inappropriately could destroy a business or someone’s life it is still very serious. Fortunately I’ve not heard of any such incidents resulting from this breach (then again, often you won’t).

Putting it in perspective though, for the same sample of documents over the same period how many do you think would have suffered security breaches under the “old way” of storing them locally and emailing them? And by security breaches I include availability (loss) and integrity (corruption) as well as confidentiality (disclosure).

People still lose laptops (or have them stolen) and leave data-laden USB keys all over the place, so I don’t see that this is much different from before and may well be better. Security researchers need statistics though, so it would be useful if vendors were more transparent with information about breaches.

It would be great to see some more objective analysis and reporting comparing cloud computing with legacy systems – I’d say the fearmongers would be surprised by the results.

Here are some tips that cloud vendors should ideally try to follow:

  • Work with researchers to resolve reported issues
  • Always be transparent about security issues (even if you think nobody noticed)
  • Limited liability is not an excuse to be negligent – always write secure code and test thoroughly
  • Remember that at least until cloud computing is widely accepted (and even thereafter) you are in the business of trust, which is hard to gain and easy to lose.

That’s all for today – back to cloud standards…

Towards a Flash free YouTube killer (was: Adobe Flash penetration more like 50%)

A couple of days ago I wrote about Why Adobe Flash penetration is more like 50% than 99%, which resulted in a bunch of comments as well as a fair bit of discussion elsewhere including commentary from Adobe’s own John Dowdell. It’s good to see some healthy discussion on this topic (though it’s a shame to see some branding it “more flash hate” and an AC poster asking “How much did M$ pay you for this”).

Anyway everyone likes a good demonstration so I figured why not create a proof-of-concept YouTube killer that uses HTML 5’s video tag?

Knowing that around 20% of my visitors already have a subset of HTML 5 support (either via Safari/WebKit or Firefox 3.1 beta), and that this figure will jump to over 50% shortly after Firefox 3.1 drops (over 50% of my readers use Firefox and over 90% of them run the most recent versions), I would already be considering taking advantage of the new VIDEO tag were I to add videos to the site (even though, as a Google Apps Premier Edition user I already have a white label version of YouTube at http://video.samj.net/).

Selecting the demo video was easy – my brother, Michael Johns, did a guest performance on American Idol last Wednesday and as per usual it’s already popped up on YouTube (complete with a HD version). Normally YouTube uses the Flash FLV format but for HD they sensibly opted for H.264, which is supported by Safari (which supports anything QuickTime supports – including Ogg for users with Perian installed). Getting the video file itself is just a case of browsing to the YouTube page, going to Window->Activity and double clicking the digitally signed link that looks something like ‘http://v4.cache.googlevideo.com/videoplayback‘, which should result in the video.mp4 file being downloaded (though now that Google are offering paid downloads they’re working hard to stop unsanctioned downloading).

On the other hand Firefox 3.1 currently only supports Ogg (Theora video with Vorbis audio) for licensing/patent reasons, as even Reasonable and Non-Discriminatory (RAND) licensing is unreasonable and discriminatory for free and open source software. Unfortunately the W3C working group infamously removed a recommendation that implementors ‘should’ support Ogg Vorbis and Theora for audio and video respectively, and a codec recommendation is currently conspicuously absent from the HTML 5 working draft. So what’s a developer to do but make both Ogg and H.264 versions available? Fortunately transcoding MP4 to Ogg (and vice versa) is easy enough with VLC, resulting in a similar quality but 10% smaller file (video.ogg).
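For those scripting the conversion, VLC can also be driven from the command line; here’s a minimal sketch of how such a transcode command might be assembled (the exact --sout module chain varies between VLC versions, so treat the options as illustrative rather than gospel):

```python
def vlc_transcode_cmd(src, dst, vcodec="theo", acodec="vorb", mux="ogg"):
    """Build a command line asking VLC to transcode src (e.g. video.mp4)
    into dst (e.g. video.ogg) using Theora video and Vorbis audio."""
    sout = "#transcode{vcodec=%s,acodec=%s}:std{access=file,mux=%s,dst=%s}" % (
        vcodec, acodec, mux, dst)
    # -I dummy runs headless; vlc://quit exits when the transcode finishes
    return ["vlc", "-I", "dummy", src, "--sout", sout, "vlc://quit"]

cmd = vlc_transcode_cmd("video.mp4", "video.ogg")
```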

The HTML code itself is quite straightforward. It demonstrates:

  • A body onLoad function to switch to Ogg for Firefox users
  • YouTube object fallback for unsupported browsers (which in turn falls back to embed)
  • A simple JavaScript Play/Pause control (which could easily be fleshed out to a slider, etc.)
  • A simple JavaScript event handler to show an alert when the video finishes playing
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Towards a Flash free YouTube killer…</title>
</head>

<!-- Basic test for Firefox switches to Ogg Theora -->
<!-- Test could be arbitrarily complex and/or run on the server side -->
<body onload="if (/Firefox/.test(navigator.userAgent)) { document.getElementsByTagName('video')[0].src = 'video.ogg'; }">
<h1>Michael Johns &amp; Carly Smithson – The Letter</h1>
<p>(Live At American Idol 02/18/2009) HD
(from <a href="http://www.youtube.com/watch?v=LkTCFo8XfAc">YouTube</a>)</p>

<!-- Supported browsers will use the video tag and ignore the rest -->
<video src="video.mp4" autoplay="autoplay" width="630" height="380">
<!-- If the video tag is unsupported by your browser this legacy code is used -->
<object width="630" height="380">
<param name="movie" value="http://www.youtube.com/v/LkTCFo8XfAc" />
<embed src="http://www.youtube.com/v/LkTCFo8XfAc" type="application/x-shockwave-flash" width="630" height="380"></embed>
</object>
</video>

<!-- Here's a script to give some basic playback control -->
<script type="text/javascript">
var myVideo = document.getElementsByTagName('video')[0];
function playPause() {
  if (myVideo.paused)
    myVideo.play();
  else
    myVideo.pause();
}
// Here's an event handler which will tell us when the video finishes
myVideo.addEventListener('ended', function () {
  alert('video playback finished');
}, false);
</script>
<p><input type="button" onclick="playPause()" value="Play/Pause" /></p>

<p>By <a href="https://samj.net/">Sam Johnston</a> of
<a href="http://www.aos.net.au/">Australian Online Solutions</a></p>
</body>
</html>

This file (index.html) and the two video files above (video.mp4 and video.ogg) are then uploaded to Amazon S3 (at http://media.samj.net/) and made available via Amazon CloudFront content delivery network (at http://media.cdn.samj.net/). And finally you can see for yourself (bearing in mind that to keep the code clean no attempts were made to check the ready states so either download the files locally or be patient!):

Towards a Flash free YouTube killer…

The day Google broke the Internet…

As I write this, all Google search results are being flagged as malware and redirected to a scary warning page, including google.com itself:

The warning page (which I’ve only seen a handful of times before today) looks like this:

Note the absence of any links to the offending site (in this case http://www.google.com/), with traffic diverted instead to StopBadware.org (which is unsurprisingly down) and Google’s own Safe Browsing diagnostic page, which is also down:

Looking forward to seeing the explanation for this… sucks to be whoever was responsible right now. When was the last time Google was effectively down for over half an hour?

Update: So it’s definitely not just me – Twitter’s going crazy too.

Update 2: It’s on Slashdot now, but hard to say if mainstream press has picked it up because Google News is down now too:

Update 3: StopBadware.org are pointing the finger at Google for their denial of service (see “Google glitch causes confusion“).

Update 4: Google are pointing the finger back at StopBadware.org (see “This site may harm your computer” on every search result?!?!). Marissa Mayer has explained on the official blog that the cause of the problem was human error, in that “the URL of ‘/’ was mistakenly checked in as a value to the file and ‘/’ expands to all URLs“.

Update 5: StopBadware.org claim that Google’s explanation (above) “is not accurate. Google generates its own list of badware URLs, and no data that we generate is supposed to affect the warnings in Google’s search listings. We are attempting to work with Google to clarify their statement.”

Update 6: Google statement updated, noting that they “maintain a list of such sites through both manual and automated methods” and that they work with StopBadware.org to “come up with criteria for maintaining this list, and to provide simple processes for webmasters to remove their site from the list“, not for delivery of the list itself.

Summary: Now everything’s back to normal, the question to ask is how a single character error in an update to a single file could disable Internet searches for the best part of an hour for most users. Google should never have allowed this update to issue (even though each case needs to be individually researched by humans, the list itself should be maintained by computers) and Google’s servers should never have accepted the catch-all ‘/’ (any regexp matching more than a single server should be considered bogus and ignored). Fortunately it’s not likely to happen again, if only because Google (who are “very sorry for the inconvenience caused to [their] users”) are busy putting “more robust file checks in place”.
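The catch-all check I’m suggesting is trivial to implement; here’s a hypothetical validator (my code, not Google’s) that rejects any blacklist entry failing to name a specific host:

```python
from urllib.parse import urlparse

def is_valid_blacklist_entry(entry):
    """A badware list entry should name a specific host; a bare '/' (or any
    pattern matching more than a single server) is bogus and gets dropped."""
    entry = entry.strip()
    if not entry or entry == "/":
        return False
    parsed = urlparse(entry if "://" in entry else "http://" + entry)
    host = parsed.hostname or ""
    # require a real hostname: at least one dot, no wildcards
    return "." in host and "*" not in host

safe = [e for e in ["/", "", "badware.example.com"] if is_valid_blacklist_entry(e)]
```

Had anything like this been sitting in front of the file update, the ‘/’ entry would have been silently discarded instead of flagging the entire Internet.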

Moral of the story: Wherever black or white-listing is implemented there are more moving parts and more to go wrong. In this case the offending service provides a tangible benefit (protecting users from malware) but those of you whose leaders are asking for your permission to decide what you can and can’t see on the Internet should take heed – is the spectre of censorship and the risk of a national Internet outage really worth the perceived benefit? Will such an outage be rectified within an hour as it was by Google’s army of SREs (Site Reliability Engineers)? And perhaps most importantly, will the scope of the list remain confined to that under which it was approved (typically the ‘war on everything’ from child pornography to terrorism) or will it end up being used for more nefarious purposes? In my opinion the benefit rarely outweighs the (potential) cost.

Major Google App Engine 2.0 update (Java, cron, billing) in the pipeline?

I’ve been doing a lot of Google App Engine development lately and getting increasingly familiar with how it all works. I was curious as to how the SDK (specifically appcfg.py) goes about pushing applications up to the cloud and wasn’t surprised to find that it’s actually quite simple, trading your credentials for a token and then pushing files up one by one (wrapped in a transaction). What I was surprised to find though was a seemingly complete implementation of a feature that has been requested time and time and time and time and time again: cron jobs.
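For the curious, the upload flow can be paraphrased roughly as follows; the endpoint and function names here are my own shorthand, not the actual appcfg.py internals:

```python
def deploy(app_id, files, http_post):
    """Paraphrase of the appcfg.py flow: swap credentials for a token, then
    push each file one by one inside a create/commit transaction."""
    token = http_post("/ClientLogin", {"app": app_id})
    http_post("/api/appversion/create", {"token": token})       # open transaction
    for path, content in sorted(files.items()):
        http_post("/api/appversion/addfile",
                  {"token": token, "path": path, "data": content})
    http_post("/api/appversion/commit", {"token": token})       # close transaction
```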

The cron implementation appears to have been checked in by one Dan Sanderson (a Programmer/Writer at Google according to LinkedIn) with subversion r34 (1.1.8) back on 15 January 2009, though apparently I’m not the first to spot it. Thanks to Ross M Karchner for kicking the tyres and working out that while the SDK and APIs are in place and cron entries can be created, the events don’t fire just yet (and to Google for the new toy – it’s great to know what we’ll soon be able to do).

Of course the other big feature that’s been in the pipeline for a long, long, long, long, long time (and that was even apparently publicly confirmed by Google last year) is Java support. Greg Czajkowski’s been a Googler for a while now and his contribution to Sun’s Project Barcelona (including JSR 284: Resource Consumption Management API and JSR 121: Application Isolation API Specification) is just what they need to make Java work in this environment. Knowing that the mystery language is going to be one of the few Google supports internally (C++, Java, Python and JavaScript), that we already have Python and that it’s not likely to be C++ or JavaScript (though they wouldn’t be the first to try server-side JavaScript), that leaves only Java. It’s literally App Engine’s number 1 issue and between it and related issue 102 it has over 2000 votes.

Finally, and arguably most importantly for those of us working on enterprise deployments rather than hobby projects, utility billing is overdue and eagerly awaited.

I’m expecting to see this major update drop some time in the next few weeks, securing Google’s position as one of a small number of major Platform-as-a-Service (PaaS) players. In the meantime we’ve got offline Gmail to play with.

Update: Here’s a comment I left on issue 6 which gives some more technical details:

I just spotted this in the latest SDK release so it looks like cron (among other things) is just around the corner:

$ ls -la google/appengine/cron/
total 272
drwxr-xr-x 12 samj admin 408 17 Jan 12:18 .
drwxr-xr-x 11 samj admin 374 17 Jan 12:18 ..
-r--r--r-- 1 samj admin 27359 15 Jan 03:16 GrocLexer.py
-rw-r--r-- 1 samj admin 25813 17 Jan 12:18 GrocLexer.pyc
-r--r--r-- 1 samj admin 21071 15 Jan 03:16 GrocParser.py
-rw-r--r-- 1 samj admin 18377 17 Jan 12:18 GrocParser.pyc
-r-xr-xr-x 1 samj admin 646 15 Jan 03:16 __init__.py
-rw-r--r-- 1 samj admin 313 17 Jan 12:18 __init__.pyc
-r-xr-xr-x 1 samj admin 1909 15 Jan 03:16 groc.py
-rw-r--r-- 1 samj admin 3050 17 Jan 12:18 groc.pyc
-r-xr-xr-x 1 samj admin 7848 15 Jan 03:16 groctimespecification.py
-rw-r--r-- 1 samj admin 10029 17 Jan 12:18 groctimespecification.pyc

From the comments in the code, here's what you can expect:

A Groc schedule looks like '1st,2nd monday 9:00', or 'every 20 mins'. This
module takes a parsed schedule (produced by Antlr) and creates objects that
can produce times that match this schedule.

A parsed schedule is one of two types - an Interval, and a Specific Time.
See the class docstrings for more.

Extensions to be considered:

allowing a comma separated list of times to run
allowing the user to specify particular days of the month to run

An Interval type spec runs at the given fixed interval. They have two attributes:
period - the type of interval, either "hours" or "minutes"
interval - the number of units of type period.

A Specific interval is more complex, but defines a certain time to run, on
given days. They have the following attributes:
time - the time of day to run, as "HH:MM"
ordinals - first, second, third &c, as a set of integers in 1..5
months - the months that this is valid, as a set of integers in 1..12
weekdays - the days of the week to run this, 0=Sunday, 6=Saturday.

The specific time interval can be quite complex. A schedule could look like
"1st,third sat,sun of jan,feb,mar 09:15"

In this case, ordinals would be [1,3], weekdays [0,6], months [1,2,3] and time
would be "09:15".

Seems I'm not the first to discover this[1], and while the SDK works and cron-related log entries are written
the cron events don't fire just yet.


1. http://groups.google.com/group/google-appengine/browse_thread/thread/4376bdd02b7bfa3f?pli=1
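Going by those docstrings, the simple Interval case is easy enough to model; here’s a rough approximation of the semantics (my code, not Google’s):

```python
from datetime import datetime, timedelta

def next_interval_run(last_run, period, interval):
    """Next firing time for an Interval spec: period is "hours" or "minutes",
    interval is the number of units (so "every 20 mins" is ("minutes", 20))."""
    if period == "hours":
        return last_run + timedelta(hours=interval)
    if period == "minutes":
        return last_run + timedelta(minutes=interval)
    raise ValueError("period must be 'hours' or 'minutes'")

# "every 20 mins" starting from 09:00
nxt = next_interval_run(datetime(2009, 1, 15, 9, 0), "minutes", 20)
```

The Specific Time case (ordinals, weekdays, months) is where Antlr and the Groc grammar earn their keep.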

Virtual Google Search Appliance is here…

I’ve been quiet of late as I’ve been busy racking up the frequent flyer miles this last month or two, but I’m back (albeit busy) and will endeavour to work through a backlog of posts, even if that means spending less time on them and leaving the Pulitzer Prize to someone else. While I wait for it to download I thought I’d let you know about today’s announcement of a Google Search [Virtual] Appliance (which I’ve been hanging out for, under NDA, since 2006!):

Ever wanted to write code against Google search technology, test your apps, and see how it all integrates into your development environment without having to pay a thing? If you’re an IT administrator, you’ll have that chance with the new virtual edition of the Google Search Appliance. The Google Search Appliance virtual edition is for non-commercial, development purposes only, and gives developers the opportunity to test against the features of the physical Google Search Appliance.

The Google Search Appliance virtual edition provides a free test bed for the Google Search Appliance – our solution for securely searching enterprise content behind the corporate firewall – helping ensure a smooth transition to the production-ready hardware. If your organization is considering adopting an enterprise search solution, the virtual edition platform gives your team the flexibility to build applications against the Google Search Appliance, try different configuration scenarios, explore proofs-of-concept and test the APIs supported by Google enterprise search technology. As part of testing with the virtual edition, you can:

These features might come in handy, particularly if your existing environment contains the array of legacy systems, databases, servers and integration architecture typical of most large organizations. And because it’s free, your boss might give you an extra week’s vacation just for trying it out (don’t quote us on that). You can download Google Search Appliance virtual edition software onto any server that is supported by VMWare virtualization. To learn more and get started, click here. And since we always love feedback, feel free to drop by our developer community or send your thoughts to enterprisegsavirtual@google.com.

Well it’s almost done, but I’m not holding my breath as it wants 3GB of RAM and I didn’t have the patience for Apple to custom build a 4GB MacBook for me the other week, so I’ve only got 2GB. I wonder what it would take to get it up and running on a large instance of EC2?

Update: It works (albeit slowly), and it looks surprisingly standard (Linux 2.6.20 – CentOS 5 I think); maybe EC2’s not out of the realm of possibility after all:

Update 2: Having kicked the tires for a while I’m already thinking about the possibilities. Now that the GSA has broken its shackles of expensive, proprietary hardware the world is its oyster, and while the license prohibits production use, that’s an administrative rather than a technical hurdle. Locking down the licensing (currently the MAC address is mashed up and digitally signed along with various feature and URL count restrictions, but MAC addresses are malleable with virtual machines) and ensuring performance meets acceptable standards on uncontrollable (virtual) hardware are two obvious (if optional) hurdles. That said, expect to see something happen in this area as the competition is already offering free, downloadable search solutions; indeed I wouldn’t be surprised if there were already virtual GSAs in production.

I’d really like to see Google supported for Australian Online Solutions‘ upcoming CloudSearch product, so getting it up and running on EC2 would be nice even if only to prove the concept. Assuming there are no non-standard kernel hacks, migration shouldn’t be that hard, and even if there were they would have to be released under the terms of the GPL per my thus far unanswered public request. That said, user selectable kernels (AKIs) and ramdisks (ARIs) on Amazon’s EC2 are currently only available to Amazon and a select few others, so said modifications (if any) would have to be injected via a loadable module for now.

Watch this space…

Google Chrome: Cloud Operating Environment

Google Chrome is a lot more than a next generation browser; it’s a prototype Cloud Operating Environment.

Rather than blathering on to the blogosphere about the superficial features of Google’s new Chrome browser I’ve spent the best part of my day studying the available material and [re]writing a comprehensive Wikipedia article on the subject, which I intend for anyone to be free to reuse under a Creative Commons Attribution 3.0 license (at least this version anyway) rather than Wikipedia’s usual strong copyleft GNU Free Documentation License (GFDL). This unusual freedom is extended in order to foster learning and critical analysis, particularly in terms of security.

My prognosis is that this is without doubt big news for cloud computing and, as a CISSP who has watched the poor state of web browser security with disdain, big news for the security community too. Here’s why:

Surfing the Internet today is like unprotected sex with strangers; Chrome is the condom of the cloud.

The traditional model of a monolithic browser is fundamentally and fatally flawed (particularly with the addition of tabs). Current generation browsers lump together a myriad of trusted and untrusted software (yes, many web sites these days are more software than content) running in the same memory address space. Even with the best of intentions this is intolerable, as performance problems in one area can cause problems (and even data loss) in others. It’s the web equivalent of the bad old days where one rogue process would take down the whole system. Add nefarious characters to the mix and it’s like living in a bad neighbourhood with no locks.

Current generation browsers are like jails without cells.

Chrome introduces a revolutionary new software architecture, based on components from other open source software, including WebKit and Mozilla, and is aimed at improving stability, speed and security, with a simple and efficient user interface.

The first intelligent thing Chrome does is split each task into a separate process (‘sandbox’), thus delegating to the operating system, which has been very good at process isolation since we introduced things like pre-emptive multitasking and memory protection. This exacts a fixed per-process resource cost but avoids the memory fragmentation issues that plague long-running browsers. Every web site gets its own tab complete with its own process and WebKit rendering engine, which (following the principle of least privilege) runs with very low privileges. If anything goes wrong the process is quietly killed and you get a Sad Mac style sad tab icon rather than an error reporting dialog for the entire browser.
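The benefit of delegating isolation to the operating system is easy to demonstrate outside the browser; here’s a hypothetical sketch using one OS process per ‘tab’, where a crashing renderer yields a sad tab rather than taking down the parent:

```python
from multiprocessing import Process

def render(site):
    """Stand-in renderer: a crash here is confined to this one process."""
    if "crashy" in site:
        raise RuntimeError("renderer blew up")

def open_tabs(sites):
    """One OS process per tab, Chrome-style: a dead renderer yields a
    'sad tab' for that site only, never a dead browser."""
    status = {}
    for site in sites:
        tab = Process(target=render, args=(site,))
        tab.start()
        tab.join()
        status[site] = "ok" if tab.exitcode == 0 else "sad tab"
    return status
```

(On platforms that spawn rather than fork new processes this sketch needs the usual `if __name__ == '__main__'` guard.)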

Chrome enforces a simple computer security model whereby there are two levels of multilevel security (user and sandbox) and the sandbox can only respond to communication requests initiated by the user. Plugins like Flash which often need to run at or above the security level of the browser itself are also sandboxed in their own relatively privileged processes. This simple, elegant combination of compartments and multilevel security is a huge improvement over the status quo, and it promises to further improve as plugins are replaced by standards (eg HTML 5 which promises to displace some plugins by introducing browser-native video) and/or modified to work with restricted permissions. There are also (publicly accessible) blacklists for warning users about phishing and malware and an “Incognito” private browsing mode.

Tabs displace windows as first-class citizens and can migrate between them like an archipelago of islands.

The user interface follows the simplification trend, and much of the frame or “browser chrome” (hence the name) can be hidden altogether so as to seamlessly blend web applications (eg Gmail) with the underlying operating system. Popups are confined to their source tab unless explicitly dragged to freedom, the “Omnibox” simplifies (and remembers) browsing habits and searches and the “New Tab Page” replaces the home page with an Opera style speed dial interface along with automatically integrated search boxes (eg Google, Wikipedia). Gears remains as a breeding ground for web standards and the new V8 JavaScript engine promises to improve performance of increasingly demanding web applications with some clever new features (most notably dynamic compilation to native code).

Just add Linux and cloud storage and you’ve got a full blown Cloud Operating System (“CloudOS”)

What is perhaps most interesting though (at least from a cloud computing point of view) is the full-frontal assault on traditional operating system functions like process management (with a task manager that allows users to “see what sites are using the most memory, downloading the most bytes and abusing (their) CPU”). Chrome is effectively a Cloud Operating Environment for any (supported) operating system, in the same way that early releases of Windows were GUIs for DOS. All we need to do now is load it on to a (free) operating system like Linux and wire it up to cloud storage (ala Mozilla Weave) for preferences (eg bookmarks, history) and user files (eg uploads, downloads) and we have a full blown Cloud Operating System!

Update: Fixed URLs.

Chrome URLs: