The browser is the OS (thanks to Firefox 3.5, Chrome 2, Safari 4)

Almost a year ago I wrote about Google Chrome: Cloud Operating Environment and [re]wrote the Google Chrome Wikipedia article, discussing the ways in which Google was changing the game through new and innovative features. They had improved isolation between sites (which is great for security), improved usability (speed dial, tear off tabs, etc.) and perhaps most importantly for SaaS/Web 2.0 applications, vastly improved the JavaScript engine.

Similar features were quickly adopted by competitors including Opera (which Chrome quickly overtook at ~2%) and Firefox (which still has an order of magnitude more users at ~20-25%). Safari is really making waves too at around 1/3-1/2 of the share of Firefox (~8%) and with the recent release of Safari 4 it’s a compelling alternative – especially given it passes the Acid 3 test with flying colours while Firefox 3.5 bombs out at 93/100.

HTML 5 features such as local storage and the video and audio elements are starting to make their way into the new breed of browsers too, though it’s still often necessary to install Google Gears to get advanced offline functionality (e.g. most of the Google Apps suite) up and running. Google have drawn fire by missing the Firefox 3.5 launch and users finding Gears disabled are flocking to the gears-users Google Group to vent their frustrations, some going so far as claiming that “Google is trying to do what it can to push users to Chrome” and asking “Are we watching a proccess of Google becoming customer-deaf Microsoft?”. Let’s just hope it’s ready in time for my travel later this week…

The point is that after the brutal browser wars which stagnated the web for some time (right up until Microsoft opened the floodgates by introducing Ajax), we’re now starting to see some true competition again. Granted Internet Explorer is still a 1,000 pound gorilla at ~65% of market share, but even with a silk shirt in the form of IE 8 and a handful of lame ads it’s still a pig and the target of the vast majority of security exploits on the web. This makes it an easy sell for any competitor who manages to get a foot in the door (which is unfortunately still the hardest part of the sale).

The decision not to ship IE with Windows 7 in Europe will be interesting as it should draw mainstream attention to the alternatives which will flow on to other markets (as we’ve seen with adoption of “alternative” technology like Linux in the past – not to mention the whole Netbook craze started by OLPC in the third world). However, with the browser being where most of the action is today the operating system has become little more than a life support system for it – an overly thick interface layer between the browser and the hardware. Surely I’m not the only one who finds it curious that while the software component of a new computer is fast approaching 50% of the cost (up from around 10% a decade ago), the heart of the system (the browser) is both absent from Windows 7 and yet freely available (both in terms of beer and freedom)? Something’s gotta give…

Anyway it’s time to stop looking at the features and performance of the underlying operating system and start looking at the security and scalability of the browser. When was the last time you turned to the operating system anyway, except to fix something that went wrong or do some menial housekeeping (like moving or deleting files)?

Is HTTP the HTTP of cloud computing?

Ok so after asking Is OCCI the HTTP of cloud computing? I realised that the position may have already been filled and that the question was more Is AtomPub already the HTTP of cloud computing?

After all my strategy for OCCI was to follow Google’s example with GData by adding some necessary functionality (a search interface, caching directives, resource-specific attributes, etc.). Most of the heavy lifting was actually being done by AtomPub, thus avoiding a huge amount of tedious and error-prone protocol writing (around 20,000 words of it) – something which OGF and the OCCI working group isn’t really geared up for anyway. This is clearly a workable and well-proven approach as it has been adopted strategically by both Microsoft and Google and also tactically by Salesforce and IBM, among others. Best of all adding things like queries and versioning is a manageable workload while starting from scratch is most certainly not.

But what if there were an easier way? Recall that the problem we are trying to solve is exposing a flexible interface to an arbitrarily large collection of interconnected compute, storage and network resources. We need to be able to describe and manipulate the resources (CRUD), associate them with each other via rich links (e.g. links with attributes like local identifiers – eth0, sda, etc.) and change their state (start, stop, restart, etc.), among other things.

Representational State Transfer (REST)

Actually we’re not talking about exposing the resources themselves (that would be impossible) but various representations of those resources – like Plato’s shadows on the cave walls – this is the “REpresentational” in “REpresentational State Transfer (REST)”. There’s an infinite number of possible representations so it’s impossible to try to capture them all now, but here are some examples:

  • An Open Virtualisation Format (OVF) serialisation of a compute resource
  • A platform-specific descriptor file (e.g. VMX)
  • A complete archive of the virtual machine with its dependencies (OVA)
  • A graphical image of the console at a given point in time (‘snapshot’)
  • A video stream of the console for archiving/audit purposes (ala Citrix’s Project Iris)
  • The console itself (e.g. SSH, ICA, RDP, VNC)
  • Build documentation (e.g. PDF, ODF)
  • Esoteric enterprise requirements (e.g. NMS configuration)

It doesn’t take a rocket scientist to spot the correlation between this and HTTP’s existing content negotiation functionality (whereby a client can ask for a specific representation of a given resource – e.g. HTML vs PDF), so this is already pretty much solved for us (see HTTP’s Accept: header for the details). For bonus points this information should also be exposed in the URI, as it’s not always possible or convenient to set headers:
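
Here’s a minimal sketch of both approaches for a hypothetical compute resource (the URI layout and the application/ovf media type are illustrative only, not defined by any spec):

GET /compute/123 HTTP/1.1
Host: example.com
Accept: application/ovf

or, with the representation exposed in the URI:

GET /compute/123.ovf HTTP/1.1
Host: example.com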

Web Linking

But what about the links? As I explained yesterday the web is built on links embedded in HTML documents using the A tag. Atom also provides enhanced linking functionality via the LINK element, where it is also possible to specify content types, languages, etc. In this case however we want to allow resources to be arbitrary types and more often than not we won’t have the ability to link within the payload itself. This leaves us with two options: put the links in the payload anyway by relying on a meta-model like Atom (or one we roll ourselves) or find some way to represent them within HTTP itself.

Enter HTTP headers which are also extensible and, as it turns out, in the process of being extended (or at least refined) to handle this very requirement by a fellow from down under, Mark Nottingham. See the “Web Linking” IETF Internet-Draft (draft-nottingham-http-link-header, at the time of writing version 05) for the nitty gritty details and the ietf-http-wg list for some current discussions. Basically it clarifies the existing Link: headers and the result looks something like this:

Link: <http://example.com/TheBook/chapter2>; rel="previous"; title="previous chapter"

The Link: header itself is also extensible so we can faithfully represent our model by adding e.g. the local device name when linking storage and network resources to compute resources and other requisite attributes. It would be helpful if the content-type were also specified (Atom allows for multiple links of the same relation provided the content-type differs for example) but language is already covered by HTTP (it doesn’t seem useful to advertise French links to someone who already asked to speak English).
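
To sketch what that might look like, a compute resource could link to its storage and network resources with the local device name carried as an extension parameter (the “related” relation and the “device” parameter here are purely illustrative; neither is defined by the draft):

Link: <http://example.com/storage/456>; rel="related"; device="sda"
Link: <http://example.com/network/789>; rel="related"; device="eth0"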

It’s also interesting to note that earlier versions of the HTTP RFCs actually [poorly] specified both the Link: headers as well as LINK and UNLINK methods for maintaining links between web resources. John Pritchard had a crack at clarification in the Efficient HyperLink Maintenance for HTTP I-D but like most I-Ds this one seems to have died after 6 months, and with it the methods themselves. It seems to me that adding HTTP methods at this time is a drastic (and almost certainly infeasible) action, especially for something that could just as easily be accomplished via headers ala Set-Cookie: (too bad the I-D doesn’t specify how to add/delete/modify links!). In the simplest sense a Link: header appearing in a PUT or POST could replace the existing one(s) but something more elegant for acting on individual links would be nice – probably a discussion worth having on the ietf-http-wg list.

Organisation of Information

Looking back to Atom for a second we’re still missing some key functionality:

  • Atom id -> HTTP URL
  • Atom updated -> HTTP Last-Modified: Header
  • Atom title and summary -> Atom/HTTP Slug: Header or equivalent
  • Atom link -> HTTP Link: Header
  • Atom category -> ???

Houston, we have a problem. OCCI use cases range from embedded hypervisors exposing a single resource to a single entry-point for an entire enterprise or the “Great Global Grid” – we need a way to organise, categorise and search for the information, likely including:

  • Free text search via a Google-style “?q=firewall” syntax
  • Taxonomy via categories (already done for Atom) for things like “Operating System” and “Data Center”
  • Folksonomy via [user] tags (already done for Atom and bearing in mind that tag spaces are cool) for things like “testlab”

Fortunately the good work already done in this area for Atom would be relatively easy to port to a Category: HTTP header, following the Link: header example above. In the mean time a standard search interface (including category support) is trivial and, thanks to Google, already done.
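
As a sketch only (no such header is standardised at the time of writing, and the scheme URIs are invented for illustration), a Category: header carrying Atom-style term, scheme and label attributes might look like:

Category: datacenter; scheme="http://example.com/taxonomy"; label="Data Center"
Category: testlab; scheme="http://example.com/tags"; label="testlab"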

Structured Data Formats

HTML also resolves another pressing issue – what format to use for submitting key-value pairs (which constitutes a large part of what we need to do with OCCI). It gives us two options: the ubiquitous application/x-www-form-urlencoded encoding and, where binary payloads are involved, multipart/form-data.

The advantages of being able to create a resource from a web form simply by POSTing to the collection of resources (e.g. http://example.com/compute), and with HTML 5 by PUTting the resource in place directly (e.g. http://example.com/compute/<uuid>) are immediately obvious. Not only does this help make the human and programmable web one and the same (which in turn makes it much easier for developers/users to kick the tyres and understand the API) but it means that scripting even advanced tasks with curl/wget would be trivial. Plus there’s no place for time-wasting religious arguments about angle brackets (XML) over curly braces (JSON).
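
As a sketch (the attribute names are purely illustrative), creating a compute resource could be as simple as a browser submitting a form:

POST /compute HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded

name=testserver&cores=2&memory=2048

The same request is a one-liner from the command line, something like curl -d "name=testserver&cores=2&memory=2048" http://example.com/compute.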

RESTful State Machines

Something else which has not sat well with me until I spent the weekend ingesting the RESTful Web Services book (by Leonard Richardson and Sam Ruby) was the “actuator” concept we picked up from the Sun Cloud APIs. This breaks away from RESTful principles by exposing an RPC-style API for triggering state changes (e.g. start, stop, restart). Granted it’s an improvement on the alternative (GETting a resource and PUTting it back with an updated state) as Tim Bray explains in RESTful Casuistry (to which Roy Fielding and Bill de hÓra also responded), but it still “feels funky”. Sure it doesn’t make any sense to try to “force” a monitored status to some other value (for example setting a “state” attribute to “running”), especially when we can’t be sure that’s the state we’ll get to (maybe there will be an error or the transition will be dependent on some outcome over which we have no control). Similarly it doesn’t make much sense to treat states as nouns, for example adding a “running” state to a collection of states (even if a resource can be “running” and “backing up” concurrently). But is using URLs as “buttons” representing verbs/transitions the best answer?

What makes more sense [to me] is to request a transition and check back for updates (e.g. by polling or HTTP server push). If it’s RESTful to POST comments to an article (which in addition to its own contents acts as a collection of zero or more comments) then POSTing a request to change state to a [sub]resource also makes sense. As a bonus these can be parametrised (for example a “resize” request can be accompanied with a “size” parameter and a “stop” request sent with clarification as to whether an “ACPI Off” or “Pull Cord” is required). Transitions that take a while, like “format” on a storage resource, can simply return HTTP 202 Accepted so we’ve got support for asynchronous actions as well – indeed some requests (e.g. “backup”) may not even be started immediately. We may also want to consider using something like Post Once Exactly (POE) to ensure that requests like “restart” aren’t executed repeatedly and that we can cancel requests that the system hasn’t had a chance to deal with yet.

Exactly how this should look in terms of URL layout I’m not sure (perhaps http://example.com/<resource>/requests) but being able to enumerate the possible actions as well as acceptable parameters (e.g. an enum for variations on “stop” or a range for “resize”) would be particularly useful for clients.
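
To make that concrete, here’s a sketch of a parametrised “stop” request and an asynchronous response (the URL layout, parameter names and values are all illustrative only):

POST /compute/123/requests HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded

action=stop&method=acpioff

HTTP/1.1 202 Accepted
Location: http://example.com/compute/123/requests/1

The client can then poll the URL given in the Location: header until the request completes, or cancel it (e.g. with a DELETE) if the system hasn’t started acting on it yet.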

Collections

This is all well and good for individual resources, but collections are still a serious problem. There are many use cases which involve retrieving an arbitrarily large number of resources and making a HTTP request for each (as well as requests for enumeration etc.) doesn’t make sense. More importantly, it doesn’t scale – particularly in enterprise environments where requests via proxies and filters can suffer from high latency (if not low bandwidth).

One potential solution is to strap multiple HTTP message entities together as a multipart document, but that’s hardly clean and results in some hairy coding on the client side (e.g. manual manipulation of HTTP messages that would otherwise be fully automated). The best solution we currently have for this problem (as evidenced by widespread deployment) is AtomPub so I’m still fairly sure it’s going to have to make an appearance somewhere, even if it doesn’t wrap all of the resources by default.
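
For illustration, a minimal Atom feed wrapping a single compute resource might look something like the following (the URIs and the application/ovf media type are illustrative only):

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Compute Resources</title>
  <id>http://example.com/compute</id>
  <updated>2009-07-01T00:00:00Z</updated>
  <author><name>example</name></author>
  <entry>
    <title>testserver</title>
    <id>http://example.com/compute/123</id>
    <updated>2009-07-01T00:00:00Z</updated>
    <!-- the OVF media type here is hypothetical -->
    <link rel="alternate" type="application/ovf" href="http://example.com/compute/123.ovf"/>
  </entry>
</feed>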

rel=shortlink: url shortening that really doesn’t hurt the internet

Inspired primarily by the fact that the guys behind the RevCanonical fiasco are still stubbornly refusing to admit they got it wrong (the whole while arrogantly brushing off increasingly direct protests from the standards community) I’ve whipped up a Google App Engine application which reasonably elegantly implements rel=shortlink: url shortening that really doesn’t hurt the internet:

http://rel-shortlink.appspot.com

It works just like TinyURL and its ilk, accepting a URL and [having a crack at] shortening it. It checks both the response headers and (shortly) the HTML itself for rel=shortlink and if they’re not present then you have the option of falling back to a traditional service (the top half a dozen are preconfigured or you can specify your own via the API’s “fallback” parameter).

An interesting facet of this implementation is the warnings it gives if it encounters the similar-but-ambiguous short_url proposal and the fatal errors it throws up when it sniffs out the nothing-short-of-dangerous rev=canonical debacle. Apparently people (here’s looking at you Ars Technica and Dopplr) felt there was no harm in implementing these “protocols”. Now there most certainly is.

Here’s the high level details (from the page itself):

Who
A community service by Sam Johnston (@samj / s…@samj.net) of Australian Online Solutions, loosely based on a relatively good (albeit poorly executed) idea by some web developers purporting to “save the Internet” while actually hurting it.
What
A mechanism for webmasters to indicate the preferred short URL(s) for a given resource, thereby avoiding the need to consult a potentially insecure/unreliable third-party for same. Resulting URLs reveal useful information about the source (domain) and subject (path):
http://tinyurl.com/cgy9pu » http://purl.org/net/shortlink
Where
The shortlink Google Code project, the rel-shortlink Google App Engine application, the #shortlink Twitter hashtag and coming soon to a client or site near you.
When
Starting April 2009, pending ongoing discussion in the Internet standards community (in the mean time you can also use http://purl.org/net/shortlink in place of shortlink).
Why
Short URLs are useful both for space constrained channels (such as SMS and Twitter) and also for anywhere URLs need to be manually entered (e.g. when they are printed or spoken). Third-party shorteners can cause many problems, including link rot, performance problems, outages and privacy & security issues.
How
By way of <link rel="shortlink"> HTML elements and/or Link: <short-url>; rel=shortlink HTTP headers.
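
For example, a page at a hypothetical long URL like http://example.com/2009/04/some-long-post-title could advertise its preferred short URL in the markup and/or the response headers:

<link rel="shortlink" href="http://example.com/s/abc" />

Link: <http://example.com/s/abc>; rel="shortlink"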

So have at it and let me know what you think. The source code is available under the AGPL license for those who are curious as to how it works.

Introducing rel=”shortlink”: a better alternative to URL shorteners

Yesterday I wrote rather critically about a surprisingly successful drive to implement a deprecated “rev” relationship. This developed virtually overnight in response to the growing “threat” (in terms of linkrot, security, etc.) of URL shorteners including tinyurl.com, bit.ly and their ilk.

The idea is simple: allow the sites to specify short URLs in the document/feed itself, either automatically ([compressed] unique identifier, timestamp, “initials” of the title, etc.) or manually (using a human-friendly slug). That way, when people need to reference the URL in a space constrained environment (e.g. microblogging like Twitter) or anywhere they need to be manually entered (e.g. printed or spoken) they can do so in a fashion that will continue to work so long as the target does and which reveals information about the content (such as its owner and a concise handle).

Examples of such short URLs include the likes of http://example.com/123 (a compressed unique identifier) and http://example.com/promo (a human-friendly slug).

The idea is sound but the proposed implementation is less so. There is (or at least was) provision for “rev”erse link references but these have been deprecated in HTML 5. There is also a way of hinting the canonical URI by specifying a rel=”canonical” link. This makes a lot of sense because often the same document can be referred to by an infinite number of URIs (e.g. in search results, with sort orders, aliases, categories, etc.). Combine the two and you’ve got a way of saying “I am the canonical URI and this other URI happens to point at me too”, only it can only ever (safely) work for the canonical URL itself and it doesn’t make sense to list one arbitrary URL when there could be an infinite number.

Another suggestion was to use rel=”alternate shorter” but the problem here is that the content should be identical (except for superficial formatting changes such as highlighting and sort order), while “alternate” means “an alternate version of the resource” itself – e.g. a PDF version. Clients that understand “alternate” versions should not list the short URL as the content itself is (usually) the same.

Ben Ramsay got closest to the mark with A rev=”canonical” rebuttal but missed the “alternate” problem above, nonetheless suggesting a new rel=”shorter” relation. Problem there is the “short” URI is not guaranteed to be “shortest” or indeed even “shorter” – it still makes sense, for example, to specify a “short” URI of http://example.com/promo to a user viewing http://example.com/123 because the longer “short” URI conveys information about the content in addition to its host.

Accordingly I have updated WHATWG RelExtensions and will shortly submit the following to the IESG for addition to the IANA Atom Link Relations registry:

Value:
shortlink (http://purl.org/net/shortlink)

Description:
A short URI that refers to the same document.

Expected Display Characteristics:
This relation may be used as a concise reference to the document. It will
typically be shorter than other URIs (including the canonical URI) and may
rely on a [compressed] unique identifier or a human readable slug. It is
useful for space constrained environments such as email and microblogs as
well as for URIs that need to be manually entered (e.g. printed, spoken).
The referenced document may differ superficially from the original (e.g.
sort order, highlighting).

Security Considerations:
Automated agents should take care when this relation crosses administrative domains (e.g., the URI has a different authority than the current document). Such agents should also avoid circular references by resolving only once.

Note that in the interim “http://purl.org/net/shortlink” can be used. Bearing in mind that you should be liberal in what you accept, and conservative in what you send, servers should use the interim identifier for now and clients should accept both. Nobody should be accepting or sending rev=”canonical” or rel=”alternate shorter” given the problems detailed above.

Update: It seems there are still a few sensible people out there, like Robert Spychala with his Short URL Auto-Discovery document. Unfortunately he proposes a term with an underscore (short_url) when it should be a space and causes the usual URI/URL confusion. Despite people like Bernhard Häussner claiming that “short_url is best, it’s the only one that does not sound like shortened content“, I don’t get this reading from a “short” link… seems pretty obvious to me and you can always still use relations like “abstract” for that purpose. In any case it’s a valid argument and one that’s easily resolved by using the term “shortcutlink” instead (updated accordingly above). Clients could fairly safely use any link relation containing the string “short”.

Update: You can follow the discussion on Twitter at #relshortcut, #relshort and #revcanonical.

Update: I forgot to mention again that the HTTP Link: header can be used to allow clients to find the shortlink without having to GET and parse the page (e.g. by doing a HEAD request):

Link: <http://example.com/promo>; rel="shortlink"

Update: Both Andy Mabbett and Stan Vassilev also independently suggested rel=shortcut, which leads me to believe that we’re on a winner. Stan adds that we’ve other things to consider in addition to the semantics and Google’s Matt Cutts points out why taking rather than giving canonical-ness (as in RevCanonical) is a notoriously bad idea.

Update: Thanks to the combination of Microsoft et al recommending the use of “shortcut icon” for favicon.ico (after stuffing our logs by quietly introducing this [mis]feature) and HTML link types being a space separated list (thanks @amoebe for pointing this out – I’d been looking at the Atom RFCs and assuming they used the single link type semantics), the term “shortcut” is effectively scorched earth. Not only is there a bunch of sites that already have “shortcut” links (even if the intention was that “shortcut icon” be atomic), but there’s a bunch of code that looks for “shortcut”, “icon” or “shortcut icon”. FWIW HTML 5 specifies the “icon” link type. Moral of the story: get consensus before implementing code.

As I still have problems with the URI/URL confusion (thus ruling out “shorturl”) but have come around to the idea that this should be a noun rather than an adjective, I now propose “shortlink” as a suitable, self-explanatory, impossible-to-confuse term.

Update: I’ve created a shortlink Google Group and kicked off a discussion with a view to reaching a consensus. I’ve also created a corresponding Google Code project and modified the shorter links WordPress plugin to implement shortlinks.

rev=”canonical” considered harmful (complete with sensible solution)

Sites like http://tinyurl.com/ provide a very simple service: turning unwieldy but information rich URLs like https://samj.net/2009/04/open-letter-to-community-regarding-open.html into something more manageable like http://tinyurl.com/ceze29. This was traditionally useful for emails with some clients mangling long URLs but it also makes sense for URLs in documents, on TV, radio, etc. (basically anywhere a human has to manually enter it). Shorteners are a dime a dozen now – there’s over 90 of them listed here alone… and I must confess to having created one at http://tvurl.com/ a few years back (the idea being that you could buy a TV friendly URL). Not a bad idea but there were other more important things to do at the time and I was never going to be able to buy my first island from the proceeds. Unfortunately though there are many problems with adding yet another layer of indirection and the repercussions could be quite serious (bearing in mind even the more trustworthy sites tend to come and go).

So a while back I whipped up a thing called “springboard” for Google Apps/AppEngine (having got bored with maintaining text files for use with Apache’s mod_rewrite) which allowed users to create redirect URLs like http://go.example.com/promo (and which was apparently a good idea because now Google have their own version called short links). This is the way forward – you can tell at a glance who’s behind the link from the domain and you even get an idea of what you’re clicking through to from the path (provided you’re not being told fibs). When you click on this link you get flicked over to the real (long) URL with a HTTP redirect, probably a 301 which means “Moved Permanently”, so the browsers know what’s going on too. If your domain goes down then chances are the target will be out of action too (much the same story as with third-party DNS) so there’s a lot less risk. It’s all good news and if you’re using a CMS like Drupal then it could be completely automated and transparent – you won’t even know it’s there and clients looking for a short URL won’t have to go ask a third party for one.

So the problem is that nowadays you’ve got every man and his dog wanting to feed your nice clean (but long) URLs through the mincer in order to post them on Twitter. Aside from being a security nightmare (the resulting URLs are completely opaque, though now clients like Nambu are taking to resolving them back again!?!), it breaks all sorts of things from analytics to news sites like Digg. Furthermore there are much better ways to achieve this. If you have to do a round trip to shorten the URL anyway, why not ask the site for a shorter version of its canonical URL (that being the primary or ‘ideal’ URL for the content – usually quite long and optimised for SEO)? In the case of Drupal at least every node has an ID so you can immediately boil URLs down to http://example.com/node/123, http://example.com/123 or even use something like base32 to get even shorter URLs like http://example.com/3R.

So how do we express this for the clients? The simplest way is to embed LINK tags into the HEAD section of the HTML and specify a sensible relation (“rel”). Normally these are used to specify alternative versions of the content, icons, etc. but there’s nothing stopping us saying that for any given URL the “short” URL is e.g. http://example.com/3R. That’s right, rel=”short”, not rel=”alternate shorter” or other such rubbish (“alternate” refers to alternate content, usually in a different mime-type, not just an alternate URL – here the content is likely to be exactly the same). It can be performance optimised somewhat too by setting an e.g. X-Rel-Short header so that users (e.g. Twitter clients) can resolve a long URL to the preferred short URL via a HTTP HEAD request, without having to retrieve and parse the HTML.
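
A sketch of that optimisation (X-Rel-Short being my suggested name rather than any standard header): a Twitter client could issue a HEAD request for the long URL and read the preferred short URL straight from the response headers:

HEAD /some/long/seo-friendly-url HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
X-Rel-Short: http://example.com/3R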

Another even less sensible alternative being peddled by various individuals (and being discussed here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here and of course here) is [ab]using the rightly deprecated and confusing rev attribute ala rev=”canonical”. Basically this is saying “I am the authoritative/canonical URL and this other URL happens to point here too”, without saying anything whatsoever about the URL itself actually being short. There could be an infinite number of such inbound URLs and this only ever works for the one canonical URL itself. Essentially this idea is stillborn and I sincerely hope that when people come back to work next week it will be promptly put out of its misery.

So in summary someone’s got carried away and started writing code (RevCanonical) without first considering all the implications. Hopefully they will soon realise this isn’t such a great idea after all and instead get behind the proposal for rel=”short” at the WHATWG. Then we can all just add links like this to our pages:

<link href="http://example.com/promo" rel="short">

Incidentally I say “short” and not “shorter” because the short URL may not in fact be the shortest URL for a given resource – “http://example.com/3R” could well also map back to the same page but the URL is meaningless. And I leave out “alternate” because it’s not alternate content, rather just an alternate URL – a subtle but significant difference.

Let’s hope sanity prevails…

Update: The HTTP Link: header is a much more sensible solution for the HTTP header optimisation described above (in place of a custom X-Rel-Short header):

Link: <http://example.com/promo>; rel="short"

Towards a Flash free YouTube killer (was: Adobe Flash penetration more like 50%)

A couple of days ago I wrote about Why Adobe Flash penetration is more like 50% than 99%, which resulted in a bunch of comments as well as a fair bit of discussion elsewhere including commentary from Adobe’s own John Dowdell. It’s good to see some healthy discussion on this topic (though it’s a shame to see some branding it “more flash hate” and an AC poster asking “How much did M$ pay you for this”).

Anyway everyone likes a good demonstration so I figured why not create a proof-of-concept YouTube killer that uses HTML 5’s video tag?

Knowing that around 20% of my visitors already have a subset of HTML 5 support (either via Safari/WebKit or Firefox 3.1 beta), and that this figure will jump to over 50% shortly after Firefox 3.1 drops (over 50% of my readers use Firefox and over 90% of them run the most recent versions), I would already be considering taking advantage of the new VIDEO tag were I to add videos to the site (even though, as a Google Apps Premier Edition user I already have a white label version of YouTube at http://video.samj.net/).

Selecting the demo video was easy – my brother, Michael Johns, did a guest performance on American Idol last Wednesday and as per usual it’s already popped up on YouTube (complete with a HD version). Normally YouTube use Flash’s FLV format but for HD they sensibly opted for H.264 which is supported by Safari (which supports anything QuickTime supports – including Ogg Theora/Vorbis for users with Perian installed). Getting the video file itself is just a case of browsing to the YouTube page, going to Window->Activity and double clicking the digitally signed link that looks something like ‘http://v4.cache.googlevideo.com/videoplayback‘ which should result in the video.mp4 file being downloaded (though now Google are offering paid downloads they’re working hard to stop unsanctioned downloading).

On the other hand Firefox 3.1 currently only supports Ogg (Theora video and Vorbis audio) for licensing/patent reasons as even Reasonable and Non-Discriminatory (RAND) licensing is unreasonable and discriminatory for free and open source software. Unfortunately the W3C working group infamously removed a recommendation that implementors ‘should’ support Ogg Vorbis and Theora for audio and video respectively. Currently a codec recommendation is conspicuously absent from the HTML 5 working draft. So what’s a developer to do but make both Ogg and H.264 versions available? Fortunately transcoding MP4 to Ogg (and vice versa) is easy enough with VLC, resulting in a similar quality but 10% smaller file (video.ogg).

The HTML code itself is quite straightforward. It demonstrates:

  • A body onLoad function to switch to Ogg for Firefox users
  • YouTube object fallback for unsupported browsers (which in turn falls back to embed)
  • A simple JavaScript Play/Pause control (which could easily be fleshed out to a slider, etc.)
  • A simple JavaScript event handler to show an alert when the video finishes playing
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Towards a Flash free YouTube killer…</title>
</head>

<!-- Basic test for Firefox switches to Ogg Theora -->
<!-- Test could be arbitrarily complex and/or run on the server side -->
<body onload="if (/Firefox/.test(navigator.userAgent)){ document.getElementsByTagName('video')[0].src = 'video.ogg'; }">
<h1>Michael Johns &amp; Carly Smithson – The Letter</h1>
<p>(Live At American Idol 02/18/2009) HD
(from <a href="http://www.youtube.com/watch?v=LkTCFo8XfAc">YouTube</a>)</p>

<!-- Supported browsers will use the video element and ignore the rest -->
<video src="video.mp4" autoplay="autoplay" width="630" height="380">
<!-- If the video tag is unsupported by your browser the legacy fallback code is used -->

</video>

<!-- Here's a script to give some basic playback control -->
<script type="text/javascript">
function playPause() {
  var myVideo = document.getElementsByTagName('video')[0];
  if (myVideo.paused)
    myVideo.play();
  else
    myVideo.pause();
}
</script>
<p><input type="button" onclick="playPause()" value="Play/Pause" /></p>

<!-- Here's an event handler which will tell us when the video finishes -->
<script type="text/javascript">
var myVideo = document.getElementsByTagName('video')[0];
myVideo.addEventListener('ended', function () {
  alert('video playback finished');
}, false);
</script>
<p>By <a href="https://samj.net/">Sam Johnston</a> of
<a href="http://www.aos.net.au/">Australian Online Solutions</a></p>
</body>
</html>

This file (index.html) and the two video files above (video.mp4 and video.ogg) are then uploaded to Amazon S3 (at http://media.samj.net/) and made available via Amazon CloudFront content delivery network (at http://media.cdn.samj.net/). And finally you can see for yourself (bearing in mind that to keep the code clean no attempts were made to check the ready states so either download the files locally or be patient!):

Towards a Flash free YouTube killer…

Why Adobe Flash penetration is more like 50% than 99%

Slashdot discussed PC PRO’s “99% Flash Player Penetration – Too Good to be True?” article today which prompted me to explain why I have always been dubious of Adobe’s claim that “Flash content reaches 99.0% of Internet viewers“. Here’s the claim verbatim:

Adobe Flash Player is the world’s most pervasive software platform, used by over 2 million professionals and reaching 99.0% of Internet-enabled desktops in mature markets as well as a wide range of devices.

The methodology section asks more questions than it answers but fortunately we don’t even need to go into details about how 3-6% margins of error can lead to a 99% outcome. Here’s why the claim is bogus:

Mobile devices

“Internet viewers” clearly includes users of mobile devices like the iPhone and yet they are conveniently only counting “desktops in mature markets”. Fortunately this little chestnut which has just been removed from Adobe’s own Mobile & Devices Developer Center is still available in the Google cache:

So by Adobe’s own reckoning there’s about 1 billion PCs to 3 billion mobile devices. That reconciles with other research, but the latest news is that we’ve just passed 4 billion (or 60% of the entire global population today). Let’s be conservative anyway and use Adobe’s own figures. So that’s 1 billion PCs to 3 billion mobiles.

99% of 25% is still less than 25% overall…

but that’s not really fair because many (most?) of those mobile devices aren’t yet Internet enabled. Well, every iPhone ships with Internet access and none of them have Flash, so those 13 million Internet-enabled mobile devices already knock nearly a point and a half off their 99% claim. Google’s Android is just getting started and its first incarnation (the T-Mobile G1) has already sold a million devices with many more to come this year.

The iPhone alone knocks it down to 97.5%

On the other hand, Flash Lite will have shipped on a billion devices by March so:

1 billion PCs + 1 billion mobiles = 2 billion out of 4 billion total = 50%

As if it’s not enough for Adobe that large segments of the mobile device market are currently out of their reach, there are two more emerging technologies which will erode their penetration rates:

HTML 5

Features like the “video” and “canvas” tags, timed media playback, offline storage, messaging/networking, etc. (which have previously only been possible via 3rd party plugins) will soon be supported natively by the browser. I’m a W3C Invited Expert in the HTML 5 working group (primarily monitoring the web application developments) and it’s great to see demos like offline mobile Gmail already starting to appear. It will also be nice not having to download and install a plugin to view video content in sites like YouTube, Facebook and MySpace (arguably the main reason why anyone would install Flash in the first place).

Netbooks

Over 10 million Netbooks (“Internet Notebooks”) shipped last year and another 35 million are expected to ship this year. So far Flash penetration hasn’t been too bad with Windows/Linux on x86 devices but the next generation will run a myriad of operating systems on alternatives to x86 like ARM. Flash will certainly have trouble maintaining its usual penetration rates, and will have to do so with less incentive to install thanks to HTML 5 support in the browsers.

In summary, Flash has its place. It is constantly evolving and there are things that it does that can be difficult or impossible to do otherwise (e.g. video capture). However, if you choose to deploy Flash then you are choosing to exclude some potential users and it’s hard to say how many as it depends on many factors (your specific audience’s demographics, devices, locations, etc.). Adobe’s figures are not perfect so the only way to reliably know how many users you are turning away is to measure it after the fact. If you don’t need these advanced features then opt for a Native Web Application, or confine Flash objects to pages where it is absolutely required.

Update: There’s a handful of (mostly) developers discussing this article over at Y Combinator’s hacker news, generally in defense of Adobe’s figures (including a comment from Adobe’s own John Dowdell). This is not “more flash hate”, rather an observation that things that seem “too good to be true” usually are. Take a quick glance at this screenshot from Adobe.com:

What does it tell you? Did you bother to read beyond the title? Would you have any doubts about using Flash for your site after seeing it? Should you?

Is an iPhone user really not an “Internet viewer” when iPhones ship with a full-featured browser and a data plan? Does “desktop” mean desktop computer or desktop metaphor? Are laptops and netbooks included? What happens if you drop the “mature markets” restriction? What does “over 2 million professionals” really mean anyway? How do the numbers for “real world” tests compare to the survey results?

Basically this clever marketing exercise (which no doubt overcomes the #1 objection to Flash in many instances) raises more questions than it answers.