How lobbyists are denying you a voice and destroying democracy

I came across an unsurprising but nonetheless disconcerting revelation today that gives a very good example of what most of us knew all along: that “public comment” processes are routinely subverted by commercial interests, generally at the public’s expense. It comes in the form of a smoking gun courtesy of DSL Reports: Who Knew Senior Citizens Hated Net Neutrality?

There is currently an extremely important battle underway over securing Net Neutrality regulations and another where big media are actively attacking (by way of three-strikes policies like HADOPI in France) what is fast becoming a legal right: broadband access (thanks to Finland for getting the ball rolling: Fast Internet access becomes a legal right in Finland).

Us (US?) consumers recently had a big win with the FCC getting on board the Open Internet bandwagon but, never afraid to flog a dead horse, industry lobbyists have rolled out an army of puppets parroting their position: that Net Neutrality is somehow opposed to broadband adoption (which could not be further from the truth). In this case it’s the Arkansas Retired Seniors Coalition, purporting to represent (surprise, surprise) retired seniors in Arkansas, ignoring the fact that your average senior quite probably doesn’t know what net neutrality is, let alone care about it!

They do care about Internet access though, and in the slowest state in the south all it would take is a seemingly suitable scapegoat to have pitchforks in the streets. My guess is they don’t even know the position taken by their representatives, which makes this letter sent on their behalf deceitful at the very least.

The problem with such astroturfing is that it makes public opinion both harder to reliably collect and easier to dismiss. Such shenanigans appear far more prevalent in the US than in other countries I’ve lived in, but regulations there (e.g. the DMCA) tend to flow on to the rest of us eventually, so it’s in everyone’s interest to have their say.

Something really should be done about the issue, but most solutions are relatively difficult to enforce. Examples include requiring a statutory declaration component such that egregious abuses can be punished (and to make people think twice about misrepresenting others), or requiring the individuals represented to make an overt act such as signing a petition. Rejecting messages that are too similar, and therefore obviously templates, raises the bar somewhat but does not stop determined attackers.

The long term solution likely comes in the form of digital identity, whereby each individual can be reliably authenticated and the cost of involving them in decisions trends towards zero. As referendums are extremely expensive and inefficient (despite the availability of technology that could put them within reach for routine decision-making) we appoint representatives who we hope will accurately reflect our views on each of the topics. Obviously a perfect match is rare – for example your representative might share your views on fiscal policy but reject gay marriage, in which case you have to choose what is more important to you.

An arguably better solution is one where individuals can take part in all decisions they care about, which is called a direct democracy (or pure democracy), and the use of technology to achieve better representation is a separate but related concept known as e-democracy. We should be paying more attention to both, as it’s as if we only got halfway there by establishing representative democracies in most of the western world.

Crystal ball: Data-only carriers to destroy the telco industry RSN

This is one of those random thoughts that fits in a tweet but deserves a little more explanation. Like most people, I currently pay around €100 a month for a mobile package that includes some texts, airtime (2+2 hours on and off peak), some data and usually some useless gimmicks (free calls at certain times or to certain phones, etc.). This of course makes it truly impossible to compare apples to apples, and I almost feel like choosing the right plan should be a profession (I’m sure there must be businesses that do this for a living).

Under the covers though it’s all just 1’s and 0’s and it’s been that way for a while – Australia turned off its analog mobile network (AMPS) while I was still there and, like here in Europe, uses the Global System for Mobile communications (GSM). This shares the limited airwaves with timeslices (TDMA) and over in the US they do a similar thing with code (CDMA), probably because TDMA has timing problems when you get out to tens of kilometers (irrespective of the strength of the signal) and the US has a lot of land to cover. The point is that under the covers it’s all data. Of course things have changed a bit since I was helping design Australia’s first digital mobile network – now we’ve got 3G, LTE, WiFi, WiMAX, etc. to play with too.

Traditional telephony was what we call “circuit switched”, which means it was about creating a dedicated connection between two endpoints. First these were hardwired, then switched manually by operators, then clicks on the line would operate mechanical switches at the exchange, more recently tones (DTMF) would tell chips what to do and nowadays connections are set up out-of-band over data connections. But it all still revolves around circuits, even though these days we’re not tying up a pair of copper for the duration of the call, rather sending as much data as we need to when we need it (silence often uses little or no bandwidth, but then we have to simulate background noise at the other end so as not to confuse the human).

That is to say it’s time we stopped thinking about circuits which tend to be billed by time (after all, the resource could not be shared when you were using it) and start thinking about data (which is typically billed by quantity transferred or bandwidth available). In other words we are paying (generally more) for our communications because of technological limitations that have long since been removed. Even Skype go to great lengths to identify which country you are calling from so as to impose the legacy billing system we are used to (so many cents per minute depending on the country) rather than take advantage of what the Internet has to offer in terms of being unaffected by geography.

Then there are texts, which are an even bigger rort. These were basically an afterthought and are sent out-of-band over the relatively limited control channel – the one that’s used to set up calls and so on (that’s why they take a while to send and why you can jam a phone by sending/receiving too many). Knowing that everything is 1’s and 0’s anyway, did you ever stop to think about how many texts a minute of voice is worth (even using strong compression)? It’s a lot, but let’s work it out. Full rate GSM consumes 13kbps, or just shy of 100,000 8-bit characters per minute assuming my maths are correct. Each SMS is 140 8-bit (or 160 7-bit) characters, which works out to around 700 texts per minute. In Australia those texts cost $0.25 each, so we’re paying $175.00 a minute to consume the bandwidth as texts when we’d pay around $0.50 to consume it as voice. You can see why they love them now, can’t you!
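
For the sceptics, here is the whole back-of-envelope calculation as a quick sketch in Python; the $0.25/SMS and ~$0.50/minute voice prices are the Australian figures assumed above, not anyone's current price list:

GSM_FULL_RATE_BPS = 13000        # full rate GSM voice codec, 13kbps
SMS_CHARS = 140                  # 8-bit characters per SMS
PRICE_PER_SMS = 0.25             # AUD, as above
PRICE_PER_VOICE_MINUTE = 0.50    # AUD, as above

chars_per_minute = GSM_FULL_RATE_BPS * 60 / 8       # ~97,500 8-bit characters
texts_per_minute = chars_per_minute / SMS_CHARS     # ~700 texts
print(texts_per_minute * PRICE_PER_SMS)             # ~$175/minute consumed as texts
print(PRICE_PER_VOICE_MINUTE)                       # ~$0.50/minute consumed as voice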

The telcos have been on the gravy train for long enough at our expense and it’s long since been time for the next generation of carrier to take over. There’s a massive opportunity here for someone to enter the market with a data-only service and in doing so destroy the existing industry literally overnight. We’ve already got devices (iPhones, Android) that are more than capable of doing everything we need over data, but which are being deliberately crippled by hardware and software vendors in order to protect the legacy carriers. That’s not to say that Apple and Google are to blame for contracts they are almost certainly forced into by the likes of AT&T, but seeing Google taking the high road while having to concede that “individual operators can request that certain applications be filtered if they violate their terms of service” is disappointing.

Why can’t we have Google Voice on the iPhone? Or use Skype over 3G (without jailbreaking and installing 3G Unrestrictor)? Or open source/open standard SIP telephony for that matter? Why are we sending texts when we have instant messaging? Or dialing in to retrieve voicemails that could just as easily be translated and/or emailed? Why are we paying for silence on the line when we should be paying for bandwidth and/or quantity of data? Why do we pay for minutes at all?

The telcos will tell you it’s to protect their networks, and ultimately to protect you, no doubt from the evils of illegal filesharing, terroristing and child pornography. There’s an element of truth to this (it only takes a few greedy customers to ruin it for the rest, and as always 10% of the users generate 90% of the traffic), but there are simple, effective solutions for this too. People will pay more for a premium/priority service and at the end of the day you can always rein in abusers with packet shaping. The fairest mechanism I can think of comes in the form of a logarithmic bandwidth policy whereby the more you use the slower you go, but the point is that there are solutions, so this is pure FUD. My “unlimited” data connection was just throttled from 3G+ to 3G speeds at 800Mb and again at 1000Mb (so much for unlimited), but I’d happily pay more for a more “unlimited” service if it meant I could say goodbye to minutes and texts forever.
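
To illustrate what such a logarithmic policy could look like, here's a minimal sketch in Python – the 3.5Mbps starting speed and 500MB scale factor are arbitrary assumptions for the example, not any carrier's actual policy:

import math

def throttled_speed_mbps(used_mb, base_mbps=3.5, scale_mb=500.0):
    """The more you use, the slower you go: speed decays logarithmically
    with usage instead of hitting a hard cap or cutting off entirely."""
    return base_mbps / (1 + math.log1p(used_mb / scale_mb))

for used_mb in (0, 500, 1000, 5000, 20000):
    print(used_mb, round(throttled_speed_mbps(used_mb), 2))
# 0 3.5, 500 2.07, 1000 1.67, 5000 1.03, 20000 0.74

Heavy users still get service, light users barely notice, and nobody needs to be cut off.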

It will happen – it’s just a case of when (and where first). Australia’s regularly used as a test market and capped ($99 all-you-can-talk) style plans took the market by storm a few years ago, so let’s just help an existing innovative carrier like 3, or a new one altogether, teach the incumbents a lesson – with any luck by the time I get back there.

A disturbing taste of the “Digital Wild West”

Dodgy dealings happen all the time but it’s not often you get to see them boiling over into the public arena as we have today. I saw in my newsfeed this morning that GrokLaw had picked up on (Darl, Norris, Bryan Cave Named as Defendants in IP Litigation – The Pelican Brief) a Courthouse News article (Ex-Partner Accused of AIP Trade Secret Theft) about a recently filed complaint by Pelican Equity, LLC against Talos Partners, Darl McBride (of SCO Group fame), Robert V. Brazell (of Overstock.com fame), Stephen L. Norris, Rama Ramachandran and law firm Bryan Cave LLP.

It claims a conspiracy to “steal AIP’s proprietary stock loan product” (EQUITAP™, [which] helps investors achieve their financial goals by structuring non-recourse loans using the securities in their portfolio as collateral) and “virtually AIP’s entire business from AIP and its founder, Mark Robbins” (Pelican claim to own the relevant rights). It then goes on to explain the whole sorry story of a techie (Robbins) investing four years and apparently all of his money into development of a product, being approached by seasoned businessmen (Brazell and McBride) as potential partners, the subsequent formation of a new business (Talos) and the theft of everything from AIP’s products to its website to its employees (Ramachandran) with the help of AIP’s own lawyers (Bryan Cave LLP), who ultimately blew the whistle with an “astonishing” conflict of interest waiver.

The truly mindblowing part of the whole story though is the Skyline Cowboy site they claim is run by McBride and Brazell: “Finally, in a heinous effort to obliterate AIP’s business and deflect their misdeeds [they] have over approximately the last 60 days littered the Internet with scurrilous postings on http://www.skylinecowboy.com, a website they used primarily for that purpose, and on Yahoo, Twitter and other message boards.”

If that’s true it’s like coming back to stab the guy in the carpark after you’ve robbed him of everything he owns. Not only have they posted a video of the guy’s wife being served what they claim is a $109,627 check fraud judgment following a $1,000 bounty as well as a $20,000 reward for arrest and $1,000,000 reward for “full restitution” (save that both appear to be impossible – and likely a result of the claimed highway robbery), but now they’ve offered $30,000 for the true identity of GrokLaw’s Pamela Jones (PJ) who they claim is a “Secret IBM Shill Blogger”. Let’s not be too quick to forget the relationship to SCO Group and their apparently Microsoft funded attacks on IBM, Novell and Linux in general.

Anyway you can see the juicy details for yourself in the filings and if you’re a GrokLaw member, the article and associated discussion (the article has since been updated “Now that I’ve read it, I’ve made the article Members Only for now.” and unfortunately “creation of new accounts has been temporarily disabled“). I have but one question: Who the %!#$ do these cowboys think they are? It’s amazing to think that our society routinely jails people for petty theft while leaving [what appear to be] career conmen free to enrich themselves at others’ expense. Anyway at least Bernie Madoff got his comeuppance… you’ve heard my opinion – what’s yours?

Update: An anonymous commenter just stated that they “know for a fact” that Rob Brazell went to Skyline High School. Sure enough a Google search for skyline and salt lake city (where all the action is) brings the school up first (so the origin of the name fits) and another for brazell and skyline high school returns over 100 results (so some members of the Brazell family or families went there). If that’s true then it seems the lawsuit is “on the money” (so to speak).

Organising the Internet with Web Categories

In order to scratch an itch relating to the Open Cloud Computing Interface (OCCI) I submitted my first Internet-Draft to the IETF this week: Web Categories (draft-johnston-http-category-header).

The idea’s fairly simple and largely inspired by the work of others (most notably the original HTTP and Atom authors, and a guy down under who’s working on another draft). It defines an intuitive mechanism for web servers to express flexible category information for any resource (including opaque/binary/non-HyperText formats) in the HTTP headers, allowing users to categorise web resources into vocabularies or “schemes” and assign human-friendly “labels” in addition to the computer-friendly “terms”.

This approach to taxonomies was lifted directly from (and is thus 100% compatible with) Atom and is another step closer to being able to render individual resources natively over HTTP rather than encoded and wrapped in XML (which gets unwieldy when you’re dealing with multi-gigabyte virtual machines, as we are with OCCI).
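
To give a feel for the client side of this, here's a rough (non-validating) sketch in Python of how a consumer might pull categories out of the header; the URL is hypothetical and the parser naively assumes there are no quoted commas or semicolons in the values:

import urllib.request

def parse_category_header(value):
    """Split a Category header value into (term, {param: value}) tuples."""
    categories = []
    for category in value.split(","):
        term, *params = [part.strip() for part in category.split(";")]
        categories.append((term, dict(p.split("=", 1) for p in params)))
    return categories

response = urllib.request.urlopen("http://example.com/some/resource")
for value in response.headers.get_all("Category") or []:
    print(parse_category_header(value))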

It’s anybody’s guess where the document will go from here – it’s currently marked “Experimental” but with any luck it will pique the interest of the standards and/or semantic web community and go on to live a long and happy life.

Internet Engineering Task Force                              S. Johnston
Internet-Draft                               Australian Online Solutions
Intended status: Experimental                               July 1, 2009
Expires: January 2, 2010


                             Web Categories
                 draft-johnston-http-category-header-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 2, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document specifies the Category header-field for HyperText
   Transfer Protocol (HTTP), which enables the sending of taxonomy
   information in HTTP headers.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . . . 3
   2.  Categories  . . . . . . . . . . . . . . . . . . . . . . . . . . 3
   3.  The Category Header Field . . . . . . . . . . . . . . . . . . . 4
     3.1.  Examples  . . . . . . . . . . . . . . . . . . . . . . . . . 4
   4.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 5
     4.1.  Category Header Registration  . . . . . . . . . . . . . . . 5
   5.  Security Considerations . . . . . . . . . . . . . . . . . . . . 5
   6.  Internationalisation Considerations . . . . . . . . . . . . . . 5
   7.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 6
     7.1.  Normative References  . . . . . . . . . . . . . . . . . . . 6
     7.2.  Informative References  . . . . . . . . . . . . . . . . . . 6
   Appendix A.  Notes on use with HTML . . . . . . . . . . . . . . . . 7
   Appendix B.  Notes on use with Atom . . . . . . . . . . . . . . . . 7
   Appendix C.  Acknowledgements . . . . . . . . . . . . . . . . . . . 8
   Appendix D.  Document History . . . . . . . . . . . . . . . . . . . 8
   Appendix E.  Outstanding Issues . . . . . . . . . . . . . . . . . . 8
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.  Introduction

   A means of indicating categories for resources on the web has been
   defined by Atom [RFC4287].  This document defines a framework for
   exposing category information in the same format via HTTP headers.

   The atom:category element conveys information about a category
   associated with an entry or feed.  A given atom:feed or atom:entry
   element MAY have zero or more categories which MUST have a "term"
   attribute (a string that identifies the category to which the entry
   or feed belongs) and MAY also have a scheme attribute (an IRI that
   identifies a categorization scheme) and/or a label attribute (a
   human-readable label for display in end-user applications).

   Similarly a web resource may be associated with zero or more
   categories as indicated in the Category header-field(s).  These
   categories may be divided into separate vocabularies or "schemes"
   and/or accompanied with human-friendly labels.

   [[ Feedback is welcome on the ietf-http-wg@w3.org mailing list,
   although this is NOT a work item of the HTTPBIS WG. ]]

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

   This document uses the Augmented Backus-Naur Form (ABNF) notation of
   [RFC2616], and explicitly includes the following rules from it:
   quoted-string, token.  Additionally, the following rules are included
   from [RFC3986]: URI.


2.  Categories

   In this specification, a category is a grouping of resources by
   'term', from a vocabulary ('scheme') identified by an IRI [RFC3987].
   It is comprised of:

   o  A "term" which is a string that identifies the category to which
      the resource belongs.

   o  A "scheme" which is an IRI that identifies a categorization scheme
      (optional).

   o  A "label" which is a human-readable label for display in end-user
      applications (optional).

   A category can be viewed as a statement of the form "resource is from
   the {term} category of {scheme}, to be displayed as {label}", for
   example "'Loewchen' is from the 'dog' category of 'animals', to be
   displayed as 'Canine'".


3.  The Category Header Field

   The Category entity-header provides a means for serialising one or
   more categories in HTTP headers.  It is semantically equivalent to
   the atom:category element in Atom [RFC4287].

   Category           = "Category" ":" #category-value
   category-value     = term *( ";" category-param )
   category-param     = ( ( "scheme" "=" <"> scheme <"> )
                      | ( "label" "=" quoted-string )
                      | ( "label*" "=" enc2231-string )
                      | ( category-extension ) )
   category-extension = token [ "=" ( token | quoted-string ) ]
   enc2231-string     = 
   term               = token
   scheme             = URI

   Each category-value conveys exactly one category but there may be
   multiple category-values for each header-field and/or multiple
   header-fields per [RFC2616].

   Note that schemes are REQUIRED to be absolute URLs in Category
   headers, and MUST be quoted if they contain a semicolon (";") or
   comma (",") as these characters are used to separate category-params
   and category-values respectively.

   The "label" parameter is used to label the category such that it can
   be used as a human-readable identifier (e.g. a menu entry).
   Alternately, the "label*" parameter MAY be used encode this label in
   a different character set, and/or contain language information as per
   [RFC2231].  When using the enc2231-string syntax, producers MUST NOT
   use a charset value other than 'ISO-8859-1' or 'UTF-8'.

3.1.  Examples

   NOTE: Non-ASCII characters used in prose for examples are encoded
   using the format "Backslash-U with Delimiters", defined in Section
   5.1 of [RFC5137].


   For example:
   Category: dog

   indicates that the resource is in the "dog" category.
   Category: dog; label="Canine"; scheme="http://purl.org/net/animals"

   indicates that the resource is in the "dog" category, from the
   "http://purl.org/net/animals" scheme, and should be displayed as
   "Canine".

   The example below shows an instance of the Category header encoding
   multiple categories, and also the use of [RFC2231] encoding to
   represent both non-ASCII characters and language information.
   Category: dog; label="Canine"; scheme="http://purl.org/net/animals",
             lowchen; label*=UTF-8'de'L%c3%b6wchen";
             scheme="http://purl.org/net/animals/dogs"

   Here, the second category has a label encoded in UTF-8, uses the
   German language ("de"), and contains the Unicode code point \u'00F6'
   ("LATIN SMALL LETTER O WITH DIAERESIS").


4.  IANA Considerations

4.1.  Category Header Registration

   This specification adds an entry for "Category" in HTTP to the
   Message Header Registry [RFC3864] referring to this document:
   Header Field Name: Category
   Protocol: http
   Status: standard
   Author/change controller:
       IETF (iesg@ietf.org)
       Internet Engineering Task Force
   Specification document(s):
       [ this document ]


5.  Security Considerations

   The content of the Category header-field is not secure, private or
   integrity-guaranteed, and due caution should be exercised when using
   it.


6.  Internationalisation Considerations

   Category header-fields may be localised depending on the Accept-
   Language header-field, as defined in section 14.4 of [RFC2616].

   Scheme IRIs in atom:category elements may need to be converted to
   URIs in order to express them in serialisations that do not support
   IRIs, as defined in section 3.1 of [RFC3987].  This includes the
   Category header-field.


7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
              Word Extensions: Character Sets, Languages, and
              Continuations", RFC 2231, November 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3864]  Klyne, G., Nottingham, M., and J. Mogul, "Registration
              Procedures for Message Header Fields", BCP 90, RFC 3864,
              September 2004.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4287]  Nottingham, M. and R. Sayre, "The Atom Syndication
              Format", RFC 4287, December 2005.

   [RFC5137]  Klensin, J., "ASCII Escaping of Unicode Characters",
              RFC 5137, February 2008.

7.2.  Informative References

   [OCCI]     Open Grid Forum (OGF), Edmonds, A., Metsch, T., Johnston,
              S., and A. Richardson, "Open Cloud Computing Interface
               (OCCI)".

   [RFC2068]  Fielding, R., Gettys, J., Mogul, J., Nielsen, H., and T.
              Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
              RFC 2068, January 1997.

   [W3C.REC-html401-19991224]
              Raggett, D., Hors, A., and I. Jacobs, "HTML 4.01
               Specification", December 1999.

   [W3C.WD-html5-20090423]
               Hyatt, D. and I. Hickson, "HTML 5", April 2009.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking",
              draft-nottingham-http-link-header-05 (work in progress),
              April 2009.

   [rel-tag-microformat]
              Celik, T., Marks, K., and D. Powazek, "rel="tag"
               Microformat".


Appendix A.  Notes on use with HTML

   In the absence of a dedicated category element in HTML 4
   [W3C.REC-html401-19991224] and HTML 5 [W3C.WD-html5-20090423],
   category information (including user supplied folksonomy
   classifications) MAY be exposed using HTML A and/or LINK elements by
   concatenating the scheme and term:
   category-link = scheme term
   scheme        = URI
   term          = token

   These category-links MAY form a resolvable "tag space" in which case
   they SHOULD use the "tag" relation-type per [rel-tag-microformat].

   Alternatively META elements MAY be used:

   o  where the "name" attribute is "keywords" and the "content"
      attribute is a comma-separated list of term(s)

   o  where the "http-equiv" attribute is "Category" and the "content"
      attribute is a comma-separated list of category-value(s)


Appendix B.  Notes on use with Atom

   Where the cardinality is known to be one (for example, when
   retrieving an individual resource) it MAY be preferable to render the
   resource natively over HTTP without Atom structures.  In this case
   the contents of the atom:content element SHOULD be returned as the
   HTTP entity-body and metadata including the type attribute and atom:
   category element(s) via HTTP header-field(s).

   This approach SHOULD NOT be used where the cardinality is not
   guaranteed to be one (for example, search results which MAY return
   one result).


Appendix C.  Acknowledgements

   The author would like to thank Mark Nottingham for his work on Web
   Linking [draft-nottingham-http-link-header] (on which this document
   was based) and to the authors of [RFC2068] for specification of the
   Link: header-field on which this is based.

   The author would like to thank members of the OGF's Open Cloud
   Computing Interface [OCCI] working group for their contributions and
   others who commented upon, encouraged and gave feedback to this
   draft.


Appendix D.  Document History

   [[ to be removed by the RFC editor should document proceed to
   publication as an RFC. ]]

      -00

      *  Initial draft based on draft-nottingham-http-link-header-05


Appendix E.  Outstanding Issues

   [[ to be removed by the RFC editor should document proceed to
   publication as an RFC. ]]

   The following issues are outstanding and should be addressed:

   1.  Is extensibility of Category headers necessary as is the case for
       Link: headers?  If so, what are the use cases?

   2.  Is supporting multi-lingual representations of the same
       category(s) necessary?  If so, what are the risks of doing so?

   3.  Is a mechanism for maintaining Category header-fields required?
       If so, should it use the headers themselves or some other
       mechanism?

   4.  Does this proposal conflict with others in the same space?  If
       so, is it an improvement on what exists?


Author's Address

   Sam Johnston
   Australian Online Solutions
   GPO Box 296
   Sydney, NSW  2001

   Email: samj@samj.net
   URI:   https://samj.net/

The browser is the OS (thanks to Firefox 3.5, Chrome 2, Safari 4)

Almost a year ago I wrote about Google Chrome: Cloud Operating Environment and [re]wrote the Google Chrome Wikipedia article, discussing the ways in which Google was changing the game through new and innovative features. They had improved isolation between sites (which is great for security), improved usability (speed dial, tear off tabs, etc.) and perhaps most importantly for SaaS/Web 2.0 applications, vastly improved the JavaScript engine.

Similar features were quickly adopted by competitors including Opera (which Chrome quickly overtook at ~2%) and Firefox (which still has an order of magnitude more users at ~20-25%). Safari is really making waves too at around 1/3-1/2 of the share of Firefox (~8%) and with the recent release of Safari 4 it’s a compelling alternative – especially given it passes the Acid 3 test with flying colours while Firefox 3.5 bombs out at 93/100.

HTML 5 features such as local storage and the video and audio elements are starting to make their way into the new breed of browsers too, though it’s still often necessary to install Google Gears to get advanced offline functionality (e.g. most of the Google Apps suite) up and running. Google have drawn fire by missing the Firefox 3.5 launch and users finding Gears disabled are flocking to the gears-users Google Group to vent their frustrations, some going so far as claiming that “Google is trying to do what it can to push users to Chrome” and asking “Are we watching a proccess of Google becoming customer-deaf Microsoft?”. Let’s just hope it’s ready in time for my travel later this week…

The point is that after the brutal browser wars which stagnated the web for some time (right up until Microsoft opened the floodgates by introducing Ajax), we’re now starting to see some true competition again. Granted Internet Explorer is still a 1,000 pound gorilla at ~65% market share, but even with a silk shirt in the form of IE 8 and a handful of lame ads it’s still a pig, and the target of the vast majority of security exploits on the web. This makes it an easy sell for any competitor who manages to get a foot in the door (which is unfortunately still the hardest part of the sale).

The decision not to ship IE with Windows 7 in Europe will be interesting as it should draw mainstream attention to the alternatives which will flow on to other markets (as we’ve seen with adoption of “alternative” technology like Linux in the past – not to mention the whole Netbook craze started by OLPC in the third world). However, with the browser being where most of the action is today the operating system has become little more than a life support system for it – an overly thick interface layer between the browser and the hardware. Surely I’m not the only one who finds it curious that while the software component of a new computer is fast approaching 50% of the cost (up from around 10% a decade ago), the heart of the system (the browser) is both absent from Windows 7 and yet freely available (both in terms of beer and freedom)? Something’s gotta give…

Anyway, it’s time to stop looking at the features and performance of the underlying operating system and start looking at the security and scalability of the browser. When was the last time you turned to the operating system anyway, except to fix something that went wrong or do some menial housekeeping (like moving or deleting files)?

Is HTTP the HTTP of cloud computing?

Ok so after asking Is OCCI the HTTP of cloud computing? I realised that the position may have already been filled and that the question was more Is AtomPub already the HTTP of cloud computing?

After all, my strategy for OCCI was to follow Google’s example with GData by adding some necessary functionality (a search interface, caching directives, resource-specific attributes, etc.). Most of the heavy lifting was actually being done by AtomPub, thus avoiding a huge amount of tedious and error-prone protocol writing (around 20,000 words of it) – something which OGF and the OCCI working group isn’t really geared up for anyway. This is clearly a workable and well-proven approach as it has been adopted strategically by both Microsoft and Google and also tactically by Salesforce and IBM, among others. Best of all, adding things like queries and versioning is a manageable workload while starting from scratch is most certainly not.

But what if there were an easier way? Recall that the problem we are trying to solve is exposing a flexible interface to an arbitrarily large collection of interconnected compute, storage and network resources. We need to be able to describe and manipulate the resources (CRUD), associate them with each other via rich links (e.g. links with attributes like local identifiers – eth0, sda, etc.) and change their state (start, stop, restart, etc.), among other things.

Representational State Transfer (REST)

Actually we’re not talking about exposing the resources themselves (that would be impossible) but various representations of those resources – like Plato’s shadows on the cave walls – this is the “REpresentational” in “REpresentational State Transfer (REST)”. There’s an infinite number of possible representations so it’s impossible to try to capture them all now, but here are some examples:

  • An Open Virtualisation Format (OVF) serialisation of a compute resource
  • A platform-specific descriptor file (e.g. VMX)
  • A complete archive of the virtual machine with its dependencies (OVA)
  • A graphical image of the console at a given point in time (‘snapshot’)
  • A video stream of the console for archiving/audit purposes (ala Citrix’s Project Iris)
  • The console itself (e.g. SSH, ICA, RDP, VNC)
  • Build documentation (e.g. PDF, ODF)
  • Esoteric enterprise requirements (e.g. NMS configuration)

It doesn’t take a rocket scientist to spot the correlation between this and HTTP’s existing content negotiation functionality (whereby a client can ask for a specific representation of a given resource – e.g. HTML vs PDF), so this is already pretty much solved for us (see HTTP’s Accept: header for the details). For bonus points this information should also be exposed in the URI, as it’s not always possible or convenient to set headers.
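
For example, a client wanting the build documentation for a [hypothetical] compute resource could simply negotiate for it:

GET /compute/123 HTTP/1.1
Host: example.com
Accept: application/pdf

or, where headers are awkward, hint the desired representation in the URI itself along the lines of http://example.com/compute/123.pdf (again, a hypothetical layout rather than anything specified).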

Web Linking

But what about the links? As I explained yesterday the web is built on links embedded in HTML documents using the A tag. Atom also provides enhanced linking functionality via the LINK element, where it is also possible to specify content types, languages, etc. In this case however we want to allow resources to be arbitrary types and more often than not we won’t have the ability to link within the payload itself. This leaves us with two options: put the links in the payload anyway by relying on a meta-model like Atom (or one we roll ourselves) or find some way to represent them within HTTP itself.

Enter HTTP headers, which are also extensible and, as it turns out, in the process of being extended (or at least refined) to handle this very requirement by a fellow down under, Mark Nottingham. See the “Web Linking” IETF Internet-Draft (draft-nottingham-http-link-header, at the time of writing version 05) for the nitty gritty details and the ietf-http-wg list for some current discussions. Basically it clarifies the existing Link: headers and the result looks something like this:

Link: <http://example.com/TheBook/chapter2>; rel="previous"; title="previous chapter"

The Link: header itself is also extensible so we can faithfully represent our model by adding e.g. the local device name when linking storage and network resources to compute resources and other requisite attributes. It would be helpful if the content-type were also specified (Atom allows for multiple links of the same relation provided the content-type differs for example) but language is already covered by HTTP (it doesn’t seem useful to advertise French links to someone who already asked to speak English).
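
For example (with a hypothetical extension parameter – nothing here is specified by OCCI yet):

Link: <http://example.com/storage/disk0>; rel="related"; device="sda"

where "device" carries the local identifier discussed above, leaving the link itself to point at the full storage resource.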

It’s also interesting to note that earlier versions of the HTTP RFCs actually [poorly] specified both the Link: headers as well as LINK and UNLINK methods for maintaining links between web resources. John Pritchard had a crack at clarification in the Efficient HyperLink Maintenance for HTTP I-D but like most I-Ds this one seems to have died after 6 months, and with it the methods themselves. It seems to me that adding HTTP methods at this time is a drastic (and almost certainly infeasible) action, especially for something that could just as easily be accomplished via headers ala Set-Cookie: (too bad the I-D doesn’t specify how to add/delete/modify links!). In the simplest sense a Link: header appearing in a PUT or POST could replace the existing one(s) but something more elegant for acting on individual links would be nice – probably a discussion worth having on the ietf-http-wg list.

Organisation of Information

Looking back to Atom for a second we’re still missing some key functionality:

  • Atom id -> HTTP URL
  • Atom updated -> HTTP Last-Modified: Header
  • Atom title and summary -> Atom/HTTP Slug: Header or equivalent
  • Atom link -> HTTP Link: Header
  • Atom category -> ???

Houston, we have a problem. OCCI use cases range from embedded hypervisors exposing a single resource to a single entry-point for an entire enterprise or the “Great Global Grid” – we need a way to organise, categorise and search for the information, likely including:

  • Free text search via a Google-style “?q=firewall” syntax
  • Taxonomy via categories (already done for Atom) for things like “Operating System” and “Data Center”
  • Folksonomy via [user] tags (already done for Atom and bearing in mind that tag spaces are cool) for things like “testlab”

Fortunately the good work already done in this area for Atom would be relatively easy to port to a Category: HTTP header, following the Link: header example above. In the meantime a standard search interface (including category support) is trivial and, thanks to Google, already done.

Structured Data Formats

HTML also resolves another pressing issue – what format to use for submitting key-value pairs (which constitutes a large part of what we need to do with OCCI). It gives us two options: the familiar form encodings, application/x-www-form-urlencoded and multipart/form-data.

The advantages of being able to create a resource from a web form simply by POSTing to the collection of resources (e.g. http://example.com/compute), and with HTML 5 by PUTting the resource in place directly (e.g. http://example.com/compute/<uuid>), are immediately obvious. Not only does this help make the human and programmable web one and the same (which in turn makes it much easier for developers/users to kick the tyres and understand the API) but it means that scripting even advanced tasks with curl/wget would be trivial. Plus there’s no place for time-wasting religious arguments about angle brackets (XML) over curly braces (JSON).
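
As a sketch of just how trivial that scripting becomes, creating a [hypothetical] compute resource from Python's standard library is a handful of lines – the endpoint and attribute names are illustrative only:

import urllib.parse, urllib.request

# POST form-encoded key-value pairs to the collection, exactly as a
# browser would submit a web form; the server responds with the new
# resource's location.
data = urllib.parse.urlencode({
    "name": "web01",
    "cores": "2",
    "memory": "2048",
}).encode("ascii")

response = urllib.request.urlopen("http://example.com/compute", data=data)
print(response.status, response.headers.get("Location"))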

RESTful State Machines

Something else which hadn’t sat well with me until I spent the weekend ingesting the RESTful Web Services book (by Leonard Richardson and Sam Ruby) was the “actuator” concept we picked up from the Sun Cloud APIs. This breaks away from RESTful principles by exposing an RPC-style API for triggering state changes (e.g. start, stop, restart). Granted it’s an improvement on the alternative (GETting a resource and PUTting it back with an updated state) as Tim Bray explains in RESTful Casuistry (to which Roy Fielding and Bill de hÓra also responded), but it still “feels funky”. Sure it doesn’t make any sense to try to “force” a monitored status to some other value (for example setting a “state” attribute to “running”), especially when we can’t be sure that’s the state we’ll get to (maybe there will be an error or the transition will be dependent on some outcome over which we have no control). Similarly it doesn’t make much sense to treat states as nouns, for example adding a “running” state to a collection of states (even if a resource can be “running” and “backing up” concurrently). But is using URLs as “buttons” representing verbs/transitions the best answer?

What makes more sense [to me] is to request a transition and check back for updates (e.g. by polling or HTTP server push). If it’s RESTful to POST comments to an article (which in addition to its own contents acts as a collection of zero or more comments) then POSTing a request to change state to a [sub]resource also makes sense. As a bonus these can be parametrised (for example a “resize” request can be accompanied with a “size” parameter and a “stop” request sent with clarification as to whether an “ACPI Off” or “Pull Cord” is required). Transitions that take a while, like “format” on a storage resource, can simply return HTTP 202 Accepted so we’ve got support for asynchronous actions as well – indeed some requests (e.g. “backup”) may not even be started immediately. We may also want to consider using something like Post Once Exactly (POE) to ensure that requests like “restart” aren’t executed repeatedly and that we can cancel requests that the system hasn’t had a chance to deal with yet.

Exactly how this should look in terms of URL layout I’m not sure (perhaps http://example.com/<resource>/requests) but being able to enumerate the possible actions as well as acceptable parameters (e.g. an enum for variations on “stop” or a range for “resize”) would be particularly useful for clients.
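
To make that concrete, stopping a compute resource might look something like this (the /requests collection and parameter names are speculative, as above):

POST /compute/123/requests HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded

action=stop&mode=acpioff

with the server returning 202 Accepted and the URI of the new request [sub]resource, which can then be polled for progress or cancelled if it hasn't been actioned yet.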

Collections

This is all well and good for individual resources, but collections are still a serious problem. There are many use cases which involve retrieving an arbitrarily large number of resources, and making an HTTP request for each (as well as requests for enumeration etc.) doesn’t make sense. More importantly, it doesn’t scale – particularly in enterprise environments where requests via proxies and filters can suffer from high latency (if not low bandwidth).

One potential solution is to strap multiple HTTP message entities together as a multipart document, but that’s hardly clean and results in some hairy coding on the client side (e.g. manual manipulation of HTTP messages that would otherwise be fully automated). The best solution we currently have for this problem (as evidenced by widespread deployment) is AtomPub so I’m still fairly sure it’s going to have to make an appearance somewhere, even if it doesn’t wrap all of the resources by default.

rel=shortlink: url shortening that really doesn’t hurt the internet

Inspired primarily by the fact that the guys behind the RevCanonical fiasco are still stubbornly refusing to admit they got it wrong (the whole while arrogantly brushing off increasingly direct protests from the standards community) I’ve whipped up a Google App Engine application which reasonably elegantly implements rel=shortlink: url shortening that really doesn’t hurt the internet:

http://rel-shortlink.appspot.com

It works just like TinyURL and its ilk, accepting a URL and [having a crack at] shortening it. It checks both the response headers and (shortly) the HTML itself for rel=shortlink and if they’re not present then you have the option of falling back to a traditional service (the top half a dozen are preconfigured or you can specify your own via the API’s “fallback” parameter).
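
The header-based half of that discovery is only a few lines; here's a simplified Python sketch (not the actual App Engine code, and it naively assumes rel is the first parameter of the Link header):

import re, urllib.request

def discover_shortlink(url):
    """Return the target of a Link: <...>; rel=shortlink response header,
    or None if the site doesn't advertise one."""
    response = urllib.request.urlopen(url)
    for value in response.headers.get_all("Link") or []:
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="?([^";]+)"?', value)
        if match and "short" in match.group(2):
            return match.group(1)
    return None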

An interesting facet of this implementation is the warnings it gives if it encounters the similar-but-ambiguous short_url proposal and the fatal errors it throws up when it sniffs out the nothing-short-of-dangerous rev=canonical debacle. Apparently people (here’s looking at you Ars Technica and Dopplr) felt there was no harm in implementing these “protocols”. Now there most certainly is.

Here’s the high level details (from the page itself):

Who
A community service by Sam Johnston (@samj / s…@samj.net) of Australian Online Solutions, loosely based on a relatively good (albeit poorly executed) idea by some web developers purporting to “save the Internet” while actually hurting it.
What
A mechanism for webmasters to indicate the preferred short URL(s) for a given resource, thereby avoiding the need to consult a potentially insecure/unreliable third-party for same. Resulting URLs reveal useful information about the source (domain) and subject (path):
http://tinyurl.com/cgy9pu » http://purl.org/net/shortlink
Where
The shortlink Google Code project, the rel-shortlink Google App Engine application, the #shortlink Twitter hashtag and coming soon to a client or site near you.
When
Starting April 2009, pending ongoing discussion in the Internet standards community (in the mean time you can also use http://purl.org/net/shortlink in place of shortlink).
Why
Short URLs are useful both for space constrained channels (such as SMS and Twitter) and also for anywhere URLs need to be manually entered (e.g. when they are printed or spoken). Third-party shorteners can cause many problems, including link rot, performance problems, outages and privacy & security issues.
How
By way of <link rel="shortlink"> HTML elements and/or Link: <http://example.com/promo>; rel=shortlink HTTP headers.

So have at it and let me know what you think. The source code is available under the AGPL license for those who are curious as to how it works.

Introducing rel=”shortlink”: a better alternative to URL shorteners

Yesterday I wrote rather critically about a surprisingly successful drive to implement a deprecated “rev” relationship. This developed virtually overnight in response to the growing “threat” (in terms of linkrot, security, etc.) of URL shorteners including tinyurl.com, bit.ly and their ilk.

The idea is simple: allow the sites to specify short URLs in the document/feed itself, either automatically ([compressed] unique identifier, timestamp, “initials” of the title, etc.) or manually (using a human-friendly slug). That way, when people need to reference the URL in a space constrained environment (e.g. microblogging like Twitter) or anywhere they need to be manually entered (e.g. printed or spoken) they can do so in a fashion that will continue to work so long as the target does and which reveals information about the content (such as its owner and a concise handle).

Examples of such short URLs include:

The idea is sound but the proposed implementation is less so. There is (or at least was) provision for “rev”erse link references but these have been deprecated in HTML 5. There is also a way of hinting the canonical URI by specifying a rel=”canonical” link. This makes a lot of sense because often the same document can be referred to by an infinite number of URIs (e.g. in search results, with sort orders, aliases, categories, etc.). Combine the two and you’ve got a way of saying “I am the canonical URI and this other URI happens to point at me too”, but it can only ever (safely) work for the canonical URL itself, and it doesn’t make sense to list one arbitrary URL when there could be an infinite number.

Another suggestion was to use rel=”alternate shorter” but the problem here is that the content should be identical (except for superficial formatting changes such as highlighting and sort order), while “alternate” means “an alternate version of the resource” itself – e.g. a PDF version. Clients that understand “alternate” versions should not list the short URL as the content itself is (usually) the same.

Ben Ramsay got closest to the mark with A rev=”canonical” rebuttal but missed the “alternate” problem above, nonetheless suggesting a new rel=”shorter” relation. Problem there is the “short” URI is not guaranteed to be “shortest” or indeed even “shorter” – it still makes sense, for example, to specify a “short” URI of http://example.com/promo to a user viewing http://example.com/123 because the longer “short” URI conveys information about the content in addition to its host.

Accordingly I have updated WHATWG RelExtensions and will shortly submit the following to the IANA IESG for addition to the Atom Link Relations registry:

Value:
shortlink (http://purl.org/net/shortlink)

Description:
A short URI that refers to the same document.

Expected Display Characteristics:
This relation may be used as a concise reference to the document. It will
typically be shorter than other URIs (including the canonical URI) and may
rely on a [compressed] unique identifier or a human readable slug. It is
useful for space constrained environments such as email and microblogs as
well as for URIs that need to be manually entered (e.g. printed, spoken).
The referenced document may differ superficially from the original (e.g.
sort order, highlighting).

Security Considerations:
Automated agents should take care when this relation crosses administrative domains (e.g., the URI has a different authority than the current document). Such agents should also avoid circular references by resolving only once.

Note that in the interim “http://purl.org/net/shortlink” can be used. Bearing in mind that you should be liberal in what you accept, and conservative in what you send, servers should use the interim identifier for now and clients should accept both. Nobody should be accepting or sending rev=”canonical” or rel=”alternate shorter” given the problems detailed above.

Update: It seems there are still a few sensible people out there, like Robert Spychala with his Short URL Auto-Discovery document. Unfortunately he proposes a term with an underscore (short_url) when it should be a space and causes the usual URI/URL confusion. Despite people like Bernhard Häussner claiming that “short_url is best, it’s the only one that does not sound like shortened content“, I don’t get this reading from a “short” link… seems pretty obvious to me and you can always still use relations like “abstract” for that purpose. In any case it’s a valid argument and one that’s easily resolved by using the term “shortcutlink” instead (updated accordingly above). Clients could fairly safely use any link relation containing the string “short”.

Update: You can follow the discussion on Twitter at #relshortcut, #relshort and #revcanonical.

Update: I forgot to mention again that the HTTP Link: header can be used to allow clients to find the shortlink without having to GET and parse the page (e.g. by doing a HEAD request):

Link: <http://example.com/promo>; rel="shortlink"

Update: Both Andy Mabbett and Stan Vassilev also independently suggested rel=shortcut, which leads me to believe that we’re on a winner. Stan adds that we’ve other things to consider in addition to the semantics and Google’s Matt Cutts points out why taking rather than giving canonical-ness (as in RevCanonical) is a notoriously bad idea.

Update: Thanks to the combination of Microsoft et al recommending the use of “shortcut icon” for favicon.ico (after stuffing our logs by quietly introducing this [mis]feature) and HTML link types being a space separated list (thanks @amoebe for pointing this out – I’d been looking at the Atom RFCs and assuming they used the single link type semantics), the term “shortcut” is effectively scorched earth. Not only is there a bunch of sites that already have “shortcut” links (even if the intention was that “shortcut icon” be atomic), but there’s a bunch of code that looks for “shortcut”, “icon” or “shortcut icon”. FWIW HTML 5 specifies the “icon” link type. Moral of the story: get consensus before implementing code.

As I still have problems with the URI/URL confusion (thus ruling out “shorturl”) but have come around to the idea that this should be a noun rather than an adjective, I now propose “shortlink” as a suitable, self-explanatory, impossible-to-confuse term.

Update: I’ve created a shortlink Google Group and kicked off a discussion with a view to reaching a consensus. I’ve also created a corresponding Google Code project and modified the shorter links WordPress plugin to implement shortlinks.

rev=”canonical” considered harmful (complete with sensible solution)

Sites like http://tinyurl.com/ provide a very simple service: turning unwieldy but information-rich URLs like https://samj.net/2009/04/open-letter-to-community-regarding-open.html into something more manageable like http://tinyurl.com/ceze29. This was traditionally useful for emails, with some clients mangling long URLs, but it also makes sense for URLs in documents, on TV, radio, etc. (basically anywhere a human has to manually enter it). Shorteners are a dime a dozen now – there’s over 90 of them listed here alone… and I must confess to having created one at http://tvurl.com/ a few years back (the idea being that you could buy a TV friendly URL). Not a bad idea but there were other more important things to do at the time and I was never going to be able to buy my first island from the proceeds. Unfortunately though there are many problems with adding yet another layer of indirection and the repercussions could be quite serious (bearing in mind even the more trustworthy sites tend to come and go).

So a while back I whipped up a thing called “springboard” for Google Apps/AppEngine (having got bored with maintaining text files for use with Apache’s mod_rewrite) which allowed users to create redirect URLs like http://go.example.com/promo (and which was apparently a good idea because now Google have their own version called short links). This is the way forward – you can tell at a glance who’s behind the link from the domain and you even get an idea of what you’re clicking through to from the path (provided you’re not being told fibs). When you click on this link you get flicked over to the real (long) URL with a HTTP redirect, probably a 301 which means “Moved Permanently”, so the browsers know what’s going on too. If your domain goes down then chances are the target will be out of action too (much the same story as with third-party DNS) so there’s a lot less risk. It’s all good news and if you’re using a CMS like Drupal then it could be completely automated and transparent – you won’t even know it’s there and clients looking for a short URL won’t have to go ask a third party for one.

So the problem is that nowadays you’ve got every man and his dog wanting to feed your nice clean (but long) URLs through the mincer in order to post them on Twitter. Aside from being a security nightmare (the resulting URLs are completely opaque, though now clients like Nambu are taking to resolving them back again!?!), it breaks all sorts of things from analytics to news sites like Digg. Furthermore there are much better ways to achieve this. If you have to do a round trip to shorten the URL anyway, why not ask the site for a shorter version of its canonical URL (that being the primary or ‘ideal’ URL for the content – usually quite long and optimised for SEO)? In the case of Drupal at least, every node has an ID so you can immediately boil URLs down to http://example.com/node/123, http://example.com/123 or even use something like base32 to get even shorter URLs like http://example.com/3R.
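
For what it's worth, boiling a node ID down to a token like that is only a few lines; here's a sketch in Python using the 0-9A-V alphabet, under which node 123 does indeed come out as "3R":

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # base 32: digits then letters

def shorten(node_id):
    """Encode a positive node ID as a compact base 32 token for short URLs."""
    token = ""
    while node_id:
        node_id, remainder = divmod(node_id, 32)
        token = ALPHABET[remainder] + token
    return token or "0"

print(shorten(123))  # 3R, i.e. http://example.com/3R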

So how do we express this for the clients? The simplest way is to embed LINK tags in the HEAD section of the HTML and specify a sensible relation (“rel”). Normally these are used to specify alternative versions of the content, icons, etc., but there’s nothing stopping us saying that for any given URL the “short” URL is e.g. http://example.com/3R. That’s right, rel=”short”, not rel=”alternate shorter” or other such rubbish (“alternate” refers to alternate content, usually in a different mime-type, not just an alternate URL – here the content is likely to be exactly the same). It can be performance optimised somewhat too by setting a header such as X-Rel-Short so that users (e.g. Twitter clients) can resolve a long URL to the preferred short URL via an HTTP HEAD request, without having to retrieve and parse the HTML.

Another even less sensible alternative being peddled by various individuals (and being discussed here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here and of course here) is [ab]using the rightly deprecated and confusing rev attribute, à la rev=”canonical”. Basically this is saying “I am the authoritative/canonical URL and this other URL happens to point here too”, without saying anything whatsoever about the URL itself actually being short. There could be an infinite number of such inbound URLs and this only ever works for the one canonical URL itself. Essentially this idea is stillborn and I sincerely hope that when people come back to work next week it will be promptly put out of its misery.

So in summary someone’s got carried away and started writing code (RevCanonical) without first considering all the implications. Hopefully they will soon realise this isn’t such a great idea after all and instead get behind the proposal for rel=”short” at the WHATWG. Then we can all just add links like this to our pages:

<link href="http://example.com/promo" rel="short">
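
A Twitter client (or anything else) that already has the page in hand could then pick up the preferred short URL with something like this sketch (assuming the LINK element above is present):

// Read the advertised short URL out of the page's HEAD,
// falling back to the page's own URL if none is advertised
var link = document.querySelector('link[rel="short"]');
var shortUrl = link ? link.href : window.location.href;
console.log(shortUrl);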

Incidentally I say “short” and not “shorter” because the short URL may not in fact be the shortest URL for a given resource – “http://example.com/3R” could well also map back to the same page but the URL is meaningless. And I leave out “alternate” because it’s not alternate content, rather just an alternate URL – a subtle but significant difference.

Let’s hope sanity prevails…

Update: The HTTP Link: header is a much more sensible way to handle the HTTP header optimisation mentioned above:

Link: <http://example.com/promo>; rel="short"
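
A client can then resolve a long URL to its preferred short form with a HEAD request alone – for example this sketch, which assumes the server actually emits the header above (and that the same-origin policy isn’t in the way):

var xhr = new XMLHttpRequest();
xhr.open('HEAD', 'http://example.com/2009/02/some-long-seo-friendly-title.html', true);
xhr.onreadystatechange = function () {
  if (xhr.readyState === 4) {
    // Pull the short URL out of a header like: Link: <http://example.com/promo>; rel="short"
    var header = xhr.getResponseHeader('Link') || '';
    var match = header.match(/<([^>]+)>\s*;\s*rel="short"/);
    console.log(match ? match[1] : 'no short link advertised');
  }
};
xhr.send();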

Towards a Flash free YouTube killer (was: Adobe Flash penetration more like 50%)

A couple of days ago I wrote about Why Adobe Flash penetration is more like 50% than 99%, which resulted in a bunch of comments as well as a fair bit of discussion elsewhere including commentary from Adobe’s own John Dowdell. It’s good to see some healthy discussion on this topic (though it’s a shame to see some branding it “more flash hate” and an AC poster asking “How much did M$ pay you for this”).

Anyway, everyone likes a good demonstration, so I figured: why not create a proof-of-concept YouTube killer that uses HTML 5’s video tag?

Knowing that around 20% of my visitors already have a subset of HTML 5 support (either via Safari/WebKit or Firefox 3.1 beta), and that this figure will jump to over 50% shortly after Firefox 3.1 drops (over 50% of my readers use Firefox and over 90% of them run the most recent versions), I would already be considering taking advantage of the new VIDEO tag were I to add videos to the site (even though, as a Google Apps Premier Edition user I already have a white label version of YouTube at http://video.samj.net/).

Selecting the demo video was easy – my brother, Michael Johns, did a guest performance on American Idol last Wednesday and as per usual it’s already popped up on YouTube (complete with an HD version). Normally YouTube serve Flash video (FLV) but for HD they sensibly opted for H.264, which is supported by Safari (which supports anything QuickTime supports – including Ogg Vorbis for users with Perian installed). Getting the video file itself is just a case of browsing to the YouTube page, going to Window->Activity and double-clicking the digitally signed link that looks something like ‘http://v4.cache.googlevideo.com/videoplayback‘, which should result in the video.mp4 file being downloaded (though now that Google are offering paid downloads they’re working hard to stop unsanctioned downloading).

On the other hand Firefox 3.1 currently only supports Ogg (Theora video with Vorbis audio) for licensing/patent reasons, as even Reasonable and Non-Discriminatory (RAND) licensing is unreasonable and discriminatory for free and open source software. Unfortunately the W3C working group infamously removed a recommendation that implementors ‘should’ support Ogg Vorbis and Theora for audio and video respectively, so a codec recommendation is currently conspicuously absent from the HTML 5 working draft. So what’s a developer to do but make both Ogg and H.264 versions available? Fortunately transcoding MP4 to Ogg (and vice versa) is easy enough with VLC, resulting in a similar-quality but roughly 10% smaller file (video.ogg).
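
(As an aside, the VIDEO element can also carry both encodings as SOURCE children and let the browser pick whichever it can play – a sketch, and not the approach used in the demo below, which swaps the src attribute on load instead:)

<!-- Sketch: offer both encodings and let the browser choose the first it can play -->
<video width="630" height="380" autoplay="true">
<source src="video.mp4" type="video/mp4" />
<source src="video.ogg" type="video/ogg" />
</video>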

The HTML code itself is quite straightforward. It demonstrates:

  • A body onLoad function to switch to Ogg for Firefox users
  • YouTube object fallback for unsupported browsers (which in turn falls back to embed)
  • A simple JavaScript Play/Pause control (which could easily be fleshed out to a slider, etc.)
  • A simple JavaScript event handler to show an alert when the video finishes playing
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Towards a Flash free YouTube killer…</title>
</head>

<!-- Basic test for Firefox switches to Ogg Theora -->
<!-- Test could be arbitrarily complex and/or run on the server side -->
<body onLoad="if (/Firefox/.test(navigator.userAgent)){ document.getElementsByTagName('video')[0].src = 'video.ogg'; }">
<h1>Michael Johns &amp; Carly Smithson – The Letter</h1>
<p>(Live At American Idol 02/18/2009) HD
(from <a href="http://www.youtube.com/watch?v=LkTCFo8XfAc">YouTube</a>)</p>

<!-- Supported browsers will use the video element and ignore the rest -->
<video src="video.mp4" autoplay="true" width="630" height="380">
<!-- If the video tag is unsupported by your browser the legacy code below is used -->
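<!-- Sketch of a standard YouTube object/embed fallback for the same clip
     (the fallback markup itself was omitted from this listing) -->
<object width="630" height="380">
<param name="movie" value="http://www.youtube.com/v/LkTCFo8XfAc" />
<embed src="http://www.youtube.com/v/LkTCFo8XfAc"
type="application/x-shockwave-flash" width="630" height="380"></embed>
</object>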

</video>

<!-- Here's a script to give some basic playback control -->
<script>
function playPause() {
  var myVideo = document.getElementsByTagName('video')[0];
  if (myVideo.paused)
    myVideo.play();
  else
    myVideo.pause();
}
</script>
<p><input type="button" onclick="playPause()" value="Play/Pause"></p>

<!-- Here's an event handler which will tell us when the video finishes -->
<script>
// Look the video element up again here; the myVideo inside playPause() is local to that function
var myVideo = document.getElementsByTagName('video')[0];
myVideo.addEventListener('ended', function () {
  alert('video playback finished');
});
</script>
<p>By <a href="https://samj.net/">Sam Johnston</a> of
<a href="http://www.aos.net.au/">Australian Online Solutions</a></p>
</body>
</html>

This file (index.html) and the two video files above (video.mp4 and video.ogg) are then uploaded to Amazon S3 (at http://media.samj.net/) and made available via Amazon CloudFront content delivery network (at http://media.cdn.samj.net/). And finally you can see for yourself (bearing in mind that to keep the code clean no attempts were made to check the ready states so either download the files locally or be patient!):

Towards a Flash free YouTube killer…