Major Google App Engine 2.0 update (Java, cron, billing) in the pipeline?

I’ve been doing a lot of Google App Engine development lately and getting increasingly familiar with how it all works. I was curious as to how the SDK (specifically appcfg.py) goes about pushing applications up to the cloud and wasn’t surprised to find that it’s actually quite simple: it trades your credentials for a token and then pushes files up one by one (wrapped in a transaction). What I was surprised to find, though, was a seemingly complete implementation of a feature that has been requested time and time again: cron jobs.

The cron implementation appears to have been checked in by one Dan Sanderson (a Programmer/Writer at Google according to LinkedIn) with Subversion r34 (1.1.8) back on 15 January 2009, though apparently I’m not the first to spot it. Thanks to Ross M Karchner for kicking the tyres and working out that while the SDK and APIs are in place and cron entries can be created, the events don’t fire just yet (and to Google for the new toy – it’s great to know now what we’ll be able to do soon).

Of course the other big feature that’s been in the pipeline for a long, long time (and that was even apparently publicly confirmed by Google last year) is Java support. Greg Czajkowski’s been a Googler for a while now and his contribution to Sun’s Project Barcelona (including JSR 284: Resource Consumption Management API and JSR 121: Application Isolation API Specification) is just what they need to make Java work in this environment. Given that the mystery language is going to be one of the few Google supports internally (C++, Java, Python and JavaScript), that we already have Python and that it’s not likely to be C++ or JavaScript (though they wouldn’t be the first to try server-side JavaScript), only Java is left. It’s literally App Engine’s number 1 issue and between it and related issue 102 it has over 2000 votes.

Finally, and arguably most importantly for those of us working on enterprise deployments rather than hobby projects, utility billing is overdue and eagerly awaited.

I’m expecting to see this major update drop some time in the next few weeks, securing Google’s position as one of a small number of major Platform-as-a-Service (PaaS) players. In the meantime we’ve got offline Gmail to play with.

Update: Here’s a comment I left on issue 6 which gives some more technical details:

I just spotted this in the latest SDK release so it looks like cron (among other things) is just around the corner:

$ ls -la google/appengine/cron/
total 272
drwxr-xr-x 12 samj admin 408 17 Jan 12:18 .
drwxr-xr-x 11 samj admin 374 17 Jan 12:18 ..
-r--r--r-- 1 samj admin 27359 15 Jan 03:16 GrocLexer.py
-rw-r--r-- 1 samj admin 25813 17 Jan 12:18 GrocLexer.pyc
-r--r--r-- 1 samj admin 21071 15 Jan 03:16 GrocParser.py
-rw-r--r-- 1 samj admin 18377 17 Jan 12:18 GrocParser.pyc
-r-xr-xr-x 1 samj admin 646 15 Jan 03:16 __init__.py
-rw-r--r-- 1 samj admin 313 17 Jan 12:18 __init__.pyc
-r-xr-xr-x 1 samj admin 1909 15 Jan 03:16 groc.py
-rw-r--r-- 1 samj admin 3050 17 Jan 12:18 groc.pyc
-r-xr-xr-x 1 samj admin 7848 15 Jan 03:16 groctimespecification.py
-rw-r--r-- 1 samj admin 10029 17 Jan 12:18 groctimespecification.pyc

From the comments in the code, here's what you can expect:

A Groc schedule looks like '1st,2nd monday 9:00', or 'every 20 mins'. This
module takes a parsed schedule (produced by Antlr) and creates objects that
can produce times that match this schedule.

A parsed schedule is one of two types - an Interval, and a Specific Time.
See the class docstrings for more.

Extensions to be considered:

allowing a comma separated list of times to run
allowing the user to specify particular days of the month to run

An Interval type spec runs at the given fixed interval. They have two
attributes:
period - the type of interval, either "hours" or "minutes"
interval - the number of units of type period.

A Specific interval is more complex, but define a certain time to run, on
given days. They have the following attributes:
time - the time of day to run, as "HH:MM"
ordinals - first, second, third &c, as a set of integers in 1..5
months - the months that this is valid, as a set of integers in 1..12
weekdays - the days of the week to run this, 0=Sunday, 6=Saturday.

The specific time interval can be quite complex. A schedule could look like
this:
"1st,third sat,sun of jan,feb,mar 09:15"

In this case, ordinals would be [1,3], weekdays [0,6], months [1,2,3] and time
would be "09:15".

Seems I'm not the first to discover this[1], and while the SDK works and cron-related log entries are written, the cron events don't fire just yet.

Sam

1. http://groups.google.com/group/google-appengine/browse_thread/thread/4376bdd02b7bfa3f?pli=1
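
To make the schedule format described in that comment a little more concrete, here is a toy parser for the two kinds of Groc schedule it mentions. This is only a sketch: the SDK itself uses an Antlr-generated lexer and parser, whereas this uses a couple of regular expressions, and the function name parse_schedule, the dictionary layout and the "all twelve months when none are given" default are my own assumptions rather than anything taken from the SDK.

```python
import re

# Lookup tables for this toy parser only; the real SDK uses an Antlr grammar.
ORDINALS = {"1st": 1, "first": 1, "2nd": 2, "second": 2, "3rd": 3, "third": 3,
            "4th": 4, "fourth": 4, "5th": 5, "fifth": 5}
WEEKDAYS = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def parse_schedule(text):
    """Parse a Groc-style schedule string into a plain dict (hypothetical layout)."""
    text = text.strip().lower()

    # Interval form, e.g. "every 20 mins".
    m = re.fullmatch(r"every (\d+) (mins|minutes|hours)", text)
    if m:
        period = "hours" if m.group(2) == "hours" else "minutes"
        return {"type": "interval", "interval": int(m.group(1)), "period": period}

    # Specific time form: "<ordinals> <weekdays> [of <months>] HH:MM".
    m = re.fullmatch(r"(\S+) (\S+)(?: of (\S+))? (\d{1,2}:\d{2})", text)
    if not m:
        raise ValueError("unrecognised schedule: %r" % text)
    ordinals, weekdays, months, time = m.groups()
    hours, minutes = time.split(":")
    return {
        "type": "specific",
        "ordinals": sorted(ORDINALS[o] for o in ordinals.split(",")),
        # Only the first three letters matter, so "monday" and "mon" both work.
        "weekdays": sorted(WEEKDAYS[d[:3]] for d in weekdays.split(",")),
        # Assumption: no "of <months>" clause means every month.
        "months": (sorted(MONTHS[mo[:3]] for mo in months.split(","))
                   if months else list(range(1, 13))),
        "time": "%02d:%s" % (int(hours), minutes),
    }


example = parse_schedule("1st,third sat,sun of jan,feb,mar 09:15")
```

For the example from the code comments this gives ordinals [1, 3], weekdays [0, 6], months [1, 2, 3] and a time of "09:15", matching the values quoted above.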

Virtual Google Search Appliance is here…

I’ve been quiet of late as I’ve been busy racking up the frequent flier miles over the last month or two, but I’m back (albeit busy) and will endeavour to work through a backlog of posts, even if that means spending less time on them and leaving the Pulitzer Prize to someone else. While I wait for it to download I thought I’d let you know about today’s announcement of a Google Search [Virtual] Appliance (which I’ve been hanging out for, under NDA, since 2006!):

Ever wanted to write code against Google search technology, test your apps, and see how it all integrates into your development environment without having to pay a thing? If you’re an IT administrator, you’ll have that chance with the new virtual edition of the Google Search Appliance. The Google Search Appliance virtual edition is for non-commercial, development purposes only, and gives developers the opportunity to test against the features of the physical Google Search Appliance.

The Google Search Appliance virtual edition provides a free test bed for the Google Search Appliance – our solution for securely searching enterprise content behind the corporate firewall – helping ensure a smooth transition to the production-ready hardware. If your organization is considering adopting an enterprise search solution, the virtual edition platform gives your team the flexibility to build applications against the Google Search Appliance, try different configuration scenarios, explore proofs-of-concept and test the APIs supported by Google enterprise search technology. As part of testing with the virtual edition, you can:

These features might come in handy, particularly if your existing environment contains the array of legacy systems, databases, servers and integration architecture typical of most large organizations. And because it’s free, your boss might give you an extra week’s vacation just for trying it out (don’t quote us on that). You can download Google Search Appliance virtual edition software onto any server that is supported by VMWare virtualization. To learn more and get started, click here. And since we always love feedback, feel free to drop by our developer community or send your thoughts to enterprisegsavirtual@google.com.

Well it’s almost done, but I’m not holding my breath as it wants 3GB of RAM and I didn’t have the patience for Apple to custom build a 4GB MacBook for me the other week so I’ve only got 2GB. I wonder what it would take to get it up and running on a large instance of EC2?

Update: It works (albeit slowly), and it looks surprisingly standard (Linux 2.6.20 – CentOS 5 I think); maybe EC2’s not out of the realm of possibility after all.


Update 2: Having kicked the tires for a while I’m already thinking about the possibilities. Now that the GSA has broken its shackles to expensive, proprietary hardware the world is its oyster, and while the license prohibits production use, that’s an administrative rather than a technical hurdle. Locking down the licensing (currently the MAC address is mashed up and digitally signed along with various feature and URL count restrictions, but MAC addresses are malleable with virtual machines) and ensuring performance meets acceptable standards on uncontrollable (virtual) hardware are two obvious (if optional) hurdles. That said, expect to see something happen in this area as the competition is already offering free, downloadable search solutions; indeed I wouldn’t be surprised if there were already virtual GSAs in production.

I’d really like to see Google supported for Australian Online Solutions’ upcoming CloudSearch product, so getting it up and running on EC2 would be nice even if only to prove the concept. Assuming there are no non-standard kernel hacks, migration shouldn’t be that hard, and even if there were they would have to be released under the terms of the GPL per my thus far unanswered public request. That said, user selectable kernels (AKIs) and ramdisks (ARIs) on Amazon’s EC2 are currently only available to Amazon and a select few others, so said modifications (if any) would have to be injected via a loadable module for now.

Watch this space…

Google Chrome: Cloud Operating Environment

Google Chrome is a lot more than a next generation browser; it’s a prototype Cloud Operating Environment.

Rather than blathering on to the blogosphere about the superficial features of Google’s new Chrome browser, I’ve spent the best part of my day studying the available material and [re]writing a comprehensive Wikipedia article on the subject. I intend for anyone to be free to reuse it under a Creative Commons Attribution 3.0 license (at least this version anyway) rather than Wikipedia’s usual strong copyleft GNU Free Documentation License (GFDL). This unusual freedom is extended in order to foster learning and critical analysis, particularly in terms of security.

My prognosis is that this is without doubt big news for cloud computing and, as a CISSP who has been watching the poor state of web browser security with disdain, big news for the security community too. Here’s why:

Surfing the Internet today is like unprotected sex with strangers; Chrome is the condom of the cloud.

The traditional model of a monolithic browser is fundamentally and fatally flawed (particularly with the addition of tabs). Current generation browsers lump together a myriad of trusted and untrusted software (yes, many web sites these days are more software than content) running in the same memory address space. Even with the best of intentions this is intolerable as performance problems in one area can cause problems (and even data loss) in others. It’s the web equivalent of the bad old days where one rogue process would take down the whole system. Add nefarious characters to the mix and it’s like living in a bad neighbourhood with no locks.

Current generation browsers are like jails without cells.

Chrome introduces a revolutionary new software architecture, based on components from other open source software, including WebKit and Mozilla, and is aimed at improving stability, speed and security, with a simple and efficient user interface.

The first intelligent thing Chrome does is split each task into a separate process (‘sandbox’), thus delegating isolation to the operating system, which has been very good at it since we introduced things like pre-emptive multitasking and memory protection. This exacts a fixed per-process resource cost but avoids the memory fragmentation issues that plague long-running browsers. Every web site gets its own tab complete with its own process and WebKit rendering engine, which (following the principle of least privilege) runs with very low privileges. If anything goes wrong the process is quietly killed and you get a ‘sad Mac’ style ‘sad tab’ icon rather than an error reporting dialog for the entire browser.
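
The process-per-tab idea is easy to demonstrate in miniature. The sketch below is emphatically not how Chrome implements its sandbox (there is no privilege dropping here at all); it just runs each hypothetical ‘tab’ as a separate operating system process so that a failure in one is reported as a ‘sad tab’ while the parent and the other tabs carry on. The names render_in_sandbox, tabs and statuses are made up for illustration.

```python
import subprocess
import sys


def render_in_sandbox(page_source):
    """Run one 'page' in its own child process and report whether it survived."""
    result = subprocess.run(
        [sys.executable, "-c", page_source],
        capture_output=True,  # keep the child's output away from the parent's
        timeout=10,           # a hung tab must not hang the whole 'browser'
    )
    return result.returncode == 0


# Two pretend tabs: one well behaved, one that dies with an unhandled error.
tabs = {
    "good": "print('hello from a well behaved page')",
    "crashy": "raise RuntimeError('renderer bug')",
}

statuses = {}
for name, source in tabs.items():
    # The OS isolates each child, so a crash only ever costs us that one tab.
    statuses[name] = "ok" if render_in_sandbox(source) else "sad tab"

print(statuses)
```

The parent only ever sees an exit status, which is the point: whatever went wrong inside the child stays inside the child.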

Chrome enforces a simple computer security model whereby there are two levels of multilevel security (user and sandbox) and the sandbox can only respond to communication requests initiated by the user. Plugins like Flash which often need to run at or above the security level of the browser itself are also sandboxed in their own relatively privileged processes. This simple, elegant combination of compartments and multilevel security is a huge improvement over the status quo, and it promises to further improve as plugins are replaced by standards (eg HTML 5 which promises to displace some plugins by introducing browser-native video) and/or modified to work with restricted permissions. There are also (publicly accessible) blacklists for warning users about phishing and malware and an “Incognito” private browsing mode.

Tabs displace windows as first class citizens and can migrate between them like an archipelago of islands.

The user interface follows the simplification trend, and much of the frame or “browser chrome” (hence the name) can be hidden altogether so as to seamlessly blend web applications (eg Gmail) with the underlying operating system. Popups are confined to their source tab unless explicitly dragged to freedom, the “Omnibox” simplifies (and remembers) browsing habits and searches and the “New Tab Page” replaces the home page with an Opera style speed dial interface along with automatically integrated search boxes (eg Google, Wikipedia). Gears remains as a breeding ground for web standards and the new V8 JavaScript engine promises to improve performance of increasingly demanding web applications with some clever new features (most notably dynamic compilation to native code).

Just add Linux and cloud storage and you’ve got a full blown Cloud Operating System (“CloudOS”).

What is perhaps most interesting though (at least from a cloud computing point of view) is the full-frontal assault on traditional operating system functions like process management (with a task manager that allows users to “see what sites are using the most memory, downloading the most bytes and abusing (their) CPU”). Chrome is effectively a Cloud Operating Environment for any (supported) operating system in the same way that early releases of Windows were GUIs for DOS. All we need to do now is load it on to a (free) operating system like Linux and wire it up to cloud storage (à la Mozilla Weave) for preferences (eg bookmarks, history) and user files (eg uploads, downloads) and we have a full blown Cloud Operating System!

Update: Fixed URLs.


Cloud computing and the coming clone wars

All this talk about Dell’s place in the cloud computing ecosystem has got me thinking… I’ve been saying for a long time now that the hardware market will turn into a bloodbath, with squeezes on the server side coming from horizontal scaling/commoditised hardware, multi-core processing, virtualisation, etc. as well as on the client side from cheap computers that we’ll (hopefully) soon be able to purchase by the kilo. It is no surprise then that the traditional vendors are clamouring for their slice of the (next generation) pie. And who better to service our cloud computing needs than the very same people who have been fitting out data centers for decades?

The problem is that a ‘cloud computer’ (on both sides of the fence) is a completely different beast from traditional client/server devices. They are (or at least can be):

  • Smaller, a lot smaller (think cigarette packet)
  • Cheaper, a lot cheaper, like an order of magnitude cheaper (think <$100 before long). They last a lot longer too.
  • Greener, drawing only a few watts instead of hundreds (think 2-5W)

The JackPC pictured here isn’t a great example because it currently lacks a capable browser (though I’ve addressed this with them so maybe they’ll add a recent browser at some point) but you can still run one on the server and access it via terminal services, and it illustrates my point well.

I was in fact so convinced of the trouble ahead for traditional hardware that I even built a prototype of an ultra-cheap PC over 5 years ago now. I was looking for something fast, secure, environmentally friendly and easy to support (which meant absolutely no moving parts) and ended up with a Debian GNU/Linux based design. The idea was that it would have no persistent storage (booting itself over the Internet), offloading as much as it could to ‘the cloud’ (which at the time meant my Xbox cluster running Linux). Unfortunately back then the Mozilla browser was still playing catch-up and tools like Google Docs didn’t exist so OpenOffice.org (itself in a sorry state, having just morphed from StarOffice) was required on the client. This arrangement worked nicely on the LAN by offloading much of the work to terminal servers via rdesktop (an RDP client written by a scarily clever young guy called Matt Chapman who worked with me at UNSW). However at home the 256/64k ‘standard’ ADSL Internet connection in Australia just didn’t cut it – people won’t wait 10-20 minutes for their machines to start even if they only need to do so every once in a while (eg to pick up security fixes).

Hang on, did I just say Xbox cluster? Yes, it’s amazing what lengths you’ll go to when you have some time but limited budget (datacenter space was still expensive back then). With servers still going for a few grand apiece and the Xbox selling with a reasonably capable processor, adequate RAM and a relatively sizeable hard drive for only a few hundred it was a no-brainer. Apache doesn’t need much in the way of resources (even if the things it hosts including PHP, Perl, Ruby and Python do), nor does infrastructure like mail servers (qmail), DNS servers (djbdns), etc. Yes it was a pain having to solder 29-wire modchips into half a dozen devices to bypass the DRM and no doubt they raised some eyebrows in the datacenter where they were hosted, but they did their job well for a long time and were even used for some fairly large jobs. This was my first taste of cloud computing (on a very small scale) and I’ve been hooked ever since.

I’ll leave you with one final thought for today… the picture on the left is one of Google’s first production servers. It now lives in the Computer History Museum and there’s at least one more like it which I’ve seen ‘in the flesh’ at their Mountain View HQ. These are not so different from my Xbox cluster (though the software on these machines made some people extremely rich), and are evidence of the completely different approach needed to build something like the Google platform. The focus now is on size, efficiency, heat, performance per Watt and per dollar, interconnectivity, density, and a bunch of other factors that only really become apparent when you roll up your sleeves and try to do it yourself (The Google Guide to Datacenter Construction is a book you won’t see at your local bookstore any time soon – this is a key component of their ‘secret sauce’).

The thought for the day then is where are the logos? There’s no Intel Inside logo on the JackPC or the CherryPal, nor an HP, IBM or Dell logo on Google’s rack…

Proof Gmail IMAP (Gimap) supports IMAP IDLE

So for those of you with capable mail clients (like OS X Mail.app), here’s proof that IMAP IDLE works for delivering push mail:

$ openssl s_client -connect imap.gmail.com:993 -crlf
* OK Gimap ready for requests from 1.2.3.4 0123456789abcdef
. capability
* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA XLIST CHILDREN XYZZY
. OK Thats all she wrote! 0123456789abcdef
. login samj@samj.net letmein
. OK samj@samj.net authenticated (Success)
. examine inbox
* FLAGS (\Answered \Flagged \Draft \Deleted \Seen)
* OK [PERMANENTFLAGS ()]
* OK [UIDVALIDITY 2]
* 4498 EXISTS
* 0 RECENT
* OK [UNSEEN 1431]
* OK [UIDNEXT 25141]
. OK [READ-ONLY] inbox selected. (Success)
. idle
+ idling
---mail sent and deleted here---
* 4499 EXISTS
* 4499 EXPUNGE
* 4498 EXISTS

This is why some clients ‘feel’ more responsive than others, and why you should find an IMAP IDLE capable client.
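
For the curious, here is roughly what an IDLE capable client has to do with those pushed lines. It is a minimal sketch that only handles the two untagged responses seen in the transcript above (EXISTS and EXPUNGE) and does no networking at all; the function name apply_untagged is my own invention, and a real client would of course also refresh flags, fetch headers and so on.

```python
import re


def apply_untagged(count, line):
    """Return the new message count after one untagged response from the server."""
    m = re.fullmatch(r"\* (\d+) (EXISTS|EXPUNGE)", line.strip())
    if not m:
        return count  # not something this sketch understands; ignore it
    number, kind = int(m.group(1)), m.group(2)
    if kind == "EXISTS":
        return number  # the server states the new total outright
    # EXPUNGE names the sequence number that was removed, so the total drops by one.
    return count - 1


# Replay the lines the server pushed while the session above was idling.
count = 4498  # from the "* 4498 EXISTS" seen when the inbox was examined
history = []
for pushed in ["* 4499 EXISTS", "* 4499 EXPUNGE", "* 4498 EXISTS"]:
    count = apply_untagged(count, pushed)
    history.append(count)

print(history)
```

The client learns about the new message the moment the first line arrives, without ever having to poll.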

Google Apps – On in 60 seconds

To celebrate today’s Google Apps launch here’s a screencast I whipped up showing just how easy it is with Google Apps to get ‘On in 60 seconds’:

Update: On advice from Dave Girouard (VP, Google Enterprise) we’ve changed the name to ‘On in 60 seconds’ from the somewhat less enthralling ‘Setting up Google Apps’.

Update: Thanks for the amazing response – 20,000 views! Let’s hope Google Apps is as popular as its video!

Update: The music in the video was that of Michael Johns, now one of the favourites in American Idol!