Thursday, November 13, 2008

Another interesting exercise: MapTz, a GAE-hosted service to enhance Google Maps

The rains continue; actually, it is almost slush coming down here this morning. This week I spent quite some time developing another tiny service to solve a single problem: MapTz.

The problem was that in Zipiko we often need the timezone of particular locations. We use GeoNames a lot, which nicely gives us detailed location information, including the timezone. Alas, GeoNames can be slow at times and is sometimes actually down. Google Maps offers a similar service and seems faster and more reliable. It has its cons though, and one is that it does not provide the timezone for geocoded locations. With a means to get the time zone for a given location, we would be able to use either GeoNames or Google Maps, and even switch automatically from one to the other based upon responsiveness. So the issue was how to get the time zone when given coordinates and a bit of other info.

Some googling turned up a couple of services that seemed able to return a UTC offset for a given point, but I really wanted a pointer into the Olson database of time zones. This db is implemented in most operating systems and environments, often known as tz, and it allows effective localization of a given UTC time, taking daylight saving, historical changes and what have you into account. So here is what I ended up doing.

It turns out there is a very nice computer-readable map of all the world's time zones at MapStars. Here the time zones are the true zones on the world map where clocks show the same time, i.e. share the same UTC offset. The map is in KML, an XML-based format, so it can be parsed by a computer program. Essentially one ends up with a set of polygons, each with an associated offset from UTC. The next ingredient is an algorithm that determines whether a point is inside a polygon. I used a simple Python implementation of the ray-casting algorithm, along the lines of the sketch below.
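For concreteness, here is a minimal sketch of such a ray-casting test in Python; the function and argument names are mine, not the actual MapTz code:

```python
def point_in_polygon(lon, lat, polygon):
    """Return True if the point (lon, lat) lies inside `polygon`,
    a list of (lon, lat) vertex pairs.

    A ray is cast eastward from the point; an odd number of
    edge crossings means the point is inside.
    """
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Count the edge if it straddles the point's latitude and
        # crosses that latitude to the east of the point.
        if (yi > lat) != (yj > lat) and \
                lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside
```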
Now, within one UTC-offset region, a "band" on the world map, there are many actual time zones (in the Olson sense). To find the right one I ended up using knowledge of the country(-code) of the point: look up all time zones in use in that country, then pick the one whose UTC offset matches the offset of the found region (a sketch of this step follows). This worked, but I needed this code to run on the highly distributed Google App Engine, and that required a couple of tricks as GAE, rightfully, restricts the amount of resources available to an application.
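Here is one way to write that matching step, using pytz, which exposes both the country-to-zones mapping and the per-zone UTC offsets; the actual service may well use a different library:

```python
from datetime import datetime, timedelta

import pytz


def olson_zone(country_code, utc_offset_hours, when=None):
    """Return the Olson zone name used in `country_code` whose UTC
    offset at `when` matches the offset of the polygon the point
    fell in, or None if no zone in that country matches."""
    when = when or datetime.utcnow()
    wanted = timedelta(hours=utc_offset_hours)
    for name in pytz.country_timezones.get(country_code.upper(), []):
        if pytz.timezone(name).utcoffset(when) == wanted:
            return name
    return None
```

Note that daylight saving makes the comparison time-dependent, so the polygon offsets and the zone offsets have to refer to the same moment.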

First, the set of polygons is a largish data structure. I spent most of my time figuring out how to set up this data structure and make it available quickly to each (HTTP) request. Parsing the KML file takes 4-5 seconds on my local machine, so you really want to do that only once. GAE essentially offers two ways to make some data available to any server that ends up running your app to serve a request: a database and a memory cache. The polygon set was too large for the memory cache, and fetching largish things from that cache is, understandably, not very fast. Each polygon could easily be stored in the db, but there are over 70 of them, some quite large/complex, and they might all have to be fetched for a single request. A third approach is to use a global variable to hold the polygons. Such a variable will not be available to all possible instances of the application, but it does persist between subsequent requests to the same instance, if those requests follow each other within a short period.
I went with this approach, as follows. The code parses the KML file "off-line" and generates Python code that instantiates the polygons and places them in a global variable. When running as a GAE web app, the code simply imports that generated Python code. One more complication was that global variables are limited in memory footprint, in itself a good feature, but the polygon set went a bit over that size. Another issue was that the Python file instantiating the polygons was too large. To overcome both issues I split the set into chunks of 10 regions, with a global variable and a Python file for each chunk, and zipped the files into one archive; the resulting pattern is sketched below. Instantiation (to serve the "first" request) only takes a couple of hundred milliseconds, and now I had something that worked quite well.
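Roughly, the pattern looks like this; the module and variable names are illustrative and the chunk count is only an example:

```python
# regions_0.py, regions_1.py, ... are generated off-line from the KML
# file; each defines its own module-level global, PART, holding a list
# of (utc_offset, polygon) tuples for ~10 regions. The generated files
# are bundled into a zip on sys.path, so Python's zipimport mechanism
# loads them like ordinary modules. Module globals persist between
# requests served by the same App Engine instance, which is what
# makes this work as a cache.

NUM_CHUNKS = 8  # e.g. 8 chunks covering 70-odd regions


def iter_regions():
    """Yield (utc_offset, polygon) tuples from all generated chunks.

    After the first call, the chunk modules sit in sys.modules, so
    later requests to the same instance pay no import cost.
    """
    for i in range(NUM_CHUNKS):
        module = __import__('regions_%d' % i)
        for region in module.PART:
            yield region
```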

So, time for a couple more optimizations. First, if a country only uses one time zone, there is no need to search for the right polygon at all. Second, when creating the polygons, i.e. looping over the coordinates that bound a region, I could record the northern-, southern-, eastern-, and western-most points of that region. If a given point does not fall within the bounding box defined by those four extremes, it certainly is not within the polygon (see the sketch below). These optimizations are in the current service, and response times and use of computational quota are quite satisfactory. One more optimization that would probably have a big impact is to move from floating-point numbers to integers for all coordinates. I may do that one of these days and update this blog with the result. Meanwhile, feel free to use the service if you have a need for getting those time zones!
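A sketch of that second optimization, with hypothetical helper names; the box would be computed once, when the polygons are instantiated:

```python
def bounding_box(polygon):
    """Return (min_lon, min_lat, max_lon, max_lat) for a polygon
    given as a list of (lon, lat) pairs."""
    lons = [lon for lon, lat in polygon]
    lats = [lat for lon, lat in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def maybe_inside(lon, lat, box):
    """Cheap rejection test: a point outside a polygon's bounding
    box cannot be inside the polygon itself."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
```

Only when maybe_inside returns True does the full ray-casting test need to run.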

Update: as was to be expected, the above-mentioned optimization to use integers indeed really speeds things up. The GAE logs report an approximately 10-fold speed increase and, likewise, 10 times less use of computational cycles.
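One plausible way to do that conversion (an assumption on my part, not a description of the deployed code) is to scale degrees to fixed-point integers when the polygons are generated, and to scale each query point the same way:

```python
SCALE = 10 ** 6  # roughly 0.1 m of resolution at the equator


def to_fixed(degrees):
    """Convert a float coordinate in degrees to a fixed-point int."""
    return int(round(degrees * SCALE))
```

The crossing test then has to be reworked slightly, e.g. by cross-multiplying instead of dividing, so that no lossy integer division creeps in.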
