Server-side mapping
We have several projects that involve processing large geospatial datasets
(geo-data) and displaying them on maps. These projects present some interesting
technical challenges involving the storage, transfer and processing of
geo-data. This post outlines some of the bigger challenges we have encountered
and our corresponding solutions.
The challenge
In the past we have used the GMap and
OpenLayers libraries and their equivalent Drupal
modules on our mapping projects. They are effective solutions when you have
a small or even moderately sized collection of entities containing some simple
geodata (points, lines, polygons) that you want to present as vector overlays
on a map. Unfortunately they tend to fall apart quickly when used with larger
datasets. There are two main reasons for this:
- Geospatial data can be large, particularly as we tend to encode it in
  text-based formats such as WKT or GeoJSON when we are sending it to a web
  browser. The larger the data, the longer it takes to transfer from server to
  client.
- The information being sent is raw data, which means that the client needs to
  parse and process it before rendering it on the screen. The more data there
  is, the longer this takes.
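To get a feel for how quickly a text encoding grows, the sketch below builds a synthetic GeoJSON LineString from random coordinates (hypothetical data, not the road dataset used in the demos) and measures its serialised size. At six decimal places each vertex costs roughly two dozen bytes, so a few hundred thousand vertices already amount to several megabytes that the browser must download and then parse.

```python
import json
import random

def linestring_feature(num_vertices, seed=0):
    """Build a GeoJSON Feature containing one LineString of random lon/lat vertices."""
    rng = random.Random(seed)  # seeded so the measured size is reproducible
    coords = [[round(rng.uniform(-180, 180), 6), round(rng.uniform(-85, 85), 6)]
              for _ in range(num_vertices)]
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": coords},
        "properties": {},
    }

def payload_bytes(feature):
    """Bytes the browser must download (and then parse) for this feature."""
    return len(json.dumps(feature).encode("utf-8"))

for n in (100, 10_000, 100_000):
    print(f"{n:>7} vertices -> {payload_bytes(linestring_feature(n)):>9} bytes")
```

The size grows linearly with the vertex count, and so does the client-side parsing work.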
To make things worse, the geo-data is often sent at the beginning of the HTML
document (via Drupal.settings or similar). Most browsers will wait until they
have downloaded and parsed this data before they begin to render the rest of
the page, increasing the delay.
As a result, it doesn't take much to have a serious negative impact on page
load times, and little more to actually crash your visitors' browsers.
Heavy lifting server-side
A good solution to these issues is to process and render the geo-data as image
tiles on the server. The tiles can then be cached and served to the client on
request, so the data only needs to be rendered when it changes rather than
every time the page is loaded. Bandwidth is also reduced, as image tiles are
relatively consistent in size regardless of the complexity or amount of data
used to produce them.
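Server-rendered tiles are usually addressed by zoom level and x/y grid position. The post doesn't name a tiling scheme, so the sketch below assumes the common XYZ "slippy map" scheme in spherical Mercator that OpenLayers and most tile servers understand. It shows how a longitude/latitude maps to the tile a browser would request; the URL template is hypothetical.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the spherical Mercator tile containing lon/lat.

    x grows eastwards from the antimeridian, y grows southwards from the top edge.
    """
    n = 2 ** zoom  # the world is n x n tiles at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or latitudes beyond ~85.05 degrees still map onto the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_url(template, lon, lat, zoom):
    """Fill a {z}/{x}/{y} URL template for the tile covering lon/lat."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return template.format(z=zoom, x=x, y=y)

# Hypothetical tile server URL; only the {z}/{x}/{y} pattern matters here.
print(tile_url("https://tiles.example.com/roads/{z}/{x}/{y}.png", -0.1, 51.5, 10))
```

Because every client asks for the same small set of URLs for a given view, the server can cache each rendered image once and reuse it for every visitor.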
As a demonstration we have created two maps containing some sample road data:
- The first map loads the data from an external GeoJSON file that is downloaded
  by the browser, parsed and rendered as a vector layer.
- The second map shows the same data, served as a set of tiles that have been
  generated on the server.
I recommend testing these examples in a variety of browsers, as their
performance varies across platforms, particularly for the first example.
There are several components involved in a server-side tile rendering pipeline.
They can be loosely categorised under storage, rendering, and caching.
Storage
Geo-data can be stored in a variety of places and formats, each with its own
advantages. Here are some common ones:
Shapefiles
ESRI Shapefiles (commonly known as shapefiles) are a popular file format for
storing and transferring geo-data. A shapefile consists of a main .shp file
plus companion files (at minimum .shx and .dbf) holding the index and
attribute data, so the set is usually bundled in a zip file along with other
related files.
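Because the geometry, index and attributes live in separate files, a zip that is missing its .shx or .dbf is a common reason a shapefile fails to load. The standard-library sketch below is purely illustrative (it is not part of any tool mentioned in this post): it groups the members of a zipped bundle by layer name and reports which mandatory parts are absent.

```python
import io
import zipfile
from pathlib import PurePosixPath

# The three files every shapefile needs; .prj, .cpg, etc. are optional extras.
REQUIRED_PARTS = {".shp", ".shx", ".dbf"}

def shapefile_parts(zip_bytes):
    """Group the members of a zip archive by layer name -> set of file extensions."""
    layers = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for member in zf.namelist():
            path = PurePosixPath(member)
            if not path.suffix:
                continue  # directory entries and extensionless files
            stem = path.with_suffix("").as_posix()
            layers.setdefault(stem, set()).add(path.suffix.lower())
    return layers

def missing_parts(zip_bytes):
    """Return {layer: [missing extensions]} for every .shp layer that is incomplete."""
    return {
        layer: sorted(REQUIRED_PARTS - exts)
        for layer, exts in shapefile_parts(zip_bytes).items()
        if ".shp" in exts and not REQUIRED_PARTS <= exts
    }
```

An empty result means every layer in the archive has its mandatory files.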
Well-known text (WKT) & GeoJSON
WKT and GeoJSON are formats used to encode geospatial data in plain text,
making them convenient to read and parse at the expense of increasing file
size.
GeoJSON is a relatively new format. Because it is just JSON, and therefore
easily parsed in JavaScript, it is an increasingly popular format for passing
raw data to browser-based clients.
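To make the difference concrete, the sketch below encodes the same short line in both formats. The two coordinates are hypothetical; note that both WKT and GeoJSON put longitude (x) before latitude (y).

```python
import json

def to_wkt_linestring(coords):
    """Encode a list of (lon, lat) pairs as a WKT LINESTRING."""
    return "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in coords) + ")"

def to_geojson_linestring(coords):
    """Encode the same pairs as a GeoJSON geometry object."""
    return json.dumps({"type": "LineString", "coordinates": [list(c) for c in coords]})

road = [(174.7633, -36.8485), (174.7762, -36.8509)]  # hypothetical road segment
print(to_wkt_linestring(road))
print(to_geojson_linestring(road))
```

The GeoJSON version is slightly longer, but a browser can read it with a single JSON.parse call, whereas WKT needs a dedicated parser.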
PostGIS
PostGIS is a spatial database extension for the PostgreSQL database management
system. As a relational database, it gives you the ability to index, query, and
manipulate your data with SQL and an extensive API of geospatial functions.
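As an example of the kind of query this makes possible, the sketch below builds a parameterised SQL statement that fetches features intersecting a bounding box, using the PostGIS functions ST_Intersects, ST_MakeEnvelope and ST_AsGeoJSON. The table name roads and the columns id and geom are hypothetical, and the %s placeholders assume a driver such as psycopg2.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def bbox_query(xmin, ymin, xmax, ymax, table="roads", geom_column="geom", srid=4326):
    """Build a parameterised PostGIS query for features intersecting a bounding box.

    Identifiers cannot be bound as query parameters, so they are validated
    before being interpolated; the box coordinates and SRID are bound as %s
    placeholders (the style used by psycopg2).
    """
    for name in (table, geom_column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    sql = (
        f"SELECT id, ST_AsGeoJSON({geom_column}) AS geojson FROM {table} "
        f"WHERE ST_Intersects({geom_column}, ST_MakeEnvelope(%s, %s, %s, %s, %s))"
    )
    return sql, (xmin, ymin, xmax, ymax, srid)
```

With psycopg2 this would be run as cursor.execute(sql, params). A GiST index on the geometry column (CREATE INDEX ... USING GIST (geom)) lets ST_Intersects discard non-overlapping rows using bounding boxes before doing the exact test.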
In Drupal it's common to store your data in fields attached to entities using
the Geofield module; however, the data is stored as WKT in a column of type
LONGTEXT, which is not very flexible when compared to PostGIS.
We have therefore developed Sync PostGIS, which allows site developers to flag
entity types with geofields so that their data is mirrored in a PostGIS
database. The source data in Drupal's main database is retained and treated as
the best copy, but all changes (insert, update and delete) are reflected in the
PostGIS version. This gives us the ability to use PostGIS's rich geospatial
features on our Drupal-managed geo-data!
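Sync PostGIS's own code isn't shown here, so the following is only a hypothetical sketch of the general mirroring idea, not the module's implementation. Each Drupal-side change is translated into one PostGIS statement; the table geo_mirror, its columns and the function name are invented for illustration. Inserts and updates are both written as an upsert (ON CONFLICT, which requires PostgreSQL 9.5+ and a unique constraint on entity_type and entity_id), so replaying the same change twice leaves the mirror unchanged.

```python
def mirror_statement(op, entity_type, entity_id, wkt=None, srid=4326):
    """Translate one Drupal-side change into a (sql, params) pair for a PostGIS mirror.

    Hypothetical schema: geo_mirror(entity_type, entity_id, geom) with a unique
    constraint on (entity_type, entity_id). Drupal remains the best copy.
    """
    if op in ("insert", "update"):
        if wkt is None:
            raise ValueError("insert/update requires WKT geometry")
        sql = (
            "INSERT INTO geo_mirror (entity_type, entity_id, geom) "
            "VALUES (%s, %s, ST_GeomFromText(%s, %s)) "
            "ON CONFLICT (entity_type, entity_id) DO UPDATE SET geom = EXCLUDED.geom"
        )
        return sql, (entity_type, entity_id, wkt, srid)
    if op == "delete":
        sql = "DELETE FROM geo_mirror WHERE entity_type = %s AND entity_id = %s"
        return sql, (entity_type, entity_id)
    raise ValueError(f"unknown operation: {op!r}")
```

ST_GeomFromText converts the WKT that Geofield stores into a native PostGIS geometry, after which it can be indexed and queried like any other spatial column.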
Rendering
Once we have our raw geo-data stored somewhere we need a method of converting
it into the images that we will display on our maps. Mapnik is an excellent
tool for the job.
Mapnik
Mapnik is an open source C++ library designed to generate map images from
a variety of data sources and configurable style rules. Language bindings are
available for Python and JavaScript (Node.js), and maps can also be defined in
an XML-based stylesheet format.
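A minimal sketch of how a single tile could be rendered with Mapnik's Python bindings, assuming the XYZ tile scheme in spherical Mercator (EPSG:3857) and a stylesheet whose map projection is also EPSG:3857. The file names are hypothetical. Only tile_bounds_3857 is pure Python; render_tile needs the mapnik package installed.

```python
import math

# Half the width of the spherical Mercator world in metres (pi * WGS84 semi-major axis).
ORIGIN_SHIFT = math.pi * 6378137.0

def tile_bounds_3857(z, x, y):
    """Return (minx, miny, maxx, maxy) in EPSG:3857 metres for XYZ tile z/x/y (y from the top)."""
    size = 2 * ORIGIN_SHIFT / 2 ** z  # width of one tile at this zoom level
    minx = -ORIGIN_SHIFT + x * size
    maxy = ORIGIN_SHIFT - y * size
    return minx, maxy - size, minx + size, maxy

def render_tile(stylesheet, z, x, y, out_path, tile_size=256):
    """Render one tile to a PNG file with Mapnik's Python bindings."""
    import mapnik  # imported lazily so the maths above works without Mapnik installed
    m = mapnik.Map(tile_size, tile_size)
    mapnik.load_map(m, stylesheet)  # e.g. an XML stylesheet exported from TileMill
    m.zoom_to_box(mapnik.Box2d(*tile_bounds_3857(z, x, y)))
    mapnik.render_to_file(m, out_path, "png")

# Usage (requires Mapnik): render_tile("roads.xml", 10, 511, 340, "10-511-340.png")
```

A tile server does essentially this for every cache miss, which is why the caching layer discussed below matters so much.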
TileMill
TileMill is a desktop application for creating web maps. It is developed by
Development Seed to complement their MapBox service. Powered by Mapnik and
Node.js, it allows users to define style rules using a CSS-like language called
CartoCSS. With each change, the rules and data sources are passed to Mapnik and
a preview map is rendered, giving immediate feedback.
TileMill's main export renders tiles and packages them in the MBTiles format.
However, it can also generate a Mapnik XML stylesheet, which other applications
can pass to Mapnik to render tiles.
MapBox has a great collection of resources to get you up and running with
TileMill. I recommend starting with their crash
course.
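TileMill normally writes the Mapnik XML for you from CartoCSS, but a minimal stylesheet is useful for understanding what it exports. The sketch below builds one with the standard library: a single line style applied to a layer read from PostGIS. The database name, table, geometry column and colours are hypothetical, and the SRS strings use the proj4 "+init=epsg:…" form accepted by older Mapnik releases; newer versions may expect a different form.

```python
import xml.etree.ElementTree as ET

MERCATOR = "+init=epsg:3857"  # output projection of the rendered tiles
WGS84 = "+init=epsg:4326"     # assumed projection of the source data

def road_stylesheet(dbname, table="roads", stroke="#444444", width="1.5"):
    """Build a minimal Mapnik XML stylesheet drawing lines from a PostGIS table."""
    m = ET.Element("Map", {"srs": MERCATOR, "background-color": "transparent"})
    style = ET.SubElement(m, "Style", {"name": "roads"})
    rule = ET.SubElement(style, "Rule")  # a rule with no filter matches every feature
    ET.SubElement(rule, "LineSymbolizer", {"stroke": stroke, "stroke-width": width})
    layer = ET.SubElement(m, "Layer", {"name": "roads", "srs": WGS84})
    ET.SubElement(layer, "StyleName").text = "roads"  # links the layer to the style
    ds = ET.SubElement(layer, "Datasource")
    for key, value in (("type", "postgis"), ("dbname", dbname),
                       ("table", table), ("geometry_field", "geom")):
        ET.SubElement(ds, "Parameter", {"name": key}).text = value
    return ET.tostring(m, encoding="unicode")

print(road_stylesheet("gis"))
```

Writing the result to a file gives a stylesheet that Mapnik's load_map, or a tile server built on Mapnik, can consume.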
Caching
So far, we have resolved the bandwidth issues discussed at the beginning of
this post by rendering our data into tiles on the server with Mapnik. This also
relieves the visitor's web browser of the strain of processing large amounts of
raw geo-data. However, generating tiles on the server is also a
resource-intensive process; depending on the area and zoom levels you wish to
cover, rendering a complete set of tiles can take anywhere from a few seconds to
more than a week.
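The reason render times vary so widely is that each zoom level has four times as many tiles as the one above it. The sketch below, which assumes the same XYZ spherical Mercator scheme as before, counts the tiles needed to cover a bounding box over a range of zoom levels. For the whole world from zoom 0 to 18 the full pyramid is (4^19 - 1) / 3, roughly 92 billion tiles.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """XYZ tile index containing lon/lat (clamped to the grid)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_in_bbox(west, south, east, north, min_zoom, max_zoom):
    """Count the tiles needed to cover a lon/lat box over a range of zoom levels."""
    total = 0
    for z in range(min_zoom, max_zoom + 1):
        x0, y0 = lonlat_to_tile(west, north, z)  # top-left tile
        x1, y1 = lonlat_to_tile(east, south, z)  # bottom-right tile
        total += (x1 - x0 + 1) * (y1 - y0 + 1)
    return total

def pyramid_size(max_zoom):
    """Tiles in a full world pyramid from zoom 0 to max_zoom: sum of 4**z."""
    return (4 ** (max_zoom + 1) - 1) // 3
```

Multiplying a count like this by a per-tile render time gives a rough estimate of how long pre-rendering would take.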
Obviously we don't want to be rendering tiles from scratch with every request.
Instead it is much more efficient to cache the tiles somewhere after they have
been rendered and serve requests directly from the cache, only resorting to
rendering when a cached tile doesn't exist. There are many ways to cache tiles
on your server. Here are some methods that we use:
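The "serve from cache, render only on a miss" approach is the cache-aside pattern. The sketch below shows it with an in-memory dictionary standing in for whichever store is used (MBTiles, file system or Memcache); the class name and the render callback are hypothetical.

```python
from typing import Callable, Dict, Tuple

TileKey = Tuple[int, int, int]  # (zoom, x, y)

class TileCache:
    """Serve tiles from a cache, rendering only when a tile is missing (cache-aside)."""

    def __init__(self, render: Callable[[int, int, int], bytes]):
        self._render = render  # expensive callback, e.g. a Mapnik render
        self._store: Dict[TileKey, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, z: int, x: int, y: int) -> bytes:
        key = (z, x, y)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        tile = self._render(z, x, y)
        self._store[key] = tile  # later requests for this tile skip rendering
        return tile

    def invalidate(self, z: int, x: int, y: int) -> None:
        """Drop a tile after its source data changes so the next request re-renders it."""
        self._store.pop((z, x, y), None)
```

Invalidation is what keeps this compatible with changing data: only tiles touched by an edit need to be dropped and re-rendered.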
MBTiles
MBTiles is a file-format specification pioneered by Development Seed. It is
essentially a SQLite database containing a whole set of rendered map tiles.
Known as tilesets, these files are portable and lightweight and can be
generated by TileMill. They are great for caching base layers, or layers
composed of data that doesn't change frequently. However, they require tiles to
be rendered in advance, making them less useful for maps covering large areas
and zoom levels, or for data sources that often require updating.
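Since an MBTiles file is just SQLite, it can be read and written with Python's built-in sqlite3 module. The sketch below creates the metadata and tiles tables defined by the MBTiles specification and stores a tile. One detail worth knowing: MBTiles numbers rows in TMS order (origin at the bottom-left), so the y index of an XYZ tile has to be flipped. The function names are hypothetical.

```python
import sqlite3

def create_mbtiles(path, name):
    """Create an empty tileset with the tables defined by the MBTiles specification."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS metadata (name TEXT, value TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS tiles (zoom_level INTEGER, tile_column INTEGER, "
               "tile_row INTEGER, tile_data BLOB)")
    db.execute("CREATE UNIQUE INDEX IF NOT EXISTS tile_index "
               "ON tiles (zoom_level, tile_column, tile_row)")
    db.executemany("INSERT INTO metadata (name, value) VALUES (?, ?)",
                   [("name", name), ("format", "png")])
    db.commit()
    return db

def tms_row(z, y):
    """Flip an XYZ y index (origin top-left) into an MBTiles/TMS row (origin bottom-left)."""
    return (2 ** z - 1) - y

def put_tile(db, z, x, y, data):
    db.execute("INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", (z, x, tms_row(z, y), data))
    db.commit()

def get_tile(db, z, x, y):
    row = db.execute("SELECT tile_data FROM tiles WHERE zoom_level = ? "
                     "AND tile_column = ? AND tile_row = ?", (z, x, tms_row(z, y))).fetchone()
    return row[0] if row else None
```

The unique index makes lookups fast and lets INSERT OR REPLACE overwrite a tile when it is re-rendered.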
File system
Map tiles are individual image files, usually 256x256 pixels in size and
stored in a compressed image format such as PNG. In most situations storing
them directly on a file system is satisfactory.
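A file-system cache needs little more than a predictable path per tile. The sketch below assumes the common layer/zoom/x/y.png directory layout (one directory per column keeps any single directory from growing too large) and writes each tile atomically, so a concurrent request never reads a half-written image. The root directory and layer names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

def tile_path(root, layer, z, x, y, ext="png"):
    """Lay tiles out as <root>/<layer>/<z>/<x>/<y>.<ext>."""
    return Path(root) / layer / str(z) / str(x) / f"{y}.{ext}"

def write_tile(root, layer, z, x, y, data):
    """Write to a temporary file in the target directory, then rename it into place."""
    path = tile_path(root, layer, z, x, y)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)  # atomic on the same file system
    return path

def read_tile(root, layer, z, x, y):
    """Return the cached tile bytes, or None on a cache miss."""
    path = tile_path(root, layer, z, x, y)
    return path.read_bytes() if path.exists() else None
```

Because the path mirrors the tile URL, a web server can often serve cached tiles straight from disk without touching the rendering stack at all.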
If you expect a large number of concurrent requests, you may want to bypass the
file system and cache tiles in memory instead. Memcache and similar systems are
designed for this task.
All together
There are plenty of options available for tile servers, including TileCache,
TileStache, TileLite, TileStream and mod_tile. We have been using TileStache,
as it has an excellent balance of features and simplicity.
TileStache is a server application that handles requests for tiles and serves
and caches tiles generated from Mapnik or other rendering providers. It's
implemented in Python and designed to be extended with a solid plugin system.
Out of the box, its features include:
- Rendering Mapnik maps
- Serving MBTiles tilesets
- Caching tiles to file system, MBTiles, Memcache or Amazon S3
- Compositing 'layers' into single tilesets
The compositing feature in particular is very powerful. In TileStache's
configuration you define a set of 'layers', each layer being a different
tileset and effectively its own map. You can then define composite layers,
which are new tilesets made up of other layers stacked on top of one another.
This allows you to do things like combining a pre-rendered tileset stored in an
MBTiles file with a tileset of features stored in PostGIS, and serving them to
your visitors' browsers as one flat set of tiles.
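TileStache reads its layers from a JSON configuration file. The sketch below generates one describing the MBTiles-plus-PostGIS setup described above: a disk cache, a base layer served from an MBTiles file, a roads layer rendered by Mapnik, and a composite of the two. The key names follow TileStache's configuration documentation as we understand it (the Composite provider lives under TileStache.Goodies), but check them against the version you install. All file paths and layer names are hypothetical.

```python
import json

def tilestache_config(mapfile, mbtiles_path, cache_dir):
    """Build a TileStache config with an MBTiles base, a Mapnik layer and a composite."""
    return {
        "cache": {"name": "Disk", "path": cache_dir},  # rendered tiles cached on disk
        "layers": {
            "base": {"provider": {"name": "mbtiles", "tileset": mbtiles_path}},
            "roads": {"provider": {"name": "mapnik", "mapfile": mapfile}},
            "combined": {
                "provider": {
                    "class": "TileStache.Goodies.Providers.Composite:Provider",
                    # stack entries are listed bottom layer first
                    "kwargs": {"stack": [{"src": "base"}, {"src": "roads"}]},
                }
            },
        },
    }

print(json.dumps(tilestache_config("roads.xml", "base.mbtiles", "/tmp/stache"), indent=2))
```

Saved as tilestache.cfg, a file like this can be served through TileStache's WSGI application (TileStache.WSGITileServer), with the composite layer requested as /combined/{z}/{x}/{y}.png.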
Shifting constraints
The range of tools and techniques described provides plenty of flexibility when
we are working on mapping projects. It is all achieved without wasting
bandwidth or bogging down our visitors' machines with redundant computation.
Previously we had a strict upper limit on the amount of geo-data we could
manage and serve, imposed by the network and our visitors' hardware. As is
evident in this final example, our challenge now is deciding how much data we
can fit into our maps without sacrificing their readability.