Introducing Geolambda
Geolambda greatly simplifies developing and deploying code that uses standard geospatial libraries. It takes the guesswork out of bundling native binaries with your AWS Lambda functions, so you can focus on building. We run Geolambda in production on projects for NASA and Astro Digital.
Geospatial processing code often depends on a small pool of standard libraries: GDAL, Proj.4, image format libraries, etc. Bundling these libraries as dependencies is often not well documented for cloud computing environments, leading to the trial and error of uploading new code and testing to get it right.
This is a hassle. So we built Geolambda. At its core, Geolambda is a Docker image pre-loaded with standard geospatial libraries, and scripts to package them into a zip file that can be uploaded directly to AWS.
We built Geolambda with AWS Lambda functions in mind. Essentially on-demand cloud computing, Lambda functions have proven capable of transparent and automatic scaling, terrific flexibility, and ease of maintenance, all while being cheaper than always-running alternatives. That said, Geolambda generates a portable code package, and you can deploy it anywhere that runs Amazon-flavored Linux.
I’m going to walk through how to use Geolambda, step by step. You can also browse the code on GitHub.
Let’s make a Geolambda
You can create a very simple Geolambda with just a few files, which I’ll detail below. Starter versions of these files are included in the GitHub repository in a directory called geolambda-seed, so you can follow along.
Dockerfile
First, you need a Dockerfile that specifies the Geolambda image to use:
FROM developmentseed/geolambda:full
WORKDIR /home/geolambda
Geolambda Docker images are available on Docker Hub. There are a couple of tags you can choose from, corresponding to how much of a pre-built environment you’re looking for. If you’re just trying this out or not sure what to use, use developmentseed/geolambda:full.
lambda/lambda_handler.py
This is the code responsible for handling a Lambda event. The example below sets up a logger, then calculates statistics for a file stored on AWS S3, given its S3 URL.
import os
import sys
import logging
from osgeo import gdal

# add path to included Python packages
path = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(path, 'lib/python2.7/site-packages'))

# set up logger
logger = logging.getLogger(__file__)
logger.setLevel(logging.DEBUG)


def handler(event, context):
    """ Lambda handler """
    logger.debug(event)

    # read filename from event payload and get image statistics
    fname = event['filename'].replace('s3://', '/vsis3/')
    # open and return metadata
    ds = gdal.Open(fname)
    band = ds.GetRasterBand(1)
    stats = band.GetStatistics(0, 1)

    return stats
test/test_lambda.py
Testing Lambda functions can help you avoid hairy situations. The example file structure in geolambda-seed includes a directory for unit and integration tests, and starts you off with a dummy test. As your handler grows, use this file as a starting point for future tests.
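As one possible starting point, here is a minimal sketch of a unit test that exercises the handler's logic without GDAL or S3 access. It is not the test shipped in geolambda-seed. The get_stats function is a hypothetical variant of the handler that takes the GDAL module as an argument so a stub can be passed in, and the bucket and key are made up. It is written for Python 3, where mock ships in the standard library; on Python 2.7 the separate mock package would be needed.

```python
import unittest
from unittest import mock


def get_stats(event, gdal_module):
    """Hypothetical handler core: same steps as the handler above,
    but with the GDAL module injected so tests can stub it."""
    fname = event['filename'].replace('s3://', '/vsis3/')
    ds = gdal_module.Open(fname)
    band = ds.GetRasterBand(1)
    return band.GetStatistics(0, 1)


class TestHandler(unittest.TestCase):

    def test_stats_for_s3_url(self):
        # stub out GDAL so the test needs no native libraries or network
        fake_gdal = mock.Mock()
        fake_band = fake_gdal.Open.return_value.GetRasterBand.return_value
        fake_band.GetStatistics.return_value = [0, 1, 0.5, 0.1]

        stats = get_stats({'filename': 's3://bucket/key.tif'}, fake_gdal)

        # the S3 URL should be rewritten to GDAL's /vsis3/ virtual path
        fake_gdal.Open.assert_called_once_with('/vsis3/bucket/key.tif')
        fake_gdal.Open.return_value.GetRasterBand.assert_called_once_with(1)
        self.assertEqual(stats, [0, 1, 0.5, 0.1])
```

Because the class subclasses unittest.TestCase, a runner such as Nose should pick it up from a file named like test_lambda.py without extra configuration.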
docker-compose.yml
With a Dockerfile, a handler, and tests, all that remains is to build the image and create a deployment package. This is easy to do with docker-compose. A docker-compose.yml file contains the recipe for building and testing your deployment.
You’ll notice that our example file contains a couple different services. We’ll go over these later.
.env
The .env file contains any environment variables that may be used in your handler. Our example uses AWS services, and if you plan to as well, you’ll need this file, which docker-compose reads during development. Note that deployed Lambdas have their environment variables set through the AWS console or the AWS CLI instead.
AWS_ACCESS_KEY_ID=*id*
AWS_SECRET_ACCESS_KEY=*access_key*
AWS_DEFAULT_REGION=us-east-1
If you’re using Git or another version control system, make sure not to commit this file!
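A handler that depends on these variables can fail in confusing ways when one is missing, so it may help to check for them up front. The sketch below is one way to do that; the function name missing_settings is hypothetical and not part of Geolambda, and the list of required names simply mirrors the example .env above.

```python
import os

# variable names taken from the example .env file
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_settings(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a dict makes
    the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]
```

Calling missing_settings() at the top of a handler or test run and logging the result gives a clearer message than a failed S3 read later on.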
Running docker-compose services for testing and packaging
The docker-compose.yml file provides several services. To access them, first build the Docker image:
$ docker-compose build
Then run one of the services:
$ docker-compose run *servicename*
- base: Starts the container and provides a bash shell for working with it interactively. This also mounts the current directory on the host.
- test: Uses the Nose library to run any tests in the test directory.
- package: Runs the packaging script, which collects the needed libraries in the lambda directory, alongside lambda_handler.py, and creates a zip file of that directory.
- testpackage: Runs your tests using the base Geolambda image, in order to test with just the files deployed.
Deploy it and run it
The Geolambda images contain a script to collect and zip everything you need to deploy to AWS. This produces a zip file, which you can upload using either the AWS console or the AWS CLI. The command below assumes you have already set up a Lambda function called “geolambda-stats” through the AWS console, using the Python 2.7 runtime.
$ aws lambda update-function-code --function-name geolambda-stats --zip-file fileb://lambda-deploy.zip
Voila! You’ve deployed your Geolambda. Now you can run it from the command line, passing it any raster image file (supported by GDAL) stored on S3. It will return the statistics of the image.
$ aws lambda invoke --function-name geolambda-stats --invocation-type RequestResponse --payload '{"filename": "s3://landsat-pds/L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B3.TIF"}' stats
$ more stats
[ 0, 22715, 7036.3087310300425, 6873.612118202581]
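The same invocation can be made from Python. The sketch below assumes boto3 is installed and AWS credentials are configured, and that the handler returns the list in the order GDAL's GetStatistics produces it: minimum, maximum, mean, standard deviation. The helper names build_payload, label_stats, and invoke_stats are hypothetical.

```python
import json

# order of the values returned by GDAL's GetStatistics
STAT_NAMES = ("min", "max", "mean", "stddev")


def build_payload(s3_url):
    """Build the JSON event the example handler expects."""
    return json.dumps({"filename": s3_url})


def label_stats(stats):
    """Turn the bare list of statistics into a labeled dict."""
    if len(stats) != len(STAT_NAMES):
        raise ValueError("expected %d values, got %d"
                         % (len(STAT_NAMES), len(stats)))
    return dict(zip(STAT_NAMES, stats))


def invoke_stats(s3_url, function_name="geolambda-stats"):
    """Invoke the deployed function synchronously and label its result."""
    import boto3  # assumption: boto3 is installed, credentials configured

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=build_payload(s3_url).encode("utf-8"),
    )
    body = response["Payload"].read().decode("utf-8")
    return label_stats(json.loads(body))
```

With the output shown above, label_stats would report a minimum of 0, a maximum of 22715, a mean of about 7036.3, and a standard deviation of about 6873.6.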
Beyond simple Geolambdas
If you have an existing Python project, you can easily incorporate Geolambda to run it on AWS. Drop the seed files in the top level of a Python project, so that Dockerfile and docker-compose.yml are alongside setup.py. Then modify your Dockerfile as such:
FROM developmentseed/geolambda:full

# install app
COPY . /build
WORKDIR /build
RUN \
    pip install -r /build/requirements.txt; \
    pip install . -v; \
    rm -rf /build/*;

WORKDIR /home/geolambda
This installs your Python package along with any required Python dependencies. Geolambda will automatically include these during packaging. To install other system dependencies beyond what is made available in Geolambda (e.g., compiled C code), use lambda-package.sh in the geolambda-seed directory.
Extending Geolambda
AWS Lambda imposes a 50MB size limit on the zip files you upload. Uncompressed (unzipped) files cannot exceed 250MB. This puts a severe constraint on the amount of additional code you can include, since standard geospatial libraries already create a 46MB deployment package. To squeeze out more space, you can extend Geolambda and adjust the underlying geospatial libraries.
Many applications will need more control over the configuration of the underlying libraries. Production applications should not use the Development Seed Geolambda repository directly from Docker Hub, as it will change over time. To ensure a consistent base image, we recommend forking the GitHub repository and creating your own Geolambda images in Docker Hub. This would allow custom images, such as one that includes a specific set of GDAL drivers.
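Since the margin is this thin, it can be worth checking a package against both limits before uploading. The sketch below uses only the Python standard library. The function names are hypothetical, and treating 1MB as 1024 × 1024 bytes is an assumption about how the limits are counted.

```python
import os
import zipfile

MB = 1024 * 1024          # assumption: limits counted in binary megabytes
ZIPPED_LIMIT = 50 * MB    # upload limit stated above
UNZIPPED_LIMIT = 250 * MB # uncompressed limit stated above


def package_sizes(zip_path):
    """Return (zipped_bytes, unzipped_bytes) for a deployment package."""
    zipped = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        unzipped = sum(info.file_size for info in zf.infolist())
    return zipped, unzipped


def within_lambda_limits(zip_path):
    """True if the package fits under both size limits."""
    zipped, unzipped = package_sizes(zip_path)
    return zipped <= ZIPPED_LIMIT and unzipped <= UNZIPPED_LIMIT
```

Running within_lambda_limits("lambda-deploy.zip") after the package service finishes would flag an oversized package before the upload fails.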
Processing at scale
We’ve been running Geolambda in production for a while now. Knowing our processing environment will just work has cut down our time-to-deployment, and reduced the surface area for bugs.
Combined with cheap cloud storage and on-demand processing, we think the future is bright for impactful, open data applications running Geolambda. Read more about how we’re using AWS to publish application-ready satellite images to the web. We’d love to hear from you, and we’re also hiring, so give us a shout at one of the links below.