# Lambda Layer Tool

Recently at work, I wanted to standardize our workflow for building the layers used by our AWS Lambda functions.

A Lambda layer is a ZIP archive that contains support libraries and other dependencies. With layers, not all of a function's libraries need to be included in its deployment package, which makes development faster and easier. Layers are especially useful when you’re bundling native libraries with your functions (e.g., for image and video processing).

First, I tried to hack together a shell script. While this solution offers the most flexibility, I quickly realized that it does not scale well, since you have to repeat a lot of code.

There are various Serverless tools that support creating layers as part of building and deploying an entire microservice. However, this is not the approach we favored: we want to build our layers “out-of-band” and then use them as dependencies in all of our projects, which are distributed across multiple repositories. This way, building the application (specifically, its deployment package) and building the layer are decoupled from each other.

Here is an example of this workflow with the Serverless framework:

```yaml
functions:
  helloWorld:
    handler: helloWorld.handler
    memorySize: 256
    layers:
      - arn:aws:lambda:eu-central-1:1234567890:layer:boto3:1
      - arn:aws:lambda:eu-central-1:1234567890:layer:pandas:2
      - arn:aws:lambda:eu-central-1:1234567890:layer:sklearn:3
    events:
      - httpApi: 'GET /hello-world'
```

To solve this problem, I decided to write my own tool: Lambda Layer Tool. It uses a basic YAML configuration file for specifying the layer build instructions along with various metadata, such as name, description, and runtime.

Using this information, a layer can either be built (creating a ZIP archive) or published (uploading the archive to AWS).

Here is a short ASCIIcast of the workflow, which also shows the reduction in package size:

*(ASCIIcast of Lambda Layer Tool)*

And here is an example layers.yaml file:

```yaml
---
version: '0.3'
default_excludes:
  - '*.dist-info/*'
  - '*.egg-info/*'
  - '*/__pycache__/*'
  - '*.pyc'
layers:
  awesome-numpy:
    description: 'Minimal numpy 1.18'
    runtimes: 'python3.6'
    pre_installs:
      - 'yum install gcc-gfortran'
    requirements:
      - 'numpy==1.18.2'
    excludes:
      - '*/numpy/tests/*'
```

The first YAML key specifies the version of the configuration file, to ensure proper forward and backward compatibility.

The default_excludes list is merged with the excludes list of every individual layer. I will go into detail about this later.
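Conceptually, this merge boils down to concatenating the two lists. A hypothetical sketch (the names are illustrative, not the tool’s actual code):

```python
# per-layer excludes, extended by the global defaults
excludes = config.get('default_excludes', []) + layer_config.get('excludes', [])
```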

The layers section contains all layer specifications. You can either build and publish all of your layers at once (with `./layer-tool.py [--build|--publish]`) or specify which of the layers you want to interact with (`./layer-tool.py [--build|--publish] layerA layerB`).

The description and runtimes keys are used in the publishing step to set the appropriate metadata fields of the layer on AWS.
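For reference, here is a hedged sketch of what the publish step might look like with boto3 (not necessarily the tool’s actual code); the description and runtimes values map directly to the corresponding fields of the publish_layer_version API call:

```python
import boto3

client = boto3.client('lambda')

# upload the archive produced by the build step
with open('awesome-numpy.zip', 'rb') as f:
    response = client.publish_layer_version(
        LayerName='awesome-numpy',
        Description='Minimal numpy 1.18',
        Content={'ZipFile': f.read()},
        CompatibleRuntimes=['python3.6'],
    )

print(response['LayerVersionArn'])  # e.g. arn:aws:lambda:...:layer:awesome-numpy:1
```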

pre_installs specifies arbitrary commands that should be run in the build environment before actually installing the requirements. You can use this to create files, copy and move paths, or install system packages.
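A rough sketch of how such commands could be executed (illustrative names, not the tool’s exact code):

```python
import shlex
import subprocess

# run each pre_install command in the build environment
# before pip installs the requirements
for cmd in layer_config.get('pre_installs', []):
    subprocess.run(shlex.split(cmd), check=True)
```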

The requirements key contains a list of requirements that will be installed into a virtualenv. These are passed as command-line arguments to pip install, so you can use the usual pip syntax here.

Finally, before creating the layer archive, the patterns specified in excludes are ignored. This is especially important for trimming down the size of large packages (e.g., Sklearn, Pandas, …) by removing parts that are not required at runtime (e.g., tests, documentation, metadata files, …). The default_excludes key gives a good indication of which files should be ignored for Python projects.

Furthermore, the tool puts the configuration file used to build the layer (layers.yaml) into the archive itself, so you can later still figure out which package versions were installed, what was removed, and so on.
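Putting these last two steps together, the archiving stage could look roughly like the following sketch (assuming fnmatch-style globbing for the exclude patterns; function and variable names are illustrative, not the tool’s actual implementation):

```python
import fnmatch
import os
import zipfile

def build_archive(src_dir, zip_path, excludes):
    """Zip src_dir, skipping paths that match an exclude pattern,
    and embed the build configuration for later reference."""
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                path = os.path.join(root, name)
                rel = os.path.relpath(path, src_dir)
                # drop anything matching an exclude pattern
                if any(fnmatch.fnmatch(rel, pat) for pat in excludes):
                    continue
                zf.write(path, rel)
        zf.write('layers.yaml')  # keep the build config inside the archive
```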

Currently, the tool only supports building Python layers; however, I am interested in supporting other runtimes as well. Adjusting the current source code for this should not be too difficult.

In general, if you have suggestions or ideas for the tool, just let me know or open an issue.

# Writing this tool in Python

In the past, I had mostly read Python, but also written a little bit here and there. Thus, I decided that writing this tool in Python was a great opportunity to explore the Python ecosystem further.

In hindsight, I have to say Python is a really great fit.

On the one hand, it allows you to write fragile code quickly, which is important when prototyping a tool like this in an afternoon. One example of this “fragile code” is creating and accessing unstructured dictionaries:

```python
runtimes = options['runtime']
```

If the key doesn’t exist, this raises a KeyError (instead of failing gracefully), but that’s totally fine for the beginning and can easily be fixed later.
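For instance, a more forgiving version is a one-line change (the fallback value here is just illustrative):

```python
# dict.get returns a fallback instead of raising KeyError
runtimes = options.get('runtime', 'python3.6')
```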

On the other hand, it offers enough facilities to safely handle resources like files, processes, etc.:

```python
# install requirements with pip in venv
for r in requirements:
    try:
        subprocess.run([pip_bin, "install", r], check=True)
    except subprocess.CalledProcessError as e:
        print(e)
        return 1
```

Nevertheless, I was absolutely disappointed by Python’s “new”, “optional” typing system. I won’t go into the details here (see these getting-started articles [1] [2] and the Python typing documentation), but the type annotations the developer writes are really just for the developer: the Python runtime completely ignores them. While they are not useless, I think they are about as useful as documenting types with comments in your code (2b7a14e: Implement Python type hinting with mypy).
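To illustrate this runtime behavior (a small standalone example of mine, not code from the tool):

```python
def add(a: int, b: int) -> int:
    return a + b

# The interpreter does not enforce the annotations: this call succeeds
# and returns 'ab', even though both hints say int.
print(add('a', 'b'))
```

Only an external static checker such as mypy reports the mismatch, e.g. `error: Argument 1 to "add" has incompatible type "str"; expected "int"`.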