Building a minimal Boto3 Lambda Layer

Anyone who is heavily using AWS Lambda functions has probably heard about Lambda layers. They are a way to centrally manage (large) code and data that is shared across multiple functions. In essence, they work like a container image layer: each layer is just a compressed archive of a filesystem, and they get stacked on top of each other to build the final filesystem visible to the Lambda function.

To build and publish these layers in a programmatic way, I developed lambda-layer-tool. For more details, you can read the introductory blog post here.

In this post I want to walk you through how to build a minimal layer for the boto3 library. boto3 is the official Python AWS SDK and implements interfaces for interacting with AWS APIs directly from Python.


I started out by building a full layer, just with basic exclusions:

---
version: '0.3'

default_excludes:
  - '*/pkg_resources/*'
  - '*/pip/*'
  - '*/setuptools/*'
  - '*/wheel/*'
  - '*.dist-info/*'
  - '*.egg-info/*'
  - '*/__pycache__/*'
  - '*.pyc'

layers:
  boto3:
    description: 'Minimal boto3 and botocore libraries'
    runtimes: 'python3.7'
    requirements:
      - 'boto3==1.12.39'
    excludes: []

 

$ ./layer-tool.py --build boto3
...
$ du -h boto3.zip
7.0M    boto3.zip
$ unzip -q boto3.zip
$ du -hs python/
51M   python/

This resulted in an archive of 7 MB (with the highest ZIP compression setting), which comes in at 51 MB uncompressed. Considering that the filesystem of a Lambda function may not exceed 250 MB, this is already one fifth of that!

A function can use up to 5 layers at a time. The total unzipped size of the function and all layers can’t exceed the unzipped deployment package size limit of 250 MB.
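To check how much of that budget a given archive uses, the uncompressed size can be read straight from the ZIP metadata. The following is a minimal sketch, not part of lambda-layer-tool: the function names are made up for illustration, and it assumes that the 250 MB limit is counted in binary megabytes. It sums file sizes rather than disk blocks, so the result will be slightly lower than what du reports.

```python
import zipfile

# Assumption: the 250 MB limit quoted above is interpreted as 250 * 1024 * 1024 bytes.
UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def unzipped_size(zip_path):
    """Return the total uncompressed size in bytes of all archive members."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def budget_share(zip_path, limit=UNZIPPED_LIMIT_BYTES):
    """Return the fraction of the unzipped size limit used by one archive."""
    return unzipped_size(zip_path) / limit
```

For example, `print(f"{budget_share('boto3.zip'):.0%}")` prints the share of the limit taken up by the layer built above.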

Therefore, we should slim down this layer by removing some of its contents. A smaller layer will also reduce the cold-start time of the Lambda function.

Usually, a good first step is to remove example (examples/), test (testing/ or tests/) and documentation (docs/) directories, since you most likely do not need them at runtime. This helps a lot, especially for packages with really large test suites (e.g. pandas or SciPy). However, be aware that in the case of boto3, the directory called docs/ actually contains Python code and is required!

To inspect the archive more closely, unzip it and check the size of the individual folders:

$ unzip -q boto3.zip
$ cd python/lib/python3.7/site-packages/
$ du -hs * | sort -h
4.0K    easy_install.py
36K     six.py
84K     jmespath
276K    s3transfer
436K    urllib3
484K    dateutil
1.2M    boto3
1.4M    setuptools
2.3M    docutils
46M     botocore
$ du -hs botocore/* | sort -h | tail -5
72K     botocore/vendored
76K     botocore/credentials.py
172K    botocore/docs
268K    botocore/cacert.pem
45M     botocore/data

Most of the size comes from the data/ directories, in both boto3 and botocore. These directories contain massive amounts of JSON files which describe the AWS API endpoints. The libraries use these JSON files to build the API requests which are then sent to the AWS servers. The boto3 documentation states the following:

[Boto 3] uses a data-driven approach to generate classes at runtime from JSON description files that are shared between SDKs in various languages. Because Boto 3 is generated from these shared JSON files, we get fast updates to the latest services and features and a consistent API across services. Community contributions to JSON description files in other SDKs also benefit Boto 3, just as contributions to Boto 3 benefit the other SDKs.

However, this also means that if we only use a certain subset of AWS services, we can omit the other endpoint descriptions (JSON documents).

There are over 200 service endpoints in the botocore library, so most likely you are not using all of them.

$ ls -1 botocore/data/ | wc -l
222
$ du -hcs botocore/data/* | sort -h | tail
588K    botocore/data/iam
632K    botocore/data/ssm
644K    botocore/data/s3
660K    botocore/data/pinpoint
672K    botocore/data/elasticache
708K    botocore/data/sagemaker
1.2M    botocore/data/rds
3.5M    botocore/data/cloudfront
7.0M    botocore/data/ec2
44M     total

The boto3 library itself is quite small compared to that:

$ ls -1 boto3/data | wc -l
10
$ du -hcs boto3/data/* | sort -h
12K   boto3/data/dynamodb
16K   boto3/data/cloudformation
16K   boto3/data/opsworks
16K   boto3/data/sqs
20K   boto3/data/cloudwatch
20K   boto3/data/sns
28K   boto3/data/glacier
48K   boto3/data/s3
60K   boto3/data/iam
540K  boto3/data/ec2
776K  total
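The same per-service breakdown can also be computed from Python, which makes it easy to estimate how much data would remain for a given set of services. This is only a sketch using the standard library: the function names are hypothetical, only directories are counted as services (top-level files in data/ are ignored), and sizes are summed per file, so they will not match the block-based numbers from du exactly.

```python
import os


def directory_size(path):
    """Sum the sizes in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def service_sizes(data_dir):
    """Map each service directory in data_dir to its size, largest first."""
    sizes = {
        entry.name: directory_size(entry.path)
        for entry in os.scandir(data_dir)
        if entry.is_dir()
    }
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


def kept_size(data_dir, keep):
    """Return the combined size of the service directories named in keep."""
    return sum(
        size for name, size in service_sizes(data_dir).items() if name in keep
    )
```

Calling `kept_size("botocore/data", {"s3", "sqs"})` from within site-packages/ would then give a rough idea of the data that stays in the layer if only those two services are kept.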

At least for the purposes of my organization, the Lambda functions are just interacting with the S3, SQS, Lambda, DynamoDB, RDS and SecretsManager services anyway, so we don’t need all of the 200 service descriptions.

Thus, I went through all of the directories in the data directory and created a list of services that we actually use from our Lambda functions:

apigateway
apigatewayv2
cloudwatch
cognito-identity
cognito-idp
cognito-sync
config
dynamodb
dynamodbstreams
elasticache
events
iam
lambda
logs
rds
rds-data
s3
s3control
secretsmanager
sns
sqs
ssm

lambda-layer-tool supports excluding files and directories via exclude patterns. However, since we do not want to omit every service in the data/ directory, we need to list the excluded services explicitly. To keep an overview of which services are included, I comment out the pattern of an included service instead of deleting it from the list.
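Writing out one pattern per service by hand is tedious, so the list can be generated. The sketch below is a hypothetical helper, not a feature of lambda-layer-tool: it takes the names of all service directories and the set of services to keep, and emits one YAML list line per service, commented out for the services that should stay in the layer. It uses plain string sorting, so the order may differ slightly from a locale-aware ls.

```python
def exclude_lines(all_services, keep, indent="      "):
    """Build one YAML list line per service; kept services are commented out."""
    all_services = set(all_services)
    unknown = set(keep) - all_services
    if unknown:
        # Catch typos early instead of silently excluding a needed service.
        raise ValueError(f"unknown services in keep list: {sorted(unknown)}")

    lines = []
    for name in sorted(all_services):
        pattern = f"- '*/botocore/data/{name}/*'"
        prefix = "# " if name in keep else ""
        lines.append(f"{indent}{prefix}{pattern}")
    return lines
```

The names for `all_services` could come from the directory listing of botocore/data/, and the returned lines can be pasted below the `excludes:` key of the layer definition.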

This results in the following monstrous YAML file:

---
version: '0.3'

default_excludes:
  - '*/pkg_resources/*'
  - '*/pip/*'
  - '*/setuptools/*'
  - '*/wheel/*'
  - '*.dist-info/*'
  - '*.egg-info/*'
  - '*/__pycache__/*'
  - '*.pyc'

layers:
  boto3:
    description: 'Minimal boto3 and botocore libraries'
    runtimes: 'python3.7'
    requirements:
      - 'boto3==1.12.31'
    excludes:
      - '*/boto3/examples/*'
      # NOTE: commented out services are INCLUDED
      - '*/botocore/data/accessanalyzer/*'
      - '*/botocore/data/acm/*'
      - '*/botocore/data/acm-pca/*'
      - '*/botocore/data/alexaforbusiness/*'
      - '*/botocore/data/amplify/*'
      # - '*/botocore/data/apigateway/*'
      - '*/botocore/data/apigatewaymanagementapi/*'
      # - '*/botocore/data/apigatewayv2/*'
      - '*/botocore/data/appconfig/*'
      - '*/botocore/data/application-autoscaling/*'
      - '*/botocore/data/application-insights/*'
      - '*/botocore/data/appmesh/*'
      - '*/botocore/data/appstream/*'
      - '*/botocore/data/appsync/*'
      - '*/botocore/data/athena/*'
      - '*/botocore/data/autoscaling/*'
      - '*/botocore/data/autoscaling-plans/*'
      - '*/botocore/data/backup/*'
      - '*/botocore/data/batch/*'
      - '*/botocore/data/budgets/*'
      - '*/botocore/data/ce/*'
      - '*/botocore/data/chime/*'
      - '*/botocore/data/cloud9/*'
      - '*/botocore/data/clouddirectory/*'
      - '*/botocore/data/cloudformation/*'
      - '*/botocore/data/cloudfront/*'
      - '*/botocore/data/cloudhsm/*'
      - '*/botocore/data/cloudhsmv2/*'
      - '*/botocore/data/cloudsearch/*'
      - '*/botocore/data/cloudsearchdomain/*'
      - '*/botocore/data/cloudtrail/*'
      # - '*/botocore/data/cloudwatch/*'
      - '*/botocore/data/codebuild/*'
      - '*/botocore/data/codecommit/*'
      - '*/botocore/data/codedeploy/*'
      - '*/botocore/data/codeguruprofiler/*'
      - '*/botocore/data/codeguru-reviewer/*'
      - '*/botocore/data/codepipeline/*'
      - '*/botocore/data/codestar/*'
      - '*/botocore/data/codestar-connections/*'
      - '*/botocore/data/codestar-notifications/*'
      # - '*/botocore/data/cognito-identity/*'
      # - '*/botocore/data/cognito-idp/*'
      # - '*/botocore/data/cognito-sync/*'
      - '*/botocore/data/comprehend/*'
      - '*/botocore/data/comprehendmedical/*'
      - '*/botocore/data/compute-optimizer/*'
      # - '*/botocore/data/config/*'
      - '*/botocore/data/connect/*'
      - '*/botocore/data/connectparticipant/*'
      - '*/botocore/data/cur/*'
      - '*/botocore/data/dataexchange/*'
      - '*/botocore/data/datapipeline/*'
      - '*/botocore/data/datasync/*'
      - '*/botocore/data/dax/*'
      - '*/botocore/data/detective/*'
      - '*/botocore/data/devicefarm/*'
      - '*/botocore/data/directconnect/*'
      - '*/botocore/data/discovery/*'
      - '*/botocore/data/dlm/*'
      - '*/botocore/data/dms/*'
      - '*/botocore/data/docdb/*'
      - '*/botocore/data/ds/*'
      # - '*/botocore/data/dynamodb/*'
      # - '*/botocore/data/dynamodbstreams/*'
      - '*/botocore/data/ebs/*'
      - '*/botocore/data/ec2/*'
      - '*/botocore/data/ec2-instance-connect/*'
      - '*/botocore/data/ecr/*'
      - '*/botocore/data/ecs/*'
      - '*/botocore/data/efs/*'
      - '*/botocore/data/eks/*'
      - '*/botocore/data/elasticache/*'
      - '*/botocore/data/elasticbeanstalk/*'
      - '*/botocore/data/elastic-inference/*'
      - '*/botocore/data/elastictranscoder/*'
      - '*/botocore/data/elb/*'
      - '*/botocore/data/elbv2/*'
      - '*/botocore/data/emr/*'
      - '*/botocore/data/es/*'
      # - '*/botocore/data/events/*'
      - '*/botocore/data/firehose/*'
      - '*/botocore/data/fms/*'
      - '*/botocore/data/forecast/*'
      - '*/botocore/data/forecastquery/*'
      - '*/botocore/data/frauddetector/*'
      - '*/botocore/data/fsx/*'
      - '*/botocore/data/gamelift/*'
      - '*/botocore/data/glacier/*'
      - '*/botocore/data/globalaccelerator/*'
      - '*/botocore/data/glue/*'
      - '*/botocore/data/greengrass/*'
      - '*/botocore/data/groundstation/*'
      - '*/botocore/data/guardduty/*'
      - '*/botocore/data/health/*'
      # - '*/botocore/data/iam/*'
      - '*/botocore/data/imagebuilder/*'
      - '*/botocore/data/importexport/*'
      - '*/botocore/data/inspector/*'
      - '*/botocore/data/iot/*'
      - '*/botocore/data/iot1click-devices/*'
      - '*/botocore/data/iot1click-projects/*'
      - '*/botocore/data/iotanalytics/*'
      - '*/botocore/data/iot-data/*'
      - '*/botocore/data/iotevents/*'
      - '*/botocore/data/iotevents-data/*'
      - '*/botocore/data/iot-jobs-data/*'
      - '*/botocore/data/iotsecuretunneling/*'
      - '*/botocore/data/iotthingsgraph/*'
      - '*/botocore/data/kafka/*'
      - '*/botocore/data/kendra/*'
      - '*/botocore/data/kinesis/*'
      - '*/botocore/data/kinesisanalytics/*'
      - '*/botocore/data/kinesisanalyticsv2/*'
      - '*/botocore/data/kinesisvideo/*'
      - '*/botocore/data/kinesis-video-archived-media/*'
      - '*/botocore/data/kinesis-video-media/*'
      - '*/botocore/data/kinesis-video-signaling/*'
      - '*/botocore/data/kms/*'
      - '*/botocore/data/lakeformation/*'
      # - '*/botocore/data/lambda/*'
      - '*/botocore/data/lex-models/*'
      - '*/botocore/data/lex-runtime/*'
      - '*/botocore/data/license-manager/*'
      - '*/botocore/data/lightsail/*'
      # - '*/botocore/data/logs/*'
      - '*/botocore/data/machinelearning/*'
      - '*/botocore/data/macie/*'
      - '*/botocore/data/managedblockchain/*'
      - '*/botocore/data/marketplace-catalog/*'
      - '*/botocore/data/marketplacecommerceanalytics/*'
      - '*/botocore/data/marketplace-entitlement/*'
      - '*/botocore/data/mediaconnect/*'
      - '*/botocore/data/mediaconvert/*'
      - '*/botocore/data/medialive/*'
      - '*/botocore/data/mediapackage/*'
      - '*/botocore/data/mediapackage-vod/*'
      - '*/botocore/data/mediastore/*'
      - '*/botocore/data/mediastore-data/*'
      - '*/botocore/data/mediatailor/*'
      - '*/botocore/data/meteringmarketplace/*'
      - '*/botocore/data/mgh/*'
      - '*/botocore/data/migrationhub-config/*'
      - '*/botocore/data/mobile/*'
      - '*/botocore/data/mq/*'
      - '*/botocore/data/mturk/*'
      - '*/botocore/data/neptune/*'
      - '*/botocore/data/networkmanager/*'
      - '*/botocore/data/opsworks/*'
      - '*/botocore/data/opsworkscm/*'
      - '*/botocore/data/organizations/*'
      - '*/botocore/data/outposts/*'
      - '*/botocore/data/personalize/*'
      - '*/botocore/data/personalize-events/*'
      - '*/botocore/data/personalize-runtime/*'
      - '*/botocore/data/pi/*'
      - '*/botocore/data/pinpoint/*'
      - '*/botocore/data/pinpoint-email/*'
      - '*/botocore/data/pinpoint-sms-voice/*'
      - '*/botocore/data/polly/*'
      - '*/botocore/data/pricing/*'
      - '*/botocore/data/qldb/*'
      - '*/botocore/data/qldb-session/*'
      - '*/botocore/data/quicksight/*'
      - '*/botocore/data/ram/*'
      # - '*/botocore/data/rds/*'
      # - '*/botocore/data/rds-data/*'
      - '*/botocore/data/redshift/*'
      - '*/botocore/data/rekognition/*'
      - '*/botocore/data/resource-groups/*'
      - '*/botocore/data/resourcegroupstaggingapi/*'
      - '*/botocore/data/robomaker/*'
      - '*/botocore/data/route53/*'
      - '*/botocore/data/route53domains/*'
      - '*/botocore/data/route53resolver/*'
      # - '*/botocore/data/s3/*'
      # - '*/botocore/data/s3control/*'
      - '*/botocore/data/sagemaker/*'
      - '*/botocore/data/sagemaker-a2i-runtime/*'
      - '*/botocore/data/sagemaker-runtime/*'
      - '*/botocore/data/savingsplans/*'
      - '*/botocore/data/schemas/*'
      - '*/botocore/data/sdb/*'
      # - '*/botocore/data/secretsmanager/*'
      - '*/botocore/data/securityhub/*'
      - '*/botocore/data/serverlessrepo/*'
      - '*/botocore/data/servicecatalog/*'
      - '*/botocore/data/servicediscovery/*'
      - '*/botocore/data/service-quotas/*'
      - '*/botocore/data/ses/*'
      - '*/botocore/data/sesv2/*'
      - '*/botocore/data/shield/*'
      - '*/botocore/data/signer/*'
      - '*/botocore/data/sms/*'
      - '*/botocore/data/sms-voice/*'
      - '*/botocore/data/snowball/*'
      # - '*/botocore/data/sns/*'
      # - '*/botocore/data/sqs/*'
      # - '*/botocore/data/ssm/*'
      - '*/botocore/data/sso/*'
      - '*/botocore/data/sso-oidc/*'
      - '*/botocore/data/stepfunctions/*'
      - '*/botocore/data/storagegateway/*'
      - '*/botocore/data/sts/*'
      - '*/botocore/data/support/*'
      - '*/botocore/data/swf/*'
      - '*/botocore/data/textract/*'
      - '*/botocore/data/transcribe/*'
      - '*/botocore/data/transfer/*'
      - '*/botocore/data/translate/*'
      - '*/botocore/data/waf/*'
      - '*/botocore/data/waf-regional/*'
      - '*/botocore/data/wafv2/*'
      - '*/botocore/data/workdocs/*'
      - '*/botocore/data/worklink/*'
      - '*/botocore/data/workmail/*'
      - '*/botocore/data/workmailmessageflow/*'
      - '*/botocore/data/workspaces/*'
      - '*/botocore/data/xray/*'

But a much less monstrous layer archive (2.3 MB compressed, 13 MB uncompressed):

$ ./layer-tool.py --build boto3
...
$ du -h boto3.zip
2.3M    boto3.zip
$ unzip -q boto3.zip
$ du -hs python/
13M    python/

Of course, everyone needs to compile their own list, so it fits their use-case. Depending on the required services, the resulting layer will be smaller or larger.
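One way to get a first draft of such a list is to scan the function code for client and resource constructors. The following is a rough sketch under several assumptions: the helper name is made up, it only recognises service names given as string literals (positionally or as `service_name=`), and it treats any `.client(...)` or `.resource(...)` call as a boto3 call, so the result still needs a manual review. As far as I can tell, the client name matches the directory name in botocore/data/, but that is worth double-checking for the services in question.

```python
import ast

# Attribute names treated as boto3 factories (assumption: no unrelated
# objects in the scanned code expose methods with these names).
FACTORY_NAMES = {"client", "resource"}


def services_in_source(source):
    """Collect service names passed as string literals to client()/resource()."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and func.attr in FACTORY_NAMES):
            continue
        # Candidate argument nodes: first positional or service_name keyword.
        candidates = list(node.args[:1])
        candidates += [kw.value for kw in node.keywords if kw.arg == "service_name"]
        for arg in candidates:
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                found.add(arg.value)
    return found
```

Running this over every handler file and taking the union of the results gives a starting point for the keep list. Service names that are built dynamically at runtime will not be found this way.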

For more details about lambda-layer-tool, check out the introductory blog post and the GitHub repository.