
Serverless compute services, or Functions as a Service (FaaS), e.g. AWS Lambda, provide a cost-effective, scalable, and agile way to run scripts or programs on a schedule.

They offer a modern, superior alternative to older solutions like running Cron jobs on an always-on server.

In this post, we will run Python code on a schedule using AWS Lambda.

The main tool we will be using is the AWS Serverless Application Model Command Line Interface (SAM CLI).

To make our example more practical, our Python code will use third-party libraries.

In what follows, the AWS region is us-east-1 (N. Virginia).

AWS resources

Create an S3 bucket

In the S3 console, create a bucket <aws-account-id>-lambda-scheduled-task.

Create an IAM policy

In the IAM console, create a policy LambdaSAMSchedule with description "Allows SAM to create Lambda functions that run on a schedule" with the JSON

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:CreateMultipartUpload"],
      "Resource": [
        "arn:aws:s3:::<aws-account-id>-lambda-scheduled-task/*",
        "arn:aws:s3:::<aws-account-id>-lambda-scheduled-task"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["iam:ListPolicies"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudformation:CreateChangeSet",
        "cloudformation:DescribeChangeSet",
        "cloudformation:ExecuteChangeSet",
        "cloudformation:DescribeStackEvents",
        "cloudformation:DescribeStacks"
      ],
      "Resource": [
        "arn:aws:cloudformation:*:aws:transform/Serverless-2016-10-31",
        "arn:aws:cloudformation:us-east-1:<aws-account-id>:stack/scheduled-task/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "cloudformation:GetTemplateSummary",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:GetRole",
        "iam:CreateRole",
        "iam:PassRole",
        "iam:DeleteRole",
        "iam:GetRolePolicy",
        "iam:PutRolePolicy",
        "iam:AttachRolePolicy",
        "iam:DetachRolePolicy",
        "iam:DeleteRolePolicy",
        "iam:TagRole",
        "iam:UntagRole"
      ],
      "Resource": "arn:aws:iam::<aws-account-id>:role/scheduled-task-ScheduledTaskRole-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "lambda:UpdateFunctionCode",
        "lambda:ListTags",
        "lambda:TagResource",
        "lambda:UntagResource",
        "lambda:GetFunctionConfiguration",
        "lambda:CreateFunction",
        "lambda:DeleteFunction",
        "lambda:AddPermission"
      ],
      "Resource": "arn:aws:lambda:us-east-1:<aws-account-id>:function:scheduled-task-ScheduledTask-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "events:DescribeRule",
        "events:PutRule",
        "events:RemoveTargets",
        "events:DeleteRule",
        "events:PutTargets",
        "events:EnableRule",
        "events:DisableRule",
        "events:ListTargetsByRule"
      ],
      "Resource": "arn:aws:events:us-east-1:<aws-account-id>:rule/schedule-1"
    }
  ]
}
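The placeholder <aws-account-id> appears in several ARNs in the policy, so it is easy to miss one when pasting the JSON into the console. As a rough sketch (the helper name render_policy and the abbreviated template are assumptions made for illustration, not part of SAM or IAM), the substitution could be scripted and the result checked to be valid JSON:

```python
import json


def render_policy(template: str, account_id: str) -> dict:
    """Hypothetical helper: fill in <aws-account-id> and parse the result."""
    # AWS account IDs are 12-digit numbers.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError('AWS account IDs are 12 digits')
    # json.loads raises an error if the substituted text is not valid JSON.
    return json.loads(template.replace('<aws-account-id>', account_id))


# Abbreviated fragment of the policy above, for illustration only.
TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": ["arn:aws:s3:::<aws-account-id>-lambda-scheduled-task/*"]
    }
  ]
}
"""

# 012345678999 is the dummy account ID used in event.json later in this post.
policy = render_policy(TEMPLATE, '012345678999')
print(json.dumps(policy, indent=2))
```

In practice the full policy text would be passed in instead of the fragment.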

Create an IAM user

In the IAM console, create a user local-sam with programmatic access.

Attach to local-sam the policy LambdaSAMSchedule (NB: best practice is to attach LambdaSAMSchedule to a group and make local-sam a member of the group).

Make a note of the secret access key: this is the only time it is shown, and it will be needed later. It will be referred to as <secret-access-key>.

Local development environment

Ideally, we would develop our Python code locally in an environment identical to the production one in which it will be deployed.

In other words, we would like our local development environment to resemble as much as possible the AWS environment in the cloud in which Lambda functions run.

This is one advantage of SAM as it enables you to run code locally in a Docker container that replicates the AWS Lambda environment.

Prerequisites

  • Docker
  • conda

Hello, World!

Create files

  • cd /my/local/path
  • mkdir -p scheduled-task/config
  • cd scheduled-task
  • touch environment.yml
  • touch app.py
  • touch event.json
  • touch config/template.yml

which gives a directory structure

/my/local/path/scheduled-task/
├── app.py
├── config
│   └── template.yml
├── environment.yml
└── event.json

Our conda environment file environment.yml has only one dependency, the SAM CLI, which is written in Python (latest version 0.44.0, compatible with Python 3.6, at the time of writing):

environment.yml

name: scheduled-task
dependencies:
  - python=3.6
  - pip=20.0.2
  - pip:
      - aws-sam-cli==0.44.0

Our Python file app.py consists of one function that prints:

  • "Hello, World!"
  • Event that triggered the function invocation
  • Context of the function invocation
  • Python version

and returns a success message:

app.py

import sys


def task(event, context):
    print('Hello, World!\n'
          f'Event: {event}\n'
          f'Context: {context}\n'
          f'Python version: {sys.version}')
    return {'success': True}

Our event JSON event.json contains a dummy event for local testing

event.json

{
  "version": "0",
  "id": "608d7ceb-0671-2677-ac76-6a6c2b45c045",
  "detail-type": "Scheduled Event",
  "source": "aws.events",
  "account": "012345678999",
  "time": "2020-03-15T14:44:00Z",
  "region": "us-east-1",
  "resources": ["arn:aws:events:us-east-1:012345678999:rule/schedule-1"],
  "detail": {}
}
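Before involving Docker, it can be handy to call the handler directly as a quick smoke test. The sketch below repeats the task function from app.py so that it is self-contained, and passes a stand-in context built with SimpleNamespace. The stand-in is an assumption for illustration: it carries only two of the attributes the real Lambda context exposes, and running the handler this way does not replicate the Lambda environment.

```python
import json
import sys
from types import SimpleNamespace


def task(event, context):
    # Same handler as in app.py above, repeated so this sketch runs on its own.
    print('Hello, World!\n'
          f'Event: {event}\n'
          f'Context: {context}\n'
          f'Python version: {sys.version}')
    return {'success': True}


# Same dummy event as event.json, inlined so no file is needed.
EVENT_JSON = """
{
  "version": "0",
  "id": "608d7ceb-0671-2677-ac76-6a6c2b45c045",
  "detail-type": "Scheduled Event",
  "source": "aws.events",
  "account": "012345678999",
  "time": "2020-03-15T14:44:00Z",
  "region": "us-east-1",
  "resources": ["arn:aws:events:us-east-1:012345678999:rule/schedule-1"],
  "detail": {}
}
"""

event = json.loads(EVENT_JSON)

# Stand-in for the Lambda context object; the values are made up.
fake_context = SimpleNamespace(function_name='ScheduledTask',
                               aws_request_id='local-test')

result = task(event, fake_context)
print(result)
```

This only checks that the handler runs and returns; sam local invoke, described below, remains the closer approximation of production.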

Our SAM template template.yml (a superset of CloudFormation templates) defines the handler for our Lambda function and its schedule in Rate syntax.

config/template.yml

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: Template for scheduling Lambda functions
Resources:
  ScheduledTask:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.7
      Handler: app.task
      CodeUri: ..
      Timeout: 10
      Events:
        RateSchedule:
          Properties:
            Description: Schedule for testing and demo purposes
            Enabled: true
            Name: schedule-1
            Schedule: rate(3 minutes)
          Type: Schedule
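To make the Rate syntax concrete, here is a minimal sketch that converts an expression such as rate(3 minutes) into a Python timedelta. The function name rate_to_timedelta is made up for this post and the parsing is a simplification written from the documented form rate(value unit), where the unit is singular for a value of 1 and plural otherwise; AWS does its own validation when the rule is created.

```python
import re
from datetime import timedelta

# rate(<positive integer> <unit>), with the units AWS documents for rate expressions.
_RATE = re.compile(r'^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$')


def rate_to_timedelta(expression: str) -> timedelta:
    """Hypothetical helper: turn a rate expression into the interval it describes."""
    match = _RATE.match(expression)
    if match is None:
        raise ValueError(f'Not a rate expression: {expression!r}')
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError('The value must be a positive integer')
    is_plural = unit.endswith('s')
    # A value of 1 takes the singular unit; larger values take the plural.
    if (value == 1) == is_plural:
        raise ValueError(f'Unit does not agree with value: {expression!r}')
    # timedelta keyword arguments are plural (minutes, hours, days).
    return timedelta(**{unit if is_plural else unit + 's': value})


interval = rate_to_timedelta('rate(3 minutes)')
invocations_per_day = timedelta(days=1) / interval
print(interval, invocations_per_day)
```

With the schedule in the template, the function would be invoked 480 times a day, which is worth keeping in mind for a demo that is left running.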

Why is the Python version 3.7 not the same as in environment.yml, 3.6?

The Python version in template.yml refers to the Python version in the Docker container run locally by SAM and in the AWS Lambda runtime environment.

It is the Python version that runs the code in app.py.

The Python version in environment.yml refers to the version that runs inside the conda virtual environment (that the SAM CLI depends on).

Why is CodeUri set to ..?

Because app.py is one level above config/template.yml in the directory hierarchy.
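Put differently, the relative path is read from the directory that holds the template. A small sketch of that path arithmetic, using the example path from this post (the variable names are illustrative only):

```python
import posixpath
from pathlib import PurePosixPath

template = PurePosixPath('/my/local/path/scheduled-task/config/template.yml')
code_uri = '..'

# Join CodeUri onto the template's directory, then collapse the '..' segment.
code_dir = posixpath.normpath(str(template.parent / code_uri))
print(code_dir)
```

The result is the project root, which is where app.py lives.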

Test it works

Build and activate the conda environment

  • conda env create --file environment.yml
  • conda activate scheduled-task

Check we are using Python in the virtual environment and that its version is 3.6

  • which python
  • python --version

Check the SAM CLI is installed with sam --version

Create environment variables for the SAM CLI

  • export AWS_ACCESS_KEY_ID=<access-key-id>
  • export AWS_SECRET_ACCESS_KEY=<secret-access-key>
  • export AWS_DEFAULT_REGION=us-east-1

where <access-key-id> is in the IAM console under the user local-sam and "Access key ID".
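A missing or empty variable tends to show up later as a confusing credentials error, so a quick check up front can save time. This is only a sketch; the helper name missing_aws_variables is invented for this post.

```python
import os

REQUIRED = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION')


def missing_aws_variables(environ=os.environ):
    """Hypothetical helper: list the required variables that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]


missing = missing_aws_variables()
if missing:
    print('Missing environment variables: ' + ', '.join(missing))
else:
    print('All three AWS environment variables are set')
```

The environ parameter defaults to the real environment; passing a plain dict makes the check easy to try out without exporting anything.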

Validate the template with sam validate --template config/template.yml

Assuming successful validation, invoke our Lambda function:

sam local invoke ScheduledTask --event event.json --template config/template.yml

(Add --debug for troubleshooting).

The first time sam local invoke runs, it pulls down the Docker image lambci/lambda. This will probably take a while as it is almost 1 GB in size.

In subsequent runs, add --skip-pull-image to avoid pulling down the image again.

If the Lambda function is invoked successfully, you should see something like:

Invoking app.task (python3.7)

Mounting /Users/guy/Documents/blog-post-repos/scheduled-task as /var/task:ro,delegated inside runtime container

START RequestId: 61fc63e3-b7d5-1031-6deb-89fbfac4e697 Version: $LATEST

Hello, World!

Event: {'version': '0', 'id': '608d7ceb-0671-2677-ac76-6a6c2b45c045', 'detail-type': 'Scheduled Event', 'source': 'aws.events', 'account': '012345678999', 'time': '2020-03-15T14:44:00Z', 'region': 'us-east-1', 'resources': ['arn:aws:events:us-east-1:012345678999:rule/schedule-1'], 'detail': {}}

Context: <bootstrap.LambdaContext object at 0x7fcdf65cff90>

Python version: 3.7.6 (default, Feb 5 2020, 14:03:26)

[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)]

END RequestId: 61fc63e3-b7d5-1031-6deb-89fbfac4e697

REPORT RequestId: 61fc63e3-b7d5-1031-6deb-89fbfac4e697 Init Duration: 1400.57 ms Duration: 32.67 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 23 MB

{"success":true}

A more interesting example

Let’s modify our Lambda function so that it extracts some text from the Wikipedia homepage.

We will use the third-party libraries requests and pyquery to do so.

Update files

As our Lambda function requires requests and pyquery, we have to specify these two libraries in requirements.txt as dependencies

requirements.txt

requests==2.23.0
pyquery==1.4.1

To make use of these two libraries and extract some text from Wikipedia, update app.py accordingly

app.py

import sys

import requests
from pyquery import PyQuery


def task(event, context):
    print('Hello, World!\n'
          f'Event: {event}\n'
          f'Context: {context}\n'
          f'Python version: {sys.version}')
    r = requests.get('https://www.wikipedia.org/')
    pq = PyQuery(r.content)
    for div in pq('div').filter('.central-featured-lang').items():
        print(f'Language: {div("a strong").text()}')
    return {'success': True}
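To see what the selector in the loop picks out without making a network request, the sketch below approximates it with the standard library's html.parser instead of pyquery. Both the SAMPLE markup and the FeaturedLanguages class are assumptions made for illustration: the markup is a guess at the general shape of the page, and the parser is a simplified stand-in that collects text inside a strong element, inside an a element, inside a div with the class central-featured-lang. It is not how pyquery works internally.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on what the selector expects.
SAMPLE = """
<div class="central-featured">
  <div class="central-featured-lang lang1">
    <a href="//en.wikipedia.org/"><strong>English</strong><small>articles</small></a>
  </div>
  <div class="central-featured-lang lang2">
    <a href="//es.wikipedia.org/"><strong>Español</strong><small>artículos</small></a>
  </div>
</div>
"""


class FeaturedLanguages(HTMLParser):
    """Simplified stand-in for pq('div').filter('.central-featured-lang') and 'a strong'."""

    def __init__(self):
        super().__init__()
        self.languages = []
        self._stack = []  # currently open elements as (tag, classes)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        self._stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; good enough for well-formed samples.
        while self._stack and self._stack.pop()[0] != tag:
            pass

    def handle_data(self, data):
        text = data.strip()
        if text and self._in_target():
            self.languages.append(text)

    def _in_target(self):
        tags = [tag for tag, _ in self._stack]
        for i, (tag, classes) in enumerate(self._stack):
            if tag == 'div' and 'central-featured-lang' in classes:
                rest = tags[i + 1:]
                return 'a' in rest and 'strong' in rest[rest.index('a') + 1:]
        return False


parser = FeaturedLanguages()
parser.feed(SAMPLE)
for language in parser.languages:
    print(f'Language: {language}')
```

The text in the small elements is skipped because it is not inside a strong element, which mirrors why the real handler prints only the language names.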

Test it works

Download pyquery and requests

sam build --template config/template.yml --manifest requirements.txt

EDIT: This should be sam build --template config/template.yml --manifest requirements.txt --use-container (if you have Python 3.7 installed locally, the original command will work too)

In /my/local/path/scheduled-task/.aws-sam/build you should see the project files copied over with all third-party dependencies.

Invoke the Lambda function

sam local invoke ScheduledTask --event event.json --template .aws-sam/build/template.yaml --skip-pull-image

which should output something like

Language: English

Language: Español

Language: 日本語

Language: Deutsch

Language: Русский

Language: Français

Language: Italiano

Language: 中文

Language: Português

Language: Polski

NB: each time you make changes to the code you need to build again, i.e.

sam build --template config/template.yml --manifest requirements.txt && sam local invoke ScheduledTask --event event.json --template .aws-sam/build/template.yaml --skip-pull-image

Deploy to production

Package files for S3

sam package --template-file .aws-sam/build/template.yaml --output-template-file packaged.yml --s3-bucket <aws-account-id>-lambda-scheduled-task

zips up the files in .aws-sam/build/ScheduledTask and uploads them to the S3 bucket <aws-account-id>-lambda-scheduled-task.

You should see a new file in the S3 bucket; this file contains all the information required to run the Lambda function.

The command also creates a new template file packaged.yml which is the same as .aws-sam/build/template.yaml except CodeUri points to the new file in the S3 bucket. packaged.yml will be used in the next step.

Deploy

The final step is to use the SAM CLI to create the necessary resources in AWS for the Lambda function to run:

sam deploy --template-file packaged.yml --stack-name scheduled-task --capabilities CAPABILITY_IAM

Under the hood, this creates a CloudFormation stack scheduled-task which in turn creates the Lambda function.

If you get an IAM permissions error, update the IAM policy LambdaSAMSchedule accordingly.

If the command is successful, you should see the Lambda function invoked every three minutes (you can check this in the Lambda or CloudWatch console).

In the CloudWatch console, you should also see the invoked Lambda function’s logs (which should mirror those when the Lambda function ran locally).

Clean up

In the AWS console:

  • Remove S3 bucket <aws-account-id>-lambda-scheduled-task
  • Remove IAM policy LambdaSAMSchedule
  • Remove IAM user local-sam
  • Remove CloudFormation stack scheduled-task
  • Remove CloudWatch log group /aws/lambda/scheduled-task-ScheduledTask-<id>

On your local machine, remove the directory /my/local/path/scheduled-task.