
August 6, 2020

Auto Delete AWS Snapshots with Python Lambda Script


So the billing department called you inquiring why they were incurring extra expense from AWS due to 14,000 32GB snapshots having piled up? Unbeknownst to you, some backup script had gotten away from your team and started backing up repeatedly? Maybe you’re intentionally keeping around petabytes of backup data but you need a pruning script?

Even worse, when you try to select and delete 50 snapshots at a time manually in the AWS Console, you get random cryptic warnings and failures to delete.

Now, I considered calling AWS support at this point before exhaling through my nose heartily at the indignation and deciding to do it myself. Why not use the power of AWS Lambda to work around one of Amazon Web Services’ own bugs?

Well, script kiddies, grab a coffee and dust off your Python skills, because like Leeroy Jenkins, “we’re going in”.

“Oh no, Leeroy, STOP!!!”, one of your edge devs yells. Why? Because like most things AWS we can’t just go around writing Python scripts all willy-nilly. First we need an IAM role and some permission policies. Jeez Leeroy, you’re gonna get us killed here (by upper management).

Head to IAM -> Roles and make yourself a new role. Then add the AWSLambdaBasicExecutionRole and AWSXRayDaemonWriteAccess permission policies as usual for a Lambda role. In addition, because we’re working with EC2, add the AmazonEC2FullAccess permission policy. There may be some lesser access level worth investigating if your company’s policies are tight surrounding IAM permissions, but I’ll leave that to a future episode.
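If AmazonEC2FullAccess is more than your security folks will tolerate, here is a rough sketch of what a narrower policy might look like. It assumes the script only ever lists and deletes snapshots, which is all the code in this post does. The name SNAPSHOT_PRUNE_POLICY and the helper policy_json are made up for illustration, and I have not run the script against this exact policy, so treat it as a starting point.

```python
import json

# Hypothetical least-privilege policy: the script only lists snapshots and
# deletes them, so these two EC2 actions should be the ones it needs.
SNAPSHOT_PRUNE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:DescribeSnapshots", "ec2:DeleteSnapshot"],
            "Resource": "*",
        }
    ],
}


def policy_json(policy=SNAPSHOT_PRUNE_POLICY):
    """Return the policy as JSON text that can be pasted into the IAM policy editor."""
    return json.dumps(policy, indent=2)


print(policy_json())
```

You would attach this as a custom policy in place of AmazonEC2FullAccess and keep the two Lambda policies as they are.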

For now, these three Permission policies should be enough to run the Lambda script, and your IAM Role summary should list all three.

With our IAM Role set, we can catch back up to Leeroy over in AWS Lambda.

Create a standard Python 3.7 Lambda function and attach the IAM Role. For your “Test” scenario just choose the “Hello World” settings. The test event is irrelevant here because the script ignores its input and returns nothing.

In lambda_function.py add the following script:

import datetime

import boto3
from botocore.exceptions import ClientError

############################################################
# leifCleanAwsEc2Snapshots
# Script will delete all snapshots created before dateLimit.
# ALL SNAPSHOTS OLDER THAN THIS DATE WILL BE DELETED!!!
dateLimit = datetime.datetime(2018, 1, 1)    # yyyy, mm, dd
############################################################

# AWS settings: replace the region and the placeholder owner ID with your own.
client = boto3.client('ec2', region_name='us-east-1')
ownerId = '1111111111111'

def lambda_handler(event, context):

    # Could base this clean-up on the number of snapshots too.

    # Fetch the list on every invocation (a warm Lambda would otherwise reuse
    # a stale list) and page through it so large accounts are fully covered.
    paginator = client.get_paginator('describe_snapshots')
    for page in paginator.paginate(OwnerIds=[ownerId]):
        for snapshot in page['Snapshots']:
            started = snapshot['StartTime']
            # Skip anything started on or after dateLimit.
            if started.date() >= dateLimit.date():
                continue
            id = snapshot['SnapshotId']
            print(id + "********************")
            print(started)
            try:
                #Uncomment below line for "live run"
                #client.delete_snapshot(SnapshotId=id)
                print("DELETED^^^^^^^^^^^^^^^^^^")
            except ClientError as e:
                # Snapshots still in use (for example by an AMI) cannot be deleted.
                if e.response['Error']['Code'] == 'InvalidSnapshot.InUse':
                    print("skipping this snapshot")
                    continue
                raise

Once you’ve saved the script, you could now click the “Test” button to run it, as Leeroy is over in the corner currently doing. However, it’s not going to run yet.

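Enter your AWS details for Region and OwnerID. The OwnerIds filter matches the AWS account that owns the snapshots, so the value is an account ID rather than an IAM identity. In my case that was our main account, but your mileage may vary depending on your EC2 setup. Anyway, find the account that owns those snapshots and enter its ID for the OwnerID.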
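If you are not sure which ID to use, one option is to ask STS which account the function itself runs in. The sketch below assumes the snapshots live in that same account, which may not be true for your setup. resolve_owner_id and StubSts are names I made up, and the stub only exists so the example runs without AWS credentials. In Lambda you would pass boto3.client('sts') instead.

```python
def resolve_owner_id(sts_client):
    """Return the 12-digit account ID of the credentials behind sts_client."""
    account_id = sts_client.get_caller_identity()["Account"]
    # AWS account IDs are 12 digits; fail loudly on anything else.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"unexpected account ID: {account_id!r}")
    return account_id


class StubSts:
    """Stand-in for boto3.client('sts') so the sketch runs without AWS."""

    def get_caller_identity(self):
        return {"Account": "123456789012"}


print(resolve_owner_id(StubSts()))
```

The describe_snapshots call also accepts the literal value 'self' in OwnerIds, which covers the same case without a lookup.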

No, Leeroy, still not yet, stop pressing that Test button.

We still need to set our date. The core purpose of this script is to delete all EC2 snapshots before a certain date, so we have to make sure we set the “dateLimit” variable to whichever date we want. I recommend starting long ago, and working your way forward.
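To get a feel for where the cutoff falls before pointing it at real data, the date test can be pulled out into a tiny function and poked at locally. This is a simplified sketch of the comparison, not the script itself: is_before_limit is a hypothetical helper, and the sample timestamps are invented. It compares calendar days, so a snapshot started on the limit date itself is kept.

```python
import datetime


def is_before_limit(start_time, date_limit):
    """True when a snapshot's StartTime falls on a calendar day before date_limit."""
    return start_time.date() < date_limit.date()


date_limit = datetime.datetime(2018, 1, 1)

# boto3 returns StartTime as a timezone-aware datetime, so the samples are too.
last_minute_of_2017 = datetime.datetime(2017, 12, 31, 23, 59, tzinfo=datetime.timezone.utc)
first_minute_of_2018 = datetime.datetime(2018, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)

print(is_before_limit(last_minute_of_2017, date_limit))   # True: would be deleted
print(is_before_limit(first_minute_of_2018, date_limit))  # False: would be kept
```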

Now, since we’ve got our dateLimit set, and we’ve got the right AWS Region and OwnerID, and because Leeroy is still hammering that Test button over there, our script finally produces a “dry run”.

What you should expect to see is a list of snapshot IDs (and the dates they were created) in the console. It should look something like this:

leifCleanAwsEc2Snapshots

July 28, 2020

Run 1:
Set date limit: 1-1-2019
Perform dry run. OK
Set execution time: 5 minutes
SUCCESS: Removed 1308 snapshots.

Log:
snap-073d12f68267c9178********************
2018-07-08 15:36:21+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-0bfef6b0debd78d6a********************
2018-07-08 15:37:21+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-0432066f851a8b197********************
2018-07-08 15:36:21+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-00b67e45a0a84ec32********************
2018-07-08 14:02:08+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-0554604daa7202a4b********************
2018-07-07 15:37:21+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-07b7d235e82414dda********************
2018-07-07 15:36:21+00:00
DELETED^^^^^^^^^^^^^^^^^^

Number Deleted: 19

While Leeroy celebrates his victory, you head back to EC2 and notice that no snapshots have been deleted. That’s because in order to run this script in a “live run” you must uncomment the line:

client.delete_snapshot(SnapshotId=id)

So, assuming the above console output lists only snapshots you want deleted, and assuming Leeroy hasn’t broken anything, you can uncomment that line, save, and then walk over to the corner and slap Leeroy’s hand before finally pressing the “Test” button yourself like a responsible DevOps engineer.
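If editing code to flip between dry run and live run makes you nervous, an alternative is to drive it from a Lambda environment variable. This is only a sketch of that idea and not what the script above does: LIVE_RUN is a variable name I invented, and RecordingClient is a stand-in for the real EC2 client so the example runs anywhere.

```python
import os


def live_run_enabled(environ=os.environ):
    """Live deletes happen only when LIVE_RUN is set to 'true' (case-insensitive)."""
    return environ.get("LIVE_RUN", "").strip().lower() == "true"


def delete_or_report(client, snapshot_id, live):
    """Delete the snapshot in a live run; otherwise only report what would happen."""
    if live:
        client.delete_snapshot(SnapshotId=snapshot_id)
        return "deleted"
    return "would delete"


class RecordingClient:
    """Stand-in for boto3.client('ec2') that records calls instead of deleting."""

    def __init__(self):
        self.deleted = []

    def delete_snapshot(self, SnapshotId):
        self.deleted.append(SnapshotId)


client = RecordingClient()
# No LIVE_RUN variable in this environment, so nothing is deleted.
print(delete_or_report(client, "snap-073d12f68267c9178", live_run_enabled({})))
```

The default is the safe one: if the variable is missing or misspelled, the function stays in dry run.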

Depending on how many snapshots meet the delete criteria, this script can take a long time to run, so raise the function timeout in the Lambda configuration first (the default is only 3 seconds). I estimate about 300 to 350 snapshots can be deleted per 5-minute run. At that rate, the 15-minute maximum Lambda execution time works out to roughly a thousand snapshots in a single run.
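Rather than letting Lambda kill the function mid-loop when the timeout hits, the loop can check how much time is left and stop cleanly. The sketch below shows one way to do that. delete_until_deadline and its parameters are hypothetical, and the fake clock stands in for context.get_remaining_time_in_millis, which is what you would pass in a real handler.

```python
def delete_until_deadline(snapshot_ids, delete_one, remaining_millis, reserve_millis=10_000):
    """Delete IDs in order, stopping once only reserve_millis of run time is left.

    Returns the IDs that were not processed so a later run can pick them up.
    """
    pending = list(snapshot_ids)
    while pending and remaining_millis() > reserve_millis:
        delete_one(pending.pop(0))
    return pending


# Fake clock: each call reports 4 seconds less than the one before.
fake_clock = iter([30_000, 26_000, 22_000, 18_000, 14_000, 10_000, 6_000])
deleted = []
left_over = delete_until_deadline(
    ["a", "b", "c", "d", "e", "f", "g"],
    deleted.append,
    lambda: next(fake_clock),
)
print(deleted, left_over)
```

In the handler, delete_one would wrap client.delete_snapshot, and whatever comes back in the returned list simply gets caught by the next run.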

Disclaimer: Use at your own risk. Once the “live run” line above has been uncommented this script can and will delete hundreds of snapshots without further warning. With great power comes great responsibility, and I recommend you play with the script in “dry run” mode, without uncommenting that line, for several Tests to understand exactly how the script will work before attempting a live run. Indeed, the word “Test” on the button to start this script is a misnomer here since we’re hand firing a script that will do much more than simply “Test”. It will lay waste to potentially petabytes of snapshot data.

Conclusion: This script could easily be adapted to fire whenever an EC2 or other kind of snapshot is created. That way, one could tie the pruning script to the creation of new snapshots automatically, if one were so inclined.
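As a rough idea of what that adaptation might involve, the handler could check whether the incoming event is a snapshot-created notification before pruning. The sketch assumes the function is wired to an EventBridge rule for EBS snapshot notifications and that events have the shape shown in the sample, which is my reading of the EBS event format. Verify the field names against a real event before relying on this. is_snapshot_created_event and the sample values are made up.

```python
def is_snapshot_created_event(event):
    """True for an 'EBS Snapshot Notification' reporting a successful createSnapshot."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.ec2"
        and event.get("detail-type") == "EBS Snapshot Notification"
        and detail.get("event") == "createSnapshot"
        and detail.get("result") == "succeeded"
    )


# Trimmed, invented sample with only the fields the check looks at.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EBS Snapshot Notification",
    "detail": {"event": "createSnapshot", "result": "succeeded"},
}

print(is_snapshot_created_event(sample_event))  # True
print(is_snapshot_created_event({}))            # False, e.g. the "Hello World" test event
```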


Leif Hanson is a freelance Web, iOS, and Android developer with years of experience in virtually the entire AWS stack and many related SDKs and APIs across a dozen programming languages.
