S3 Reduced Redundancy Storage with Simple Notification Service: What, Why, and When

AWS recently added support for receiving Simple Notification Service (SNS) notifications when S3 loses a Reduced Redundancy Storage (RRS) object. This raises a number of questions:

  • What the heck does that even mean?
  • Why would I want to do that?
  • Under what conditions does it make financial sense to do that?

Let’s take a look at these questions, and we’ll also do a bit of brainstorming (please participate!) to design a service that puts it all together.

What is S3 Reduced Redundancy Storage?

Standard objects stored in S3 have “eleven nines” of durability annually. This means 99.999999999% of your objects stored in S3 will still be there after one year. You would need to store 100,000,000,000 – that’s one hundred billion – objects in standard S3 storage before, on average, one of them disappears over a year’s time. Pretty great.

Reduced Redundancy Storage (RRS) is a different class of S3 storage that, in effect, has a lower durability: 99.99% annually. On average, you will need to store only 10,000 objects in RRS S3 before you should expect one of them to disappear over a year’s time. Not quite as great, but still more than 400 times better than a traditional hard drive.

When an RRS object is lost, S3 will return an HTTP 405 response code, and your application is supposed to be built to understand that and take the appropriate action: most likely regenerate the object from its source objects, which have been stored elsewhere more reliably – probably in standard eleven-nines S3. It’s less expensive for AWS to provide a lower-durability class of service, so RRS storage is priced accordingly: it’s about 2/3 the cost of standard S3 storage.

RRS is great for derived objects – for example, image thumbnails. The source object – the full-quality image or video – can be used to recreate the derived object – the thumbnail – without losing any information. All it costs to create the derived object is time and CPU power. And that’s most likely why you’re creating the derived objects and storing them in S3: to act as a cache so the app server does not need to spend time and CPU power recreating them for every request. Using S3 RRS as a cache will save you 1/3 of your storage costs for the derived objects, but you’ll occasionally need to recreate a derived object in your application.

How Do You Handle Objects Stored in RRS?

If you serve the derived objects to clients directly from S3 – as many web apps do with their images – your clients will occasionally get an HTTP 405 response code (about once a year for every 10,000 RRS objects stored). The more objects you store, the higher the likelihood of a client’s browser encountering an HTTP 405 error – and most browsers show ugly messages when they get a 405 error. So your application should do some checking.

To get your application to check for a lost object you can do the following: send S3 an HTTP HEAD request for the object before giving the client its URL. If the object exists, the HEAD request will succeed. If the object is lost, the HEAD request will return a 405 error. Once you’re sure the object is in S3 (either the HEAD request succeeded, or you recreated the derived object and stored it again in S3), give the object’s URL to the client.
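
Here’s a minimal sketch of that check in Python, using only the standard library. The bucket name, key, and the regenerate_derived_object() helper are hypothetical placeholders – the helper stands in for whatever your app does to rebuild a derived object from its source:

import http.client

def object_exists(bucket, key):
    # HEAD the (publicly readable) derived object in S3.
    conn = http.client.HTTPConnection("%s.s3.amazonaws.com" % bucket)
    conn.request("HEAD", "/" + key)
    status = conn.getresponse().status
    conn.close()
    return status == 200    # a lost RRS object comes back as a 405

def url_for_client(bucket, key):
    # Make sure the derived object is really there before handing out its URL.
    if not object_exists(bucket, key):
        regenerate_derived_object(bucket, key)    # hypothetical helper
    return "http://%s.s3.amazonaws.com/%s" % (bucket, key)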

All that HEAD checking is a lot of overhead: each S3 RRS URL needs to be checked every time it’s served. You can add a cache of the URLs of objects you’ve checked recently and skip those. This will cut down on the overhead and reduce your S3 bill – remember that each HEAD request costs 1/10,000 of a cent – but it’s still a bunch of unnecessary work, because most of the time you check, the object will still be there.
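
If you want to roll your own cache, something as simple as this in-memory, time-based map does the trick, reusing object_exists() from the sketch above; the one-hour recheck interval is an arbitrary choice for illustration:

import time

_last_seen = {}          # (bucket, key) -> last time we saw the object intact
CHECK_INTERVAL = 3600    # recheck each object at most once an hour (arbitrary)

def object_exists_cached(bucket, key):
    now = time.time()
    if now - _last_seen.get((bucket, key), 0) < CHECK_INTERVAL:
        return True    # checked recently; skip the HEAD request
    if object_exists(bucket, key):
        _last_seen[(bucket, key)] = now
        return True
    return False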

Using Simple Notification Service with RRS

Wouldn’t it be great if you could be notified when S3 RRS loses an object?

You can. AWS’s announcement introduces a way to receive notification – via Simple Notification Service, SNS – when S3 RRS detects that an object has been lost. This means you no longer need your application to check for 405s before serving objects. Instead you can have your application listen for SNS notifications (via HTTP, email, or SQS) and proactively process them to restore lost objects.
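
For example, if you subscribe an SQS queue to the SNS topic, a small worker can poll the queue and recreate whatever S3 reports as lost. This is only a sketch using the boto library: the queue name and the field names inside the notification message are assumptions, and regenerate_derived_object() is the same hypothetical helper as above:

import json
import time
import boto

def process_lost_object_notifications():
    sqs = boto.connect_sqs()
    queue = sqs.get_queue('rrs-lost-object-notifications')    # hypothetical queue name
    while True:
        messages = queue.get_messages(10)
        if not messages:
            time.sleep(30)    # nothing lost right now; poll again shortly
            continue
        for msg in messages:
            envelope = json.loads(msg.get_body())          # the SNS envelope
            notification = json.loads(envelope['Message'])
            # The field names below are assumptions about the notification payload.
            bucket = notification.get('Bucket')
            key = notification.get('Key')
            if bucket and key:
                regenerate_derived_object(bucket, key)     # hypothetical helper
            queue.delete_message(msg)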

Okay, it’s not really true that your application no longer needs to check for lost objects. The latency between the actual loss of an object and the time you recreate and replace it is still nonzero, and during that time you probably want your application to behave nicely.

[An aside: I do wonder what the expected latency is between the object’s loss and the SNS notification. I’ve asked on the Forums and in a comment to Jeff Barr’s blog post – I’ll update this article when I have an answer.]

When Does it Make Financial Sense to Use S3 RRS?

While you save on storage costs by using S3 RRS, you still need to devote resources to recreating any lost objects. How can you decide when it makes sense to go with RRS despite the need to recreate lost objects?

There are a number of factors that influence the cost of recreating lost derived objects:

  • Bandwidth to get the source object from S3 and return the derived object to S3. If you perform the processing inside the same EC2 region as the S3 region you’re using then this cost is zero.
  • CPU to perform the transformation of the source object into the derived object.
  • S3 requests for GETting the source object and PUTting the derived object.

I’ve prepared a spreadsheet analyzing these costs for various different numbers of objects, sizes of objects, and CPU-hours required for each derived object.

For 100,000 source objects of average 5MB size stored in Standard S3, each of which creates 5 derived objects of average 500KB size stored in RRS and requiring 1 second of CPU time to recreate, the savings in choosing RRS is $12.50 per month. Accounting for the cost of recreating lost derived objects reduces that savings to $12.37.
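
As a rough sanity check of where that $12.50 comes from, here is the arithmetic, assuming the 2010-era prices of roughly $0.15 per GB-month for standard S3 and $0.10 per GB-month for RRS; the exact recreation cost depends on the spreadsheet’s assumptions about CPU and request pricing:

# 100,000 source objects x 5 derived objects each, 500KB per derived object
derived_objects = 100000 * 5
derived_storage_gb = derived_objects * 0.0005          # 250 GB of derived data

savings_per_gb = 0.15 - 0.10                           # standard S3 vs. RRS (assumed 2010 prices)
print(derived_storage_gb * savings_per_gb)             # ~$12.50 per month gross savings

# 99.99% annual durability means roughly 0.01% of RRS objects are lost per year,
# so only a handful of objects need to be recreated in any given month.
print(derived_objects * 0.0001 / 12)                   # ~4 objects per month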

For the same types of objects but requiring 15 minutes of CPU time to recreate each derived object the net savings overall is $12.28. Still very close to the entire savings generated by using RRS.

For up to about 500,000 source objects it doesn’t pay to launch a dedicated m1.small instance just for the sake of recreating lost RRS objects. An m1.small costs $61.20 per month, which is approximately the same as the net savings from 500,000 source objects of average 5MB size with 5 derived objects each of average size 500KB. At this level of usage, if you have spare capacity on an existing instance then it would make financial sense to run the recreating process there.

For larger objects the savings is also almost the entire amount saved by using RRS, and the amounts saved are larger than the cost of a single m1.small, so it already pays to launch your own instance for the processing.

For larger numbers of objects the savings is also almost the entire amount saved by using RRS.

However far down you go in the spreadsheet, and however much you play with the numbers, it makes financial sense to use RRS and have a mechanism to recreate derived objects.

Which leads us to the brainstorming.

Why Should I Worry About Lost Objects?

Let’s face it, nobody wants to operate a service that is not core to their business. Most likely, creating the derived objects from the source object is not your core business competency. Creating thumbnails and still-frame video captures is commodity stuff.

So let’s imagine a service that does the transformation, storage in S3, and maintenance of RRS derived objects for you so you don’t have to.

You’d drop off your source object in your bucket in S3. Then you’d send an SQS message to the service containing the new source object’s key and a list of the transformations you want applied. As Jeff Barr suggests in his blog, the service would process the message and create derived objects (stored in RRS) whose keys (names) would be composed of the source object’s name and the name of the transformation applied. You’d know how to construct the name of every derived object, so you would know how to access them. The service would subscribe to the RRS SNS notifications and recreate the derived objects when they are lost.
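
To make the idea a bit more concrete, the request message and the key-naming convention might look something like this. Everything here – the field names, the separator, the transformation names – is hypothetical:

import json

def derived_key(source_key, transformation):
    # Derived keys are built from the source key plus the transformation name,
    # so clients can compute them without ever asking the service.
    return "%s__%s" % (source_key, transformation)

# The SQS message a client might send to the service (hypothetical format):
request = json.dumps({
    "bucket": "my-media-bucket",
    "source_key": "videos/2010/07/keynote.mp4",
    "transformations": ["thumbnail-120x90", "still-frame-30s"],
})

# The service would then create and maintain, in RRS:
#   videos/2010/07/keynote.mp4__thumbnail-120x90
#   videos/2010/07/keynote.mp4__still-frame-30s
# and recreate either one whenever an SNS loss notification arrives.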

This service would need a way for clients to discover the supported file types and the supported transformations for each file type.

As we pointed out above, there are significant potential savings in using RRS, so such a service has plenty of margin to price itself profitably, below the cost of standard S3 storage.

What else would such a service need? Please comment.

If you build such a service, please cut me in for 30% for giving you the idea. Or, at least acknowledge me in your blog.

Storing AWS Credentials on an EBS Snapshot Securely

Thanks to reader Ewout and his comment on my article How to Keep Your AWS Credentials on an EC2 Instance Securely for suggesting an additional method of transferring credentials: via a snapshot. It’s similar to burning credentials into an AMI, but easier to do and less prone to accidental inclusion in the application’s AMI.

Read on for a discussion of how to implement this technique.

How to Store AWS Credentials on an EBS Snapshot

This is how to store a secret on an EBS snapshot. You do this only once, or whenever you need to change the secret.

We’re going to automate as much as possible to make it easy to do. Here’s the command that launches an instance with a newly created 1GB EBS volume, formats it, mounts it, and sets up the root user to be accessible via ssh and scp. The new EBS volume created will not be deleted when the instance is terminated.

$ ec2-run-instances -b /dev/sdf=:1:false -t m1.small -k \
my-keypair -g default ami-6743ae0e -d '#! /bin/bash
yes | mkfs.ext3 /dev/sdf
mkdir -m 000 /secretVol
mount -t ext3 -o noatime /dev/sdf /secretVol
cp /home/ubuntu/.ssh/authorized_keys /root/.ssh/'

We have set up the root user to be accessible via ssh and scp so we can store the secrets on the EBS volume by copying them directly to the volume as root. Here’s how we do that:

$ ls -l
total 24
-r--r--r-- 1 shlomo  shlomo  916 Jun 20  2010 cert-NT63JNE4VSDEMH6VHLHBGHWV3DRFDECP.pem
-r--------  1 shlomo  shlomo   90 Jun  1  2010 creds
-r-------- 1 shlomo  shlomo  926 Jun 20  2010 pk-NT63JNE4VSDEMH6VHLHBGHWV3DRFDECP.pem
$ scp -i /path/to/id_rsa-my-keypair * root@174.129.83.237:/secretVol/

Our secret is now on the EBS volume, visible only to the root user.

We’re almost done. Of course you want to test that your application can access the secret appropriately. Once you’ve done that you can terminate the instance – don’t worry, the volume will not be deleted due to the “:false” specification in our launch command.

$ ec2-terminate-instances $instance
$ ec2-describe-volumes
VOLUME	vol-7ce48a15	1		us-east-1b	available	2010-07-18T17:34:01+0000
VOLUME	vol-7ee48a17	15	snap-5e4bec36	us-east-1b	deleting	2010-07-18T17:34:02+0000

Note that the root EBS volume is being deleted but the new 1GB volume we created and stored the secret on is intact.

Now we’re ready for the final two steps. First, snapshot the volume with the secret:

$ ec2-create-snapshot $secretVolume
SNAPSHOT	snap-2ec73045	vol-7ce48a15	pending	2010-07-18T18:05:39+0000		540528830757	1

And, once the snapshot completes, delete the volume:

$ ec2-describe-snapshots -o self
SNAPSHOT	snap-2ec73045	vol-7ce48a15	completed	2010-07-18T18:05:40+0000	100%	540528830757	1
$ ec2-delete-volume $secretVolume
VOLUME	vol-7ce48a15
# save the snapshot ID
$ secretSnapshot=snap-2ec73045

Now you have a snapshot $secretSnapshot with your credentials stored on it.

How to Use Credentials Stored on an EBS Snapshot

Of course you can create a new volume from the snapshot, attach it to your instance, mount it on the filesystem, and access the secrets as the root user. But here’s a way to do all that at instance launch time:

$ ec2-run-instances -b /dev/sdf=$secretSnapshot -t m1.small -k \
my-keypair -g default ami-6743ae0e -d '#! /bin/bash
mkdir -m 000 /secretVol
mount -t ext3 -o noatime /dev/sdf /secretVol
# make sure it gets remounted if we reboot
echo "/dev/sdf /secretVol ext3 noatime 0 0" > /etc/fstab'

This one-liner uses the -b option of ec2-run-instances to specify that a new volume be created from $secretSnapshot and attached to /dev/sdf; this volume will be automatically deleted when the instance terminates. The user-data script sets up the filesystem mount point and mounts the volume there, also ensuring that the volume will be remounted if the instance reboots.

Check it out, a new volume was created for /dev/sdf:

$ ec2-describe-instances
RESERVATION	r-e4f2608f	540528830757	default
INSTANCE	i-155b857f	ami-6743ae0e			pending	my-keypair	0		m1.small	2010-07-19T15:51:13+0000	us-east-1b	aki-5f15f636	ari-d5709dbc	monitoring-disabled					ebs
BLOCKDEVICE	/dev/sda1	vol-8a721be3	2010-07-19T15:51:22.000Z
BLOCKDEVICE	/dev/sdf	vol-88721be1	2010-07-19T15:51:22.000Z

Let’s make sure the files are there. SSHing into the instance (as the ubuntu user) we then see:

$ ls -la /secretVol
ls: cannot open directory /secretVol: Permission denied
$ sudo ls -l /secretVol
total 28
-r--------  1 root root   916 2010-07-18 17:52 cert-NT63JNE4VSDEMH6VHLHBGHWV3DRFDECP.pem
-r--------  1 root root    90 2010-07-18 17:52 creds
dr--------  2 root   root   16384 2010-07-18 17:42 lost+found
-r--------  1 root root   926 2010-07-18 17:52 pk-NT63JNE4VSDEMH6VHLHBGHWV3DRFDECP.pem

Your application running on the instance (you’ll install it by adding to the user-data script, right?) will need root privileges to access those secrets.
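
For example, a process started as root could load the credentials once at startup (and then drop privileges). This is just a sketch – the two-line layout of the creds file is an assumption, not something the technique dictates:

# Must run as root: the files on /secretVol are mode 400 and owned by root.
def load_credentials(path="/secretVol/creds"):
    with open(path) as f:
        # Assumed layout: access key ID on the first line, secret key on the second.
        access_key_id = f.readline().strip()
        secret_access_key = f.readline().strip()
    return access_key_id, secret_access_key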