Cloud Developer Tips

Recapture Unused EC2 Minutes

How much time is “wasted” in the paid-for but unused portion of the hour when you terminate an instance? How can you recapture this time – which represents compute power – and put it to good use? After all, you’ve paid for it already. This article presents a technique for repurposing an instance after you’re “done” with it, until the current billing hour is up. It’s inspired by a tweet from DEVOPS_BORAT:

We have new startup CloudJanitor. We recycle old or unuse cloud instance. Need only your cloud account login!

To clarify, we’re talking about per-hour pricing in public cloud IaaS services, where partial hours consumed are billed as whole hours. AWS EC2 is the most prominent example of a cloud sporting this pricing policy (search for “partial”). In this pricing policy, terminating (or stopping) an instance after it’s been running for 121 minutes results in a usage charge for three hours, “wasting” an extra 59 minutes that you have paid for but not used.

What’s Involved

You might think it’s easy to repurpose an instance: just Stop it (if it’s EBS-backed), change its root volume to a new one, and Start the instance again. Not so fast: Stopping an EC2 instance immediately ends the current billing hour before you can use it all, and when you Start the instance again a new billing hour begins – so we can’t Stop the instance. We also can’t Terminate the instance – that would also immediately curtail the billing hour and prevent us from utilizing it. Instead, we’re going to reboot the instance, which does not affect the billing.

We’ll need an EBS volume that has a bootable distro on it – let’s call this the “beneficiary” volume, because it’s going to benefit from the extra time on the clock. The beneficiary volume should have the same distro as the “normal” root volume has. [Actually, to be more precise, it need only have a distro that works with the same kernel that the instance is currently running.] I’ve tested this technique with Ubuntu 10.04 Lucid and 10.10 Maverick.

One of the great things about the Ubuntu images is how easy it is to play this root volume switcheroo: these distros boot from any volume that has the label uec-rootfs. To change the root volume we’ll change the volume labels, so a different volume is used as the root filesystem upon reboot.

It’s very important to disassociate the instance from all external hooks, such as Auto-Scaling Triggers and Elastic Load Balancers before you repurpose it. Otherwise the beneficiary workload will influence those no-longer-relevant systems. However, this may not be possible if you use hooks that cannot be de-coupled from the instance, such as a CloudWatch Dimension of ImageIdInstanceId, or InstanceType.

The network I/O incurred during the recaptured time may be subject to additional charges. In EC2, only communications between instances in the same availability zone, or between EC2 and S3 in the same region, are free of charge.

You’ll need to make sure the beneficiary workload only accepts communications on ports that are open in the normal instance’s security groups. It’s not possible to add or remove security groups while an instance is running. You also wouldn’t want to be modifying the security groups dynamically because that will influence all instances in those security groups – and you may have other instances that are still performing their normal workload.

The really cool thing about this technique is that it can be used on both EBS-backed and instance-store instances. However, you’ll need to prepare separate beneficiary volumes (or snapshots) for 32-bit and 64-bit instances.

How to Do it

There are three stages in repurposing an instance:

  1. Preparing the beneficiary volume (or snapshot).
  2. Preparing the normal workload image.
  3. Actually repurposing the instance.

Stages 1 and 2 are only performed once. Stage 3 is performed for every instance you want to repurpose.

Preparing the beneficiary snapshot

First we’re going to prepare the beneficiary snapshot. Beginning with a pristine Ubuntu 10.10 Maverick EBS-based instance (at the time of publishing this article that’s ami-ccf405a5 for 32-bit instances), let’s create a clone of the root filesystem:

ec2-run-instances ami-ccf405a5 -k my-keypair -t m1.small -g default

ec2-describe-instances $instanceId #use the instanceId outputted from the previous command

Wait for the instance to be “running”. Once it is, identify the volumeId of the root volume – it will be indicated in the ec2-describe-instances output, the one attached to device /dev/sda1.

At this point you have a running Ubuntu 10.10 instance. For real-world usage you’ll want to customize this instance by installing the beneficiary workload and arranging for it to automatically start up on boot. (I recommend Folding@home as a worthy beneficiary project.)

Now we create the beneficiary snapshot:

ec2-create-snapshot $volumeId #use the volumeId from the previous command

And now we have the beneficiary snapshot.

Preparing the normal workload image

Begin with the same base AMI that you used for the beneficiary snapshot. Launch it and customize it to contain your normal workload stuff. You’ll also need to put in a custom script that will perform the repurposing. Here’s what that script will do:

  1. Determine how much time is left on the clock in the current billing hour. If it’s not enough time to prepare and to reboot into the beneficiary volume, just force ourselves to shut down.
  2. Disassociate any external hooks the instance might participate in: remove it from ELBs, force it to fail any Auto-Scaling health checks, and make sure it’s not handling “normal” workloads anymore.
  3. Attach the beneficiary volume to the instance.
  4. Change the volume labels so the beneficiary volume will become the root filesystem at the next reboot.
  5. Edit the startup scripts on the beneficiary volume to start a self-destruct timer.
  6. Reboot.

The following script performs steps 1, 4, 5, and 6, and clearly indicates where you should perform steps 2 and 3.

#! /bin/bash
# reboot into the attached EBS volume on the specified device, but terminate
# before this billing hour is complete.
# requires the first argument to be the device on which the EBS volume is attached

safetyMarginMinutes=1 # set this to how long it takes to attach and reboot

# make sure we have at least "safetyMargin" minutes left this hour
if wget -q -O $t ; then
	# add 60 seconds artificially as a safety margin
	let runningSecs=$(( `date +%s` - `date -r $t +%s` ))+60
	rm -f $t
	let runningSecsThisHour=$runningSecs%3600
	let runningMinsThisHour=$runningSecsThisHour/60
	let leftMins=60-$runningMinsThisHour-$safetyMarginMinutes
	# start shutdown one minute earlier than actually required
	let shutdownDelayMins=$leftMins-1
	if [[ $shutdownDelayMins < 2 || $shutdownDelayMins > 59 ]]; then
		echo "Shutting down now."
		shutdown -h now
		exit 0

## here is where you would disassociate this instance from ELBs,
# force it to fail AutoScaling health checks, and otherwise make sure
# it does not participate in "normal" activities.

## here is where you would attach the beneficiary volume to $device
# ec2-create-volume --snapshot snap-00000000 -z this_availability_zone
# dont forget to wait until the volume is "available"

# ec2-attach-volume . . . and don't forget to wait until the volume is "attached"

## (optionally) force the beneficiary volume to be deleted when this instance terminates:
# ec2-modify-instance-attribute --block-device-mapping '$device=::true' this_instance_id

## get the beneficiary volume ready to be rebooted into
# change the filesystem labels
e2label /dev/sda1 old-uec-rootfs
e2label $device uec-rootfs
# mount the beneficiary volume
mkdir -m 000 $mountPoint
mount $device $mountPoint
# install the self-destruct timer
sed -i -e "s/^exit 0$/shutdown -h +$shutdownDelayMins\nexit 0/" \
# neutralize the self-destruct for subsequent boots
sed -i -e "s#^exit 0#chmod -x /etc/rc.local\nexit 0#" $mountPoint/etc/rc.local
# get out
umount $mountPoint
rmdir $mountPoint

# do the deed
shutdown -r now
exit 0

Save this script into the instance you’re preparing for the normal workload (perhaps, as the root user, into /root/ and chmod it to 744.

Now, make your application detect when its normal workload is completed – this exact method will be very specific to your application. Add in a hook there to invoke this script as the root user, passing it the device on which the beneficiary volume will be attached. For example, the following command will cause the instance to repurpose itself to a volume attached on /dev/sdp:

sudo /root/ /dev/sdp

Once all this is set up, use the usual EC2 AMI creation methods to create your normal workload image (either as an instance-store AMI or as an EBS-backed AMI).

Actually repurposing the instance

Now that everything is prepared, this is the easy part. Your normal workload image can be launched. When it is finished, the repurposing script will be invoked and the instance will be rebooted into the beneficiary volume. The repurposed instance will self-destruct before the billing hour is complete.

You can force this repurposing to happen by explicitly invoking the command at an SSH prompt on the instance:

sudo /root/ /dev/sdp

Notice that you will be immediately kicked out of your SSH session – either the instance will reboot or the instance will terminate itself because there isn’t enough time left in the current billable hour. If it’s just a reboot (which happens when there is significant time left in the current billing hour) then be aware: the SSH host key will most likely be different on the repurposed instance than it was originally, and you may need to clean up your local ~/.ssh/known_hosts file, removing the entry for the instance, before you can SSH in again.

Cloud Developer Tips

Play “Chicken” with Spot Instances

AWS Spot Instances have an interesting economic characteristic that make it possible to game the system a little. Like all EC2 instances, when you initiate termination of a Spot Instance then you incur a charge for the entire hour, even if you’ve used less than a full hour. But, when AWS terminates the instance due to the spot price exceeding the bid price, you do not pay for the current hour.

What if your Spot Instance could wait, after finishing its work, to see if AWS will terminate it involuntarily in this hour and avoid the hour’s cost? In the worst case, your instance can kill itself in the last few minutes of the hour and you will not have incurred any extra unplanned cost. In the best case, the spot price will rise above the instance’s bid price before the hour is up, AWS will terminate the instance involuntarily, and you will not be charged for that entire hour. Wouldn’t this technique reduce costs, especially when performed at large scale?

I call this technique Playing Chicken, based on the game of that name, because it shares similar characteristics to the game:

  • Whoever “swerves” (terminates) first, loses (pays for the hour)
  • If nobody “swerves” (terminates), then an undesirable situation occurs (the instance remains running)

How to Play Chicken

Playing Chicken is really as simple as running a script on the instance when you’re done with the work. Here’s such a script:

#! /bin/bash
if wget -q -O $t ; then
	# add 60 seconds artificially as a safety margin
	let runningSecs=$(( `date +%s` - `date -r $t +%s` ))+60
	rm -f $t
	let runningSecsThisHour=$runningSecs%3600
	let runningMinsThisHour=$runningSecsThisHour/60
	let leftMins=60-$runningMinsThisHour
	# start shutdown one minute earlier than actually required
	let shutdownDelayMins=$leftMins-1
	if [[ $shutdownDelayMins > 1 && $shutdownDelayMins < 60 ]]; then
		echo "Shutting down in $shutdownDelayMins mins."
		# TODO: Notify off-instance listener that the game of chicken has begun
		sudo shutdown -h +$shutdownDelayMins
		echo "Shutting down now."
		sudo shutdown -h now
	exit 0
echo "Failed to determine remaining minutes in this billable hour. Terminating now."
sudo shutdown -h now
exit 1

This script uses the technique published by Dmitriy Samovskiy to determine the launch time of the current instance without using the EC2 API, using the instance meta-data instead. We include a safety margin of two minutes: accounting for the remaining time conservatively adding 1 minute, and beginning the shutdown sequence one minute earlier.

You would run this script on the instance when the Spot Instance is done with its work instead of terminating the instance immediately. You also can add a hook at the indicated place to notify an off-instance listener that the game of chicken has begun, to allow you to track the savings delivered by this technique.

Warning: Make sure you really understand what this script does before you use it. If you mistakenly schedule an instance to be shut down you can cancel it with this command, run on the instance: sudo shutdown -c

How Much is Saved by Playing Chicken?

The extent to which you can benefit from playing chicken depends on a number of factors:

  • The difference between the spot price and your instance’s bid price. The further away the spot price is from your bid, the less likely it is that the spot price will hit the bid and save you money.
  • The volatility of the spot price. The more volatile the spot price, the more likely it will hit the bid and save you money.
  • The number of Spot Instances you terminate in a given period of time. If you normally don’t terminate any Spot Instances then you won’t save anything; if you terminate many then you can potentially save an hour’s worth of cost for each of them.
  • The EC2 Region and instance type. The actual spot price varies by region and instance type, so the potential savings depends on these factors as well.

I’m looking for help to work out a model that can describe the potential savings. If you are interested and able to help with the financial math, please get in touch.

Update: Hat tip to Simon Wardley who pointed out the site CloudExchange that shows great visualizations of the spot prices by region and instance type. This may help you formulate a bidding strategy.

Cloud Developer Tips The Business of IT

CloudConnect 2011 Platforms and Ecosystems BOF

This March 7-10 2011 I will be in Santa Clara for the CloudConnect conference. There are many reasons you should go too, and I’d like to highlight one session that you should not miss: the Platforms and Ecosystems BOF, on Tuesday March 8 at 6:00 – 7:30 PM. Read on for a detailed description of this BOF session and why it promises to be worthwhile.

[Full disclosure: I’m the track chair for the Design Patterns track and I’m running the Platforms and Ecosystems BOF. The event organizers are sponsoring my hotel for the conference, and like all conference speakers my admission to the event is covered.]

CloudConnect 2011 promises to be a high-quality conference, as last year’s was. This year you will be able to learn all about design patterns for cloud applications in the Design Patterns track I’m leading. You’ll also be able to hear from an all-star lineup about many aspects of using cloud: cloud economics, cloud security, culture, risks, and governance, data and storage, devops and automation, performance and monitoring, and private clouds.

But I’m most looking forward to the Platforms and Ecosystems BOF because the format of the event promises to foster great discussions.

The BOF Format

The BOF will be conducted as a… well, it’s hard to describe in words only, so here is a picture:

BOF Overview

At three fixed points around the outside of the room will be three topics: Public IaaS, Public PaaS, and Private Cloud Stacks. There will also be three themes which rotate around the room at each interval; these themes are: Workload Portability, Monitoring & Control, and Avoiding Vendor Lock-in. At any one time there will be three discussions taking place, one in each corner, focusing on the particular combination of that theme and topic.

Here is an example of the first set of discussions:

BOF Session 1

And here is the second set of discussions, which will take place after one “turn” of the inner “wheel”:

BOF Session 2

And here is the final set of discussions, after the second turn of the inner wheel:

BOF Session 3

In all, nine discussions are conducted. Here is a single matrix to summarize:

BOF Summary Matrix

Anatomy of a Discussion

What makes a discussion worthwhile? Interesting questions, focus, and varied opinions. The discussions in the BOF will have all of these elements.

Interesting Questions

The questions above are just “seeder” questions. They are representative questions related to the intersection of each topic and theme, but they are by no means the only questions. Do you have questions you’d like to see discussed? Please, leave a comment below. Better yet, attend the BOF and raise them there, in the appropriate discussion.


Nothing sucks more than a pointless tangent or a a single person monopolizing the floor. Each discussion will be shepherded by a capable moderator, who will keep things focused on the subject area and encourage everyone’s participation.

Varied Opinions

Topic experts and theme experts will participate in every discussion.

Vendors will be present as well – but the moderators will make sure they do not abuse the forum.

And interested parties – such as you! – will be there too. In unconference style, the audience will drive the discussion.

These three elements all together provide the necessary ingredients to make sure things stay interesting.

At the Center

What, you may ask, is at the center of the room? I’m glad you asked. There’ll be beer and refreshments at the center. This is also where you can conduct spin-off discussions if you like.

The Platforms and Ecosystems BOF at CloudConnect is the perfect place to bring your cloud insights, case studies, anecdotes, and questions. It’s a great forum to discuss the ideas you’ve picked up during the day at the conference, in a less formal atmosphere where deep discussion is on the agenda.

Here’s a discount code for 25% off registrationCNXFCC03. I hope to see you there.