The Business of IT

Cloud pricing wrinkles and tea in China

As you’ve probably heard, Amazon Web Services reduced their on-demand cloud prices significantly last week. You’d think customers would be happy across the board, but that’s not the case. Here’s why, and what will happen as a result.

As discussed previously, AWS customers translate their capacity planning into Reserved Instance (RI) purchases, based on the relative savings these RIs provide over on-demand prices. But, when on-demand prices are reduced without a corresponding reduction in the instance-hour price for RIs – as happened last week – the RI breakeven point shifts and upsets the optimal RI coverage calculus. AWS customers who purchased RIs before the price reduction can find themselves stuck with inventory that now costs more per hour, in amortized terms, than they would pay on demand had they never purchased the RI. I have several clients in this situation, and none are very happy about it.
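To make the breakeven shift concrete, here is a minimal sketch of the arithmetic in Ruby. Every price in it is hypothetical, chosen only for illustration, and the function names are my own; substitute the real fees and rates for your instance type.

```ruby
# All prices below are hypothetical, chosen only to illustrate the arithmetic.

# Effective hourly cost of a reserved instance: the upfront fee spread over
# the hours actually used, plus the RI's own hourly rate.
def amortized_ri_hourly(upfront, ri_hourly, hours_used)
  upfront / hours_used.to_f + ri_hourly
end

# Hours of use at which the RI costs the same as running on demand.
# Returns nil when the RI's hourly rate is not below the on-demand rate.
def breakeven_hours(upfront, ri_hourly, on_demand_hourly)
  saving_per_hour = on_demand_hourly - ri_hourly
  return nil unless saving_per_hour > 0
  upfront / saving_per_hour
end

upfront       = 400.0  # hypothetical one-year RI fee
ri_hourly     = 0.05   # hypothetical RI hourly rate
old_on_demand = 0.12   # hypothetical on-demand rate before the cut
new_on_demand = 0.08   # hypothetical on-demand rate after the cut

puts "breakeven before the cut: #{breakeven_hours(upfront, ri_hourly, old_on_demand).round} hours"
puts "breakeven after the cut:  #{breakeven_hours(upfront, ri_hourly, new_on_demand).round} hours"
puts "amortized RI cost at 8000 hours: #{amortized_ri_hourly(upfront, ri_hourly, 8000).round(3)} per hour"
```

With these made-up numbers the breakeven moves from about 5,714 hours to about 13,333 hours, which is more than the 8,760 hours in a year. An RI used for 8,000 hours costs 0.10 per hour in amortized terms: cheaper than the old on-demand rate, but more expensive than the new one.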

Tea in China

I can imagine the counterargument to this thinking, no doubt coming from the mouth of a cool-headed economist: If the customer was happy when she bought tea at a 10% discount, why should she be any less happy when the price is further reduced a day or a week later? She deemed the value of the tea to be worth the cost when she bought it, and that value has not changed. By rational reasoning, she should be equally satisfied today.

But people don’t think purely in economic terms – emotions play a part in their decisions as well. And the AWS customers who are feeling unloved by this price reduction are the very customers, I imagine, that AWS wants to keep most: These customers have made long-term commitments to AWS already. So, it won’t be long until we hear another price reduction announcement from AWS, specifically directed at this customer segment. Don’t be surprised to see AWS granting a proportional hourly price reduction – or an equivalent credit – for already-purchased reserved instances.

The Business of IT

Cloud price reductions and capacity planning

Last week both Google Cloud Platform and Amazon Web Services reduced their prices for cloud computing services significantly to comparable levels, and both now offer significant discounts for long-term usage. Yet, though the two cloud services may seem similar, their radically different long-term pricing models reveal just how different these cloud offerings really are.

Whose responsibility is capacity planning?

The core difference between GCP and AWS is in capacity planning: Whose responsibility is it? In AWS, the customer owns their own capacity planning. If the customer can accurately predict their needs for the long term, they can purchase Reserved Instances and save significantly as compared to the on-demand cost. In Google Compute Engine (GCE), by contrast, Google owns the capacity planning. GCE customers are granted a Sustained Use discount at the end of the month for resources that were active for a significant portion of the month. The GCE customer might track their expected vs. actual costs and be pleasantly surprised when their bill at the end of the month is lower than expected, but the GCE customer cannot a priori translate their capacity planning prowess into reduced costs.

This raises the question: Are cloud consumers actually good at capacity planning? Sadly, no. Capacity planning in the pre-cloud age of the data center was dicey at best, with managers relying on overprovisioning to save their necks in the face of so much uncertainty. This overprovisioning is no longer a time-to-market-driven necessity in the cloud – but classic capacity planning is no more accurate in the cloud than it was in the data center. That’s why a host of new cloud financial management tools, such as Cloudyn, have sprung up in recent years: These tools help the cloud consumer predict usage and optimize their up-front commitments to maximize cost savings.

Using cloud financial management tools can help you reduce long-term costs, but only to the extent that the cloud provider rewards accurate capacity planning with lower prices – as does AWS. With Google Cloud, there’s no such pricing lever to pull.

But, make no mistake: Choose your cloud provider based on technical and business merits, not based solely on the long-term pricing model. Contact me if you need help.

Cloud Developer Tips

Using the new awscli with Chef and OpsWorks

We used to go through so much trouble to manipulate Amazon Web Services resources from the command line. There was a separate AWS command-line tool for each service, and each had to be downloaded separately, had its own configuration, and had its own output format. Eric Hammond wrote in detail about this. With the new awscli tool by Mitch Garnaat (of Python boto fame) and his team at AWS, we have a single tool that does it all, uniformly, and can be simply configured. Thanks, Amazon!

But what if you need to use the awscli from within a Chef recipe? How do you install and configure the awscli with Chef and OpsWorks? I had this very same question on some recent projects. Here’s how to do this easily.

The awscli cookbook

I created the awscli cookbook to install and configure the awscli command line tools. Add this cookbook to your deployment and include the awscli::default recipe in your run list, and watch it work.

This cookbook supports some common deployment scenarios:

  • Installing the awscli “plain vanilla”, during the Chef “execute” stage.
  • Installing the awscli during the Chef “compile” phase, so it can be called from within Chef.
  • Configuring the awscli’s AWS access credentials.
  • Configuring multiple configuration profiles.

It does not need to be run on an AWS instance – you can use this cookbook to install the awscli anywhere that you can run Chef.

See the cookbook’s README for more details.

Using awscli with IAM Roles

It’s not obvious from the awscli documentation, but when an instance has an associated IAM Role, the awscli will automatically read its credentials from the instance’s IAM metadata. This will often be the case when you are using OpsWorks to launch your instances. In these circumstances, you don’t need to configure any awscli access credentials. But, make sure you use IAM to grant the IAM Role permissions for the AWS API calls you’ll be making with the awscli.

Why use the awscli from within Chef?

Chef recipes are written in Ruby, and it’s easy to parse and manipulate JSON in Ruby. The awscli outputs its responses in JSON format – so it’s easy to parse those responses into your Ruby code. The two make a very handy combination. For example, here is how to wait for a newly-created EBS volume to be available (so you can attach it to an instance without an error):

    require 'json'
    require 'mixlib/shellout'
    volume_id = "vol-12345678"
    region = "us-east-1"
    # the awscli lives in a different directory on each platform family
    command = ""
    case node[:platform]
    when 'debian','ubuntu'
      command << "/usr/local/bin/"
    when 'redhat','centos','fedora','amazon','scientific'
      command << "/usr/bin/"
    end
    command << "aws --region #{region} ec2 describe-volumes --volume-ids #{volume_id} --output json"
    Chef::Log.info("Waiting for volume #{volume_id} to be available")
    loop do
      # run the awscli, folding stderr into stdout so failures are reported
      shell = Mixlib::ShellOut.new("#{command} 2>&1")
      shell.run_command
      # exitstatus is an integer and 0 is truthy in Ruby, so compare explicitly
      if shell.exitstatus != 0
        raise "#{command} failed:" + shell.stdout
      end
      jdoc = JSON.parse(shell.stdout)
      vol_state = jdoc["Volumes"].first["State"]
      Chef::Log.debug("#{volume_id} is #{vol_state}")
      if vol_state=="available"
        break
      else
        Chef::Log.debug("Waiting 5 more seconds for volume #{volume_id} to be ready...")
        sleep 5
      end
    end

The loop at the end shows how easy it is to access the awscli output within your Ruby code (in your Chef recipe).
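If you want to try the JSON handling outside of Chef, here is a small standalone sketch. The sample response is hand-written and abbreviated (a real describe-volumes response carries more fields per volume), and the helper name first_volume_state is my own, not part of any library.

```ruby
require 'json'

# Abbreviated, hand-written sample of `aws ec2 describe-volumes --output json`
# output; a real response carries more fields per volume.
sample = <<~JSON
  {
    "Volumes": [
      { "VolumeId": "vol-12345678", "State": "creating", "Size": 10 }
    ]
  }
JSON

# Returns the State of the first volume in a describe-volumes response,
# or nil when the response lists no volumes.
def first_volume_state(json_text)
  volumes = JSON.parse(json_text).fetch("Volumes", [])
  volumes.empty? ? nil : volumes.first["State"]
end

puts first_volume_state(sample)
```

Returning nil for an empty list keeps a polling loop from crashing on a response that contains no volumes.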

Cloud Developer Tips

Fragment of heretofore unknown Tractate of Babylonian Talmud discovered

Ancient wisdom apparently has much to offer modern cloud application architects. This fragment was discovered in a shadowy basement in the Tel Aviv area of Israel.

MasechetDBKamma (see a PDF of the fragment)

This finding clearly shows that ancient cloud application architects in the great talmudic academies of Babylon struggled with the transition away from classic databases. At the time, apparently, a widely used solution was known as Urim veTumim (“oracle”). Yet this database was unsuited for reliable use in cloud applications, and the text explores the reasons behind that unsuitability.

Okay, here’s the real story: I created this for a client in 2011, and I was delighted to find it on my computer serendipitously today. It reflects the state of the art at the time. Translation into plain English:

1. Oracle RAC does not run on EC2

2. Achieving Oracle high availability on EC2 is a problem: there is no shared device, and relying on NFS is problematic.

3. The cloud frameworks (enStratus, etc.) do not currently support Oracle.

Cloud Developer Tips

Poking Holes in CloudFront-Based Sites for Dynamic Content

As of February 2011 AWS S3 has been able to serve static websites, giving you superior availability for unchanging (or seldom-changing) content. But most websites today are not static; dynamic elements drive essential features such as personalized pages, targeted advertisements, and shopping carts. Today’s release from AWS CloudFront: Support for Dynamic Content alleviates some of the challenge of running dynamic websites. You can now configure a custom set of URL patterns to always be passed through to the origin server. This allows you to “poke holes” in the CDN cache for providing dynamic content.

Some web sites, such as this one, appear to be static but are driven by dynamic code. WordPress renders each page on every request. Though excellent tools exist to provide caching for WordPress, these tools still require your web server to process WordPress’s PHP scripts. Heavy traffic or poor hosting choices can still overwhelm your web server.

Poking Holes

It’s relatively easy to configure your entire domain to be served from CloudFront. What do you need to think about when you poke holes in a CloudFront distribution? Here are two important items: admin pages and form actions.

Admin pages

The last thing you want is for your site’s control panel to be statically served. You need an accurate picture of the current situation in order to manage your site. In WordPress, this includes everything in the /wp-admin/* path as well as the /wp-login.php page.

Form actions

Your site most likely does something with the information people submit in forms – search with it, store it, or otherwise process it. If not, why collect it? In order to process the submitted information you need to handle it dynamically in your web application, and that means the submit action can’t lead to a static page. Make sure your form submission actions – such as search and feedback links – pass through to the webserver directly.

A great technique for feedback forms is to use WuFoo, where you can visually construct forms and integrate them into your website with a simple JavaScript snippet. This means that your page can remain static – the JavaScript code dynamically inserts the form, and WuFoo handles the processing, stops the spam, and sends you the results via email.
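To reason about which requests fall into the holes, it can help to model the pass-through patterns locally. The sketch below is only an illustration of glob-style matching in Ruby. It is not CloudFront’s configuration API, and CloudFront’s own matching rules may differ. The /feedback/* pattern is a hypothetical form-action path; the other two are the WordPress paths named above.

```ruby
# Path patterns to pass through to the origin server. The first two are the
# WordPress admin and login paths; "/feedback/*" is a hypothetical
# form-action path used only for illustration.
PASS_THROUGH_PATTERNS = ["/wp-admin/*", "/wp-login.php", "/feedback/*"].freeze

# True when the request path matches a pass-through pattern and so should
# reach the web server rather than be answered from the CDN cache.
def pass_through?(path)
  PASS_THROUGH_PATTERNS.any? { |pattern| File.fnmatch(pattern, path) }
end

%w[/wp-admin/options.php /wp-login.php /2011/02/some-post/ /favicon.ico].each do |path|
  puts "#{path}: #{pass_through?(path) ? 'origin' : 'cache'}"
end
```

Running a list of your site’s real URLs through a check like this is a cheap way to spot an admin or form path you forgot to exclude.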

When Content Isn’t So Dynamic

Sometimes content changes infrequently – for example, your favicon probably changes rarely. Blog posts, once written, seldom change. Serving these items from a CDN is still an effective way to reduce load on your webserver and reduce latency for your users. But when things do change – such as updated images, additional comments, or new posts – how can you use CloudFront to serve the new content? How can you make sure CloudFront works well with your updated content?

Object versioning

A common technique used to enable updating static objects is called object versioning. This means adding a version number to the file name, and updating the link to the file when a new version is available. This technique also allows an entire set of resources to be versioned at once, when you create a versioned directory name to hold the resources.

Object versioning works well with CloudFront. In fact, it is the recommended way to update static resources that change infrequently. The alternative method, invalidating objects, is more expensive and difficult to control.
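Here is a minimal sketch of both flavors of object versioning in Ruby. The naming scheme (a “.vN” suffix before the extension, and a “-vN” suffix on the directory) is just one assumed convention, and the function names are my own; any scheme works as long as a changed object gets a new URL.

```ruby
# Inserts a version number before the file extension, so that
# "images/logo.png" at version 3 becomes "images/logo.v3.png".
def versioned_name(path, version)
  ext  = File.extname(path)                  # ".png", or "" when there is none
  base = path[0, path.length - ext.length]   # the path without its extension
  "#{base}.v#{version}#{ext}"
end

# Places a whole set of resources under one versioned directory, so that
# every file in the set gets a new URL at once.
def versioned_dir(dir, version, file)
  File.join("#{dir}-v#{version}", file)
end

puts versioned_name("images/logo.png", 3)
puts versioned_dir("static", 7, "css/site.css")
```

Whichever convention you pick, remember that the pages linking to the object must be updated to the new name, which is what makes the new version visible through the CDN.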

Combining the Above Techniques

You can use a combination of the above techniques to create a low-latency service that caches sometimes-dynamic content. For example, a WordPress blog could be optimized by integrating these techniques into the WordPress engine, perhaps via a plugin. Here’s what you’d do:

  • Create a CloudFront distribution for the site, setting its custom origin to point to the webserver.
  • Poke the holes in the distribution that the admin, login, and form pages require.
  • Create new versions of pages, images, etc. when they change, and new versions of the pages that refer to them.

Even though WordPress generates each page via PHP, this collection of techniques allows the pages to be served via CloudFront and also be updated when changes occur. I don’t know of a plugin that combines all these techniques, but I suspect the good folks at W3-EDGE, producers of the W3 Total Cache performance optimization framework I mentioned above, are already working on it.

Cloud Developer Tips

Scalability and HA Limitations of AWS Marketplace AMIs

Reading AWS’s recent announcement of the AWS Marketplace you would think that it provides a catalog of click-to-deploy, highly-available, scalable applications running on EC2. You’d be partially right: the applications available in the AWS Marketplace are deployable in only a few clicks. But highly-available and scalable services will be difficult to build using Marketplace images. Here’s why.

Essential Ingredients of HA and Scalability on AWS

AWS makes it easy to run scalable, HA applications via several features. Not all applications use all of these features, but it would be very difficult to provide scalable and highly available service without using at least one of these:

  • Elastic Load Balancing
  • Auto Scaling
  • Elastic Block Store volumes

ELB and AutoScaling both enable horizontal scalability: spreading load and controlling deployment size via first-class-citizen tools integrated into the AWS environment. They also enable availability by providing an automated way to recover from the failure of individual instances. [Scalability and availability often move in lock-step; improving one usually improves the other.] EBS volumes provide improved data availability: data can be retrieved off of dying instances. They are also often used in RAID configurations to improve write performance.

AWS Marketplace Limitations

The AWS Marketplace has limitations that cripple two of the above features, making highly available and scalable services much more difficult to provide.

Marketplace AMI instances cannot be added to an ELB

Update 17 May 2012: The Product Manager for AWS Marketplace informed me that AWS Marketplace instances are now capable of being used with ELB. This limitation no longer exists.

Try it. You’ll get this error message:

 Error: InvalidInstance: ElasticLoadBalancing does not support the paid AMI or supported AMI of instance i-10abf677.

There is no mention of this limitation in the relevant ELB documentation.

This constraint severely limits horizontal scalability for Marketplace AMIs. Without ELB it’s difficult to share web traffic to multiple identically-configured instances of these AMIs. The AWS Marketplace offers several categories of AMIs, including Application Stacks (RoR, LAMP, etc.) and Application Servers (JBoss, WebSphere, etc.), that are typically deployed behind an ELB – but that won’t work with these Marketplace AMIs.

Root EBS volumes of Marketplace AMI instances cannot be mounted on non-root devices

Because all Marketplace AMIs are EBS-backed, you might think that there is a quick path to recover data if the instance dies unexpectedly: simply attach the root EBS volume to another device on another instance and get the data from there. But don’t rely on that – it won’t work. Here is what happens when you try to mount the root EBS volume from an instance of a Marketplace AMI on another instance:

Failed to attach EBS volume 'New-Mongo-ROOT-VOLUME' (/dev/sdj) to 'New-Mongo' due to: OperationNotPermitted: 'vol-98c642f7' with Marketplace codes may not be attached as a secondary device.

This limitation is described here in AWS documentation:

If a volume has an AWS Marketplace product code:

  • The volume can only be attached to the root device of a stopped instance.
  • You must be subscribed to the AWS Marketplace code that is on the volume.
  • The configuration (instance type, operating system) of the instance must support that specific AWS Marketplace code. For example, you cannot take a volume from a Windows instance and attach it to a Linux instance.
  • AWS Marketplace product codes will be copied from the volume to the instance.

Closing a Licensing Loophole

Why did AWS place these constraints on using Marketplace-derived EBS volumes? To help Sellers keep control of the code they place into their AMI. Without the above limitations it would be simple for the purchaser of a Marketplace AMI to clone the root filesystem and create as many clones of that Marketplace-derived instance as they like, without necessarily being licensed to do so and without paying the premiums set by the Seller. It’s to close a licensing loophole.

AWS did a relatively thorough job of closing that hole. Here is a section of the current (25 April 2012) AWS overview of the EC2 EBS API and Command-Line Tools, with relevant Marketplace controls highlighted:

Command (API Action): Description

ec2-create-volume (CreateVolume): Creates a new Amazon EBS volume using the specified size or creates a new volume based on a previously created snapshot. Any AWS Marketplace product codes from the snapshot are propagated to the volume. For an overview of the AWS Marketplace, go to For details on how to use the AWS Marketplace, see AWS Marketplace.

ec2-attach-volume (AttachVolume): Attaches the specified volume to a specified instance, exposing the volume using the specified device name. A volume can be attached to only a single instance at any time. The volume and instance must be in the same Availability Zone. The instance must be in the running or stopped state.

Note: If a volume has an AWS Marketplace product code:

  • The volume can only be attached to the root device of a stopped instance.
  • You must be subscribed to the AWS Marketplace code that is on the volume.
  • The configuration (instance type, operating system) of the instance must support that specific AWS Marketplace code. For example, you cannot take a volume from a Windows instance and attach it to a Linux instance.
  • AWS Marketplace product codes will be copied from the volume to the instance.

For an overview of the AWS Marketplace, go to For details on how to use the AWS Marketplace, see AWS Marketplace.

ec2-detach-volume (DetachVolume): Detaches the specified volume from the instance it’s attached to. This action does not delete the volume. The volume can be attached to another instance and will have the same data as when it was detached. If the root volume is detached from an instance with an AWS Marketplace product code, then the AWS Marketplace product codes from that volume will no longer be associated with the instance.

ec2-create-snapshot (CreateSnapshot): Creates a snapshot of the volume you specify. After the snapshot is created, you can use it to create volumes that contain exactly the same data as the original volume. When a snapshot is created, any AWS Marketplace product codes from the volume will be propagated to the snapshot.

ec2-modify-snapshot-attribute (ModifySnapshotAttribute): Modifies permissions for a snapshot (i.e., who can create volumes from the snapshot). You can specify one or more AWS accounts, or specify all to make the snapshot public.

Note: Snapshots with AWS Marketplace product codes cannot be made public.

The constraints above are meant to maintain the AWS Marketplace product code, the mechanism AWS uses to identify resources (AMIs, snapshots, volumes, and instances) that require Marketplace licensing integration. Note that not all AMIs in the AWS Marketplace have a product code – for example, the Amazon Linux AMI does not have one. AMIs that do not require licensing control (such as Amazon Linux, and Ubuntu without support) do not have AWS Marketplace product codes – but the rest do.

A Hole

There remains a hole in this lockdown scheme. Any instance whose kernel allows booting from a volume based on its volume label can be manipulated into booting from a secondary EBS volume. This requires root privileges on the instance. I have successfully booted an instance of the MongoDB AMI in the AWS Marketplace from a secondary EBS volume created from the Amazon Linux AMI. Anyone exploiting this hole can circumvent the product code lockdown.

Plugging the Hole

Sellers want these licensing controls and lockdowns. Here’s how a Seller can plug the hole:

  • Disable the root account.
  • Disable sudo.
  • Prevent user-data from being executed. On the Amazon Linux AMI and Ubuntu AMIs, user-data beginning with a hashbang is executed as root during the startup sequence.

Unfortunately these mitigations result in a crippled instance. Users won’t be able to mount EBS volumes – which requires root access – so data can’t be stored on EBS volumes for better recoverability.

Alternatively, you could develop your AWS Marketplace solutions as SaaS applications. For many potential Sellers this would be a long-term effort.

I’m still looking for good ways to enable scalability and HA of Marketplace AMIs. I welcome your suggestions.

Update 27 April 2012: Amazon Web Services PR has contacted me to say they are actively working on a fix for the ELB limitations, and are also working on removing the limitation related to mounting Marketplace-derived EBS volumes on secondary devices. I’ll update this article when this happens. In the meantime, AWS said that users who want to recover data from Marketplace-derived EBS volumes should reach out to AWS Support for help.

Update 17 May 2012: The Product Manager for AWS Marketplace informed me that AWS Marketplace instances are now capable of being used with ELB.

Cloud Developer Tips

Recapture Unused EC2 Minutes

How much time is “wasted” in the paid-for but unused portion of the hour when you terminate an instance? How can you recapture this time – which represents compute power – and put it to good use? After all, you’ve paid for it already. This article presents a technique for repurposing an instance after you’re “done” with it, until the current billing hour is up. It’s inspired by a tweet from DEVOPS_BORAT:

We have new startup CloudJanitor. We recycle old or unuse cloud instance. Need only your cloud account login!

To clarify, we’re talking about per-hour pricing in public cloud IaaS services, where partial hours consumed are billed as whole hours. AWS EC2 is the most prominent example of a cloud sporting this pricing policy (search for “partial”). In this pricing policy, terminating (or stopping) an instance after it’s been running for 121 minutes results in a usage charge for three hours, “wasting” an extra 59 minutes that you have paid for but not used.
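The billing arithmetic is simple enough to sketch in a few lines of Ruby. This assumes whole-minute running times and the round-up-to-the-hour policy described above; the function names are my own.

```ruby
# Whole hours billed for a given running time, when any partial hour
# is charged as a full hour. Expects a whole number of minutes.
def billed_hours(minutes_running)
  (minutes_running + 59) / 60
end

# Minutes paid for but not yet used at the moment the instance stops.
def unused_minutes(minutes_running)
  billed_hours(minutes_running) * 60 - minutes_running
end

puts billed_hours(121)    # prints 3
puts unused_minutes(121)  # prints 59
```

Those unused minutes are the time this technique sets out to recapture.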

What’s Involved

You might think it’s easy to repurpose an instance: just Stop it (if it’s EBS-backed), change its root volume to a new one, and Start the instance again. Not so fast: Stopping an EC2 instance immediately ends the current billing hour before you can use it all, and when you Start the instance again a new billing hour begins – so we can’t Stop the instance. We also can’t Terminate the instance – that would also immediately curtail the billing hour and prevent us from utilizing it. Instead, we’re going to reboot the instance, which does not affect the billing.

We’ll need an EBS volume that has a bootable distro on it – let’s call this the “beneficiary” volume, because it’s going to benefit from the extra time on the clock. The beneficiary volume should have the same distro as the “normal” root volume has. [Actually, to be more precise, it need only have a distro that works with the same kernel that the instance is currently running.] I’ve tested this technique with Ubuntu 10.04 Lucid and 10.10 Maverick.

One of the great things about the Ubuntu images is how easy it is to play this root volume switcheroo: these distros boot from any volume that has the label uec-rootfs. To change the root volume we’ll change the volume labels, so a different volume is used as the root filesystem upon reboot.

It’s very important to disassociate the instance from all external hooks, such as Auto-Scaling Triggers and Elastic Load Balancers before you repurpose it. Otherwise the beneficiary workload will influence those no-longer-relevant systems. However, this may not be possible if you use hooks that cannot be de-coupled from the instance, such as a CloudWatch Dimension of ImageId, InstanceId, or InstanceType.

The network I/O incurred during the recaptured time may be subject to additional charges. In EC2, only communications between instances in the same availability zone, or between EC2 and S3 in the same region, are free of charge.

You’ll need to make sure the beneficiary workload only accepts communications on ports that are open in the normal instance’s security groups. It’s not possible to add or remove security groups while an instance is running. You also wouldn’t want to be modifying the security groups dynamically because that will influence all instances in those security groups – and you may have other instances that are still performing their normal workload.

The really cool thing about this technique is that it can be used on both EBS-backed and instance-store instances. However, you’ll need to prepare separate beneficiary volumes (or snapshots) for 32-bit and 64-bit instances.

How to Do it

There are three stages in repurposing an instance:

  1. Preparing the beneficiary volume (or snapshot).
  2. Preparing the normal workload image.
  3. Actually repurposing the instance.

Stages 1 and 2 are only performed once. Stage 3 is performed for every instance you want to repurpose.

Preparing the beneficiary snapshot

First we’re going to prepare the beneficiary snapshot. Beginning with a pristine Ubuntu 10.10 Maverick EBS-based instance (at the time of publishing this article that’s ami-ccf405a5 for 32-bit instances), let’s create a clone of the root filesystem:

ec2-run-instances ami-ccf405a5 -k my-keypair -t m1.small -g default

ec2-describe-instances $instanceId #use the instanceId outputted from the previous command

Wait for the instance to be “running”. Once it is, identify the volumeId of the root volume – it will be indicated in the ec2-describe-instances output, the one attached to device /dev/sda1.

At this point you have a running Ubuntu 10.10 instance. For real-world usage you’ll want to customize this instance by installing the beneficiary workload and arranging for it to automatically start up on boot. (I recommend Folding@home as a worthy beneficiary project.)

Now we create the beneficiary snapshot:

ec2-create-snapshot $volumeId #use the volumeId from the previous command

And now we have the beneficiary snapshot.

Preparing the normal workload image

Begin with the same base AMI that you used for the beneficiary snapshot. Launch it and customize it to contain your normal workload stuff. You’ll also need to put in a custom script that will perform the repurposing. Here’s what that script will do:

  1. Determine how much time is left on the clock in the current billing hour. If it’s not enough time to prepare and to reboot into the beneficiary volume, just force ourselves to shut down.
  2. Disassociate any external hooks the instance might participate in: remove it from ELBs, force it to fail any Auto-Scaling health checks, and make sure it’s not handling “normal” workloads anymore.
  3. Attach the beneficiary volume to the instance.
  4. Change the volume labels so the beneficiary volume will become the root filesystem at the next reboot.
  5. Edit the startup scripts on the beneficiary volume to start a self-destruct timer.
  6. Reboot.

The following script performs steps 1, 4, 5, and 6, and clearly indicates where you should perform steps 2 and 3.

#! /bin/bash
# reboot into the attached EBS volume on the specified device, but terminate
# before this billing hour is complete.
# requires the first argument to be the device on which the EBS volume is attached

device=$1
if [[ -z $device ]]; then
	echo "usage: $0 device" >&2
	exit 1
fi

safetyMarginMinutes=1 # set this to how long it takes to attach and reboot

# set this to the URL to fetch for timing the instance; the script reads the
# downloaded file's modification time as the moment the instance started
metadataUrl=""
t=$(mktemp) # scratch file for that download
mountPoint=/mnt/beneficiary # assumed path; any unused directory name works

# make sure we have at least "safetyMargin" minutes left this hour
if wget -q -O $t "$metadataUrl" ; then
	# add 60 seconds artificially as a safety margin
	let runningSecs=$(( `date +%s` - `date -r $t +%s` ))+60
	rm -f $t
	let runningSecsThisHour=$runningSecs%3600
	let runningMinsThisHour=$runningSecsThisHour/60
	let leftMins=60-$runningMinsThisHour-$safetyMarginMinutes
	# start shutdown one minute earlier than actually required
	let shutdownDelayMins=$leftMins-1
	# compare numerically: inside [[ ]], < and > compare strings
	if [[ $shutdownDelayMins -lt 2 || $shutdownDelayMins -gt 59 ]]; then
		echo "Shutting down now."
		shutdown -h now
		exit 0
	fi
else
	# without the running time there is no safe self-destruct delay
	echo "Could not determine how long this instance has been running." >&2
	rm -f $t
	exit 1
fi

## here is where you would disassociate this instance from ELBs,
# force it to fail AutoScaling health checks, and otherwise make sure
# it does not participate in "normal" activities.

## here is where you would attach the beneficiary volume to $device
# ec2-create-volume --snapshot snap-00000000 -z this_availability_zone
# dont forget to wait until the volume is "available"

# ec2-attach-volume . . . and don't forget to wait until the volume is "attached"

## (optionally) force the beneficiary volume to be deleted when this instance terminates:
# ec2-modify-instance-attribute --block-device-mapping '$device=::true' this_instance_id

## get the beneficiary volume ready to be rebooted into
# change the filesystem labels
e2label /dev/sda1 old-uec-rootfs
e2label $device uec-rootfs
# mount the beneficiary volume
mkdir -m 000 $mountPoint
mount $device $mountPoint
# install the self-destruct timer
sed -i -e "s/^exit 0$/shutdown -h +$shutdownDelayMins\nexit 0/" \
	$mountPoint/etc/rc.local
# neutralize the self-destruct for subsequent boots
sed -i -e "s#^exit 0#chmod -x /etc/rc.local\nexit 0#" $mountPoint/etc/rc.local
# get out
umount $mountPoint
rmdir $mountPoint

# do the deed
shutdown -r now
exit 0

Save this script into the instance you’re preparing for the normal workload (perhaps, as the root user, into /root/) and chmod it to 744.

Now, make your application detect when its normal workload is completed – this exact method will be very specific to your application. Add in a hook there to invoke this script as the root user, passing it the device on which the beneficiary volume will be attached. For example, the following command will cause the instance to repurpose itself to a volume attached on /dev/sdp:

sudo /root/ /dev/sdp

Once all this is set up, use the usual EC2 AMI creation methods to create your normal workload image (either as an instance-store AMI or as an EBS-backed AMI).

Actually repurposing the instance

Now that everything is prepared, this is the easy part. Your normal workload image can be launched. When it is finished, the repurposing script will be invoked and the instance will be rebooted into the beneficiary volume. The repurposed instance will self-destruct before the billing hour is complete.

You can force this repurposing to happen by explicitly invoking the command at an SSH prompt on the instance:

sudo /root/ /dev/sdp

Notice that you will be immediately kicked out of your SSH session – either the instance will reboot or the instance will terminate itself because there isn’t enough time left in the current billable hour. If it’s just a reboot (which happens when there is significant time left in the current billing hour) then be aware: the SSH host key will most likely be different on the repurposed instance than it was originally, and you may need to clean up your local ~/.ssh/known_hosts file, removing the entry for the instance, before you can SSH in again.

Cloud Developer Tips

AWS Auto-Scaling and ELB with Reliable Root Domain Handling

Update May 2011: Now that AWS Route 53 can be used to allow an ELB to host a domain zone apex, the technique described here is no longer necessary. Cool, but not necessary.

Someone really has to implement this. I’ve had this draft sitting around ever since AWS announced support for improved CloudWatch alerts and AutoScaling policies (August 2010), but I haven’t yet turned it into a clear set of commands to follow. If you do, please comment.


You want an auto-scaled, load-balanced pool of web servers to host your site at its root domain. Unfortunately, it’s not so simple, because AWS Elastic Load Balancer can’t be used to host a domain apex (AKA a root domain). One of the longest threads on the AWS Developer Forum discusses this limitation: because ELB utilizes DNS CNAMEs, which are not legal for root domain entries, ELB does not support root domains.

An often-suggested workaround is to use an instance with an Elastic IP address to host the root domain, via standard static DNS, with the web server redirecting all root domain requests to the subdomain (www) served by the ELB. There are four drawbacks to this approach:

  1. The instance with the Elastic IP address is liable to be terminated by auto-scaling, leaving requests to the root domain unanswered.
  2. The instance with the Elastic IP address might fail unnaturally, again leaving requests to the root domain unanswered.
  3. Even when traffic is very low, we need at least two instances running: the one handling the root domain outside the auto-scaled ELB group (due to issue #1) and the one inside the auto-scaled ELB group (to handle the actual traffic hitting the ELB-managed subdomain).
  4. The redirect adds additional latency to requests hitting the root domain.

While we can’t do anything about the fourth issue, what follows is a technique to handle the first three issues.

The Idea

The idea is built on these principles:

  • The instance with the Elastic IP is outside the auto-scaled group so it will not be terminated by auto-scaling.
  • The instance with the Elastic IP is managed using AWS tools to ensure the root domain service is automatically recovered if the instance dies unexpectedly.
  • The auto-scaling group can scale back to zero size, so only a single instance is required to serve low traffic volumes.

How do we put these together?

Here’s how:

  1. Create an AMI for your web server. The AMI will need some special boot-time hooks, which are described below in italics. The web server should be set up to redirect root domain traffic to the subdomain that you’ll want to associate with the ELB, and to serve the subdomain normally.
  2. Create an ELB for the site’s subdomain with a meaningful Health Check (e.g. a URL that exercises representative areas of the application).
  3. Create an AutoScaling group with min=1 and max=1 instances of that AMI. This AutoScaling group will benefit from the default health checks that such groups have, and if EC2 reports the instance is degraded it will be replaced. The LaunchConfiguration for this AutoScaling group should specify user-data that indicates this instance is the “root domain” instance. Upon booting, the instance will notice this flag in the user data, associate the Elastic IP address with itself, and add itself to the ELB.
    Note: At this point, we have a reliably-hosted single instance hosting the root domain and the subdomain.
  4. Create a second AutoScaling group (the “ELB AutoScaling group”) that uses the same AMI, with min=0 instances (the max can be anything you want), and set it up to use the ELB’s Health Check. The LaunchConfiguration for this group should not contain the abovementioned special flag, because these are not root domain instances.
  5. Create an Alarm that looks at the CPUUtilization across all instances of the AMI, and connect it to the “scale up” and “scale down” Policies for the ELB AutoScaling group.

That is the basic idea. The result will be:

  • The root domain is hosted on an instance that redirects to the ELB subdomain. This instance is managed by a standalone Auto Scaling group that will replace the instance if it becomes degraded. This instance is also a member of the ELB, so it serves the subdomain traffic as well.
  • A second AutoScaling group manages the “overflow” traffic, measured by the CPUUtilization of all the running instances of the AMI.


Here are the missing pieces:

  1. A script that can be run as a boot-time hook that checks the user-data for a special flag. When this flag is detected, the script associates the root domain’s Elastic IP address (which should be specified in the user-data) and adds the instance to the ELB (whose name is also specified in the user-data). This will likely require AWS Credentials to be placed on the instance – perhaps in the user-data itself (be sure you understand the security implications of this) as well as a library such as boto or the AWS SDK to perform the AWS API calls.
  2. The explicit step-by-step instructions for carrying out steps 1 through 5 above using the relevant AWS command-line tools.
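As a starting point for the first missing piece, here is a minimal sketch of the decision logic only. It assumes the user-data is a set of key=value lines; the key names (root_domain_instance, elastic_ip, elb_name) are hypothetical, and the actual AWS API calls are deliberately left out, so each returned action would still need to be carried out with boto or the AWS SDK.

```python
def parse_user_data(text):
    """Parse key=value lines (an assumed user-data format) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def boot_actions(settings):
    """List what a root domain instance should do at boot.

    Instances without the flag (the ELB AutoScaling group) do nothing special.
    """
    if settings.get("root_domain_instance", "").lower() != "true":
        return []
    return [
        ("associate_elastic_ip", settings["elastic_ip"]),
        ("register_with_elb", settings["elb_name"]),
    ]


# Illustrative user-data; the address and ELB name are placeholders.
example = "root_domain_instance=true\nelastic_ip=203.0.113.10\nelb_name=www-elb\n"
for action, target in boot_actions(parse_user_data(example)):
    print(action, target)
```

Keeping the decision separate from the API calls makes the hook easy to test without touching a real account.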

Do you have these missing pieces? If so, please comment.

Using Elastic Beanstalk via command-line on a Mac? Keep that OS X Install DVD handy

The title pretty much says it all.

Elastic Beanstalk is the new service from Amazon Web Services offering you easier deployment of Java WAR files. More languages and platforms are expected to be supported in the future.

Most people will use the service via the convenient web console, but if you want to automate things you’ll either end up using the command-line tools (CLI tools) or the API in the Java SDK (until their other SDKs add Beanstalk support).

But, if you’re running on a Mac, you’ll have a problem running the command-line tools:

Shlomos-MacBook-Pro:ec2 shlomo$ elastic-beanstalk-describe-applications
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- json (LoadError)
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/client/awsqueryhandler.rb:2
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/client/awsquery.rb:4:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/client/awsquery.rb:4
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/elasticbeanstalk.rb:19:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/elasticbeanstalk.rb:19
from /Users/shlomo/ec2/elasticbeanstalk/bin/setup.rb:18:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/setup.rb:18
from /Users/shlomo/ec2/elasticbeanstalk/bin/elastic-beanstalk-describe-applications:18:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/elastic-beanstalk-describe-applications:18

If you remember from the README (which you read, of course 😉), there was some vague mention of this:

If you're using Ruby 1.8, you will have to install the JSON gem:
gem install json

OK, let’s try that:

Shlomos-MacBook-Pro:ec2 shlomo$ sudo gem install json
Building native extensions. This could take a while...
ERROR: Error installing json:
ERROR: Failed to build gem native extension.

/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby extconf.rb
mkmf.rb can't find header files for ruby at /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/ruby.h

Gem files will remain installed in /Library/Ruby/Gems/1.8/gems/json-1.4.6 for inspection.
Results logged to /Library/Ruby/Gems/1.8/gems/json-1.4.6/ext/json/ext/generator/gem_make.out

The first complaint is about the ruby tools not being in the path. Let’s fix that and try again:

Shlomos-MacBook-Pro:ec2 shlomo$ export PATH=$PATH:/Users/shlomo/.gem/ruby/1.8/bin
Shlomos-MacBook-Pro:ec2 shlomo$ sudo gem install json
Building native extensions. This could take a while...
ERROR: Error installing json:
ERROR: Failed to build gem native extension.

/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby extconf.rb
mkmf.rb can't find header files for ruby at /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/ruby.h

Gem files will remain installed in /Library/Ruby/Gems/1.8/gems/json-1.4.6 for inspection.
Results logged to /Library/Ruby/Gems/1.8/gems/json-1.4.6/ext/json/ext/generator/gem_make.out

Uh oh, no dice. What now? StackOverflow to the rescue:

The ruby headers don’t come installed with the base ruby install with Mac OS X. These can be found on Mac OS X Install Disc 2 by installing the XCode Tools.

If you’re like me and you don’t carry around the OS X Install DVD wherever you go, you’re stuck.
Any readers in Seoul with an OS X 10.6 Install DVD?

Update 20 Jan 2011: I’ve gotten some comments that made me realize the title really didn’t say it all. Some clarifications are in order.

Some people pointed out that I should just download the XCode .dmg DVD image – all 3.4 GB of it. Unfortunately that wasn’t applicable for me at the time: I was connected via 3G, tethered to my Android phone. I’ve never tried to download 3.4 GB on that connection, and I don’t plan to try today: it would be expensive.

See the comment below by Beltran (who works for Bitnami) for a great solution.

Using AWS Route 53 to Keep Track of EC2 Instances

This article is a guest post by Guy Rosen, CEO of Onavo and author of the Jack of All Clouds blog. Guy was one of the first people to produce hard numbers on cloud adoption for site hosting, and he continues to publish regular updates to this research in his State of the Cloud series. These days he runs his startup Onavo which uses the cloud to offer smartphone users a way to slash overpriced data roaming costs.

In this article, Guy provides another technique to track changes to your dynamic cloud services automatically, possible now that AWS has released Route 53, its DNS service. Take it away, Guy.

While one of the greatest things about EC2 is the way you can spin up, stop and start instances to your heart’s desire, things get sticky when it comes to actually connecting to an instance. When an instance boots (or comes up after being in the Stopped state), Amazon assigns a pair of unique IPs (and DNS names) that you can use to connect: a private IP used when connecting from another machine in EC2, and a public IP used when connecting from the outside. The thing is, when you start and stop dozens of machines daily you lose track of these constantly changing IPs. How many of you have found, like me, that each time you want to connect to a machine (or hook up a pair of machines that need to communicate with each other, such as a web and database server) you find yourself going back to your EC2 console to copy and paste the IP?

This morning I got fed up with this, and since Amazon launched their new Route 53 service I figured the time was ripe to make things right. Here’s what I came up with: a (really) small script that takes your EC2 instance list and plugs it into DNS. You can then refer to your machines not by their IP but by their instance ID (which is preserved across stops and starts of EBS-backed instances) or by a user-readable tag you assign to a machine (such as “webserver”).

Here’s what you do:

  1. Sign up to Amazon Route 53.
  2. Download and install cli53 (follow its instructions to download the latest Boto and dnspython).
  3. Set up a domain/subdomain you want to use for the mapping (e.g.,
    1. Set it up on Route53 using cli53:
      ./ create
    2. Use your domain provider’s interface to set Amazon’s DNS servers (reported in the response to the create command)
    3. Run the following script (replace any details and paths, emphasized in bold, with your own):

      #!/bin/tcsh -f
      set root=`dirname $0`
      setenv EC2_HOME /usr/local/ec2-api-tools
      setenv EC2_CERT $root/ec2_x509_cert.pem
      setenv EC2_PRIVATE_KEY $root/ec2_x509_private.pem
      setenv AWS_ACCESS_KEY_ID myawsaccesskeyid
      setenv AWS_SECRET_ACCESS_KEY mysecretaccesskey

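      # List instances, pair each instance ID (and any ShortName tag) with the
      # instance's public DNS name, then create one CNAME record per pair.
      $EC2_HOME/bin/ec2-describe-instances | \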
      perl -ne '/^INSTANCE\s+(i-\S+).*?(\S+\.amazonaws\.com)/ \
      and do { $dns = $2; print "$1 $dns\n" }; /^TAG.+\sShortName\s+(\S+)/ \
      and print "$1 $dns\n"' | \
      perl -ane 'print "$F[0] CNAME $F[1] --replace\n"' | \
      xargs -n 4 $root/cli53/ \
      rrcreate -x 60

Voila! You now have DNS names, one per instance ID, that point to your instances. To make things more helpful, if you add a tag called ShortName to your instances it will be picked up, giving you a readable name for each tagged instance as well. The script creates CNAME records, which means that you will automatically get internal EC2 IPs when querying inside EC2 and public IPs from the outside.
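For readers who find the perl hard to follow, here is a rough Python rendering of just the parsing step. It is a sketch under assumptions, not a replacement for the script: it reuses the same two regular expressions, the sample ec2-describe-instances output is made up for illustration, and the function name is hypothetical. Creating the records through cli53 is left out.

```python
import re

# Same patterns as the perl one-liner: an INSTANCE line yields the instance ID
# and its public DNS name; a TAG line yields the value of the ShortName tag.
INSTANCE_RE = re.compile(r"^INSTANCE\s+(i-\S+).*?(\S+\.amazonaws\.com)")
TAG_RE = re.compile(r"^TAG.+\sShortName\s+(\S+)")


def cname_pairs(describe_output):
    """Return (name, public_dns) pairs, one per instance ID and per ShortName."""
    pairs = []
    dns = None  # public DNS name of the most recent INSTANCE line
    for line in describe_output.splitlines():
        match = INSTANCE_RE.match(line)
        if match:
            dns = match.group(2)
            pairs.append((match.group(1), dns))
            continue
        match = TAG_RE.match(line)
        if match and dns is not None:
            pairs.append((match.group(1), dns))
    return pairs


# Made-up sample in the tab-separated style of the EC2 API tools.
SAMPLE = (
    "RESERVATION\tr-11111111\t123456789012\tdefault\n"
    "INSTANCE\ti-0abc1234\tami-11111111\t"
    "ec2-203-0-113-10.compute-1.amazonaws.com\tip-10-0-0-10.ec2.internal\trunning\n"
    "TAG\tinstance\ti-0abc1234\tShortName\twebserver\n"
)

for name, dns in cname_pairs(SAMPLE):
    print(name, "CNAME", dns)
```

Like the original, this relies on each TAG line following the INSTANCE line it belongs to.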

Put this script somewhere, run it in a cron – and you’ll have an auto-updating DNS zone for your EC2 servers.

Short disclaimer: the script above is a horrendous one-liner that roughly works and makes many assumptions. It works for me, but no guarantees.