auto scaling – Shlomo Swidler

AWS Auto-Scaling and ELB with Reliable Root Domain Handling

Background

You want an auto-scaled, load-balanced pool of web servers to host your site at example.com. Unfortunately it’s not so simple, because AWS Elastic Load Balancer can’t be used to host a domain apex (AKA a root domain). One of the longest threads on the AWS Developer Forum discusses this limitation: because ELB utilizes DNS CNAMEs, which are not legal for root domain entries, ELB does not support root domains.

An often-suggested workaround is to use an instance with an Elastic IP address to host the root domain, via standard static DNS, with the web server redirecting all root domain requests to the subdomain (www) served by the ELB. There are four drawbacks to this approach:

The instance with the Elastic IP address is liable to be terminated by auto-scaling, leaving requests to the root domain unanswered.

The instance with the Elastic IP address might fail unnaturally, again leaving requests to the root domain unanswered.

Even when traffic is very low, we need at least two instances running: the one handling the root domain outside the auto-scaled ELB group (due to issue #1) and the one inside the auto-scaled ELB group (to handle the actual traffic hitting the ELB-managed subdomain).

The redirect adds additional latency to requests hitting the root domain.

While we can’t do anything about the fourth issue, what follows is a technique to handle the first three issues.

The Idea

The idea is built on these principles:

The instance with the Elastic IP is outside the auto-scaled group so it will not be terminated by auto-scaling.

The instance with the Elastic IP is managed using AWS tools to ensure the root domain service is automatically recovered if the instance dies unexpectedly.

The auto-scaling group can scale back to zero size, so only a single instance is required to serve low traffic volumes.

How do we put these together?

Here’s how:

Create an AMI for your web server. The AMI will need some special boot-time hooks, which are described below in italics. The web server should be set up to redirect root domain traffic to the subdomain that you’ll want to associate with the ELB, and to serve the subdomain normally.

Create an ELB for the site’s subdomain with a meaningful Health Check (e.g. a URL that exercises representative areas of the application).

Create an AutoScaling group with min=1 and max=1 instances of that AMI. This AutoScaling group will benefit from the default health checks that such groups have, and if EC2 reports the instance is degraded it will be replaced. The LaunchConfiguration for this AutoScaling group should specify user-data that indicates this instance is the “root domain” instance. Upon booting, the instance will notice this flag in the user data, associate the Elastic IP address with itself, an add itself to the ELB.
Note: At this point, we have a reliably-hosted single instance hosting the root domain and the subdomain.

Create a second AutoScaling group (the “ELB AutoScaling group”) that uses the same AMI, with min=0 instances – the max can be anything you want it to – and set it up to use the ELB’s Health Check. The LaunchConfiguration for this group should not contain the abovementioned special flag – these are not root domain instances.

Create an Alarm that looks at the CPUUtilization across all instances of the AMI, and connect it to the “scale up” and “scale down” Policies for the ELB AutoScaling group.

That is the basic idea. The result will be:

The root domain is hosted on an instance that redirects to the ELB subdomain. This instance is managed by a standalone Auto Scaling group that will replace the instance if it becomes degraded. This instance is also a member of the ELB, so it serves the subdomain traffic as well.

A second AutoScaling group manages the “overflow” traffic, measured by the CPUUtilization of all the running instances of the AMI.

TODO

Here are the missing pieces:

A script that can be run as a boot-time hook that checks the user-data for a special flag. When this flag is detected, the script associates the root domain’s Elastic IP address (which should be specified in the user-data) and adds the instance to the ELB (whose name is also specified in the user-data). This will likely require AWS Credentials to be placed on the instance – perhaps in the user-data itself (be sure you understand the security implications of this) as well as a library such as boto or the AWS SDK to perform the AWS API calls.

The explicit step-by-step instructions for carrying out steps 1 through 5 above using the relevant AWS command-line tools.

Do you have these missing pieces? If so, please comment.

Avoiding EC2 InsufficientInstanceCapacity: Insufficient Capacity Errors

Here’s a quick tip from this thread on the AWS EC2 Developer Forums.

If you experience the InsufficientInstanceCapacity: Insufficient Capacity error, you’ll be glad to know there are some strategies for working around it. Justin@AWS offers this advice:

There can be short periods of time when we are unable to accommodate instance requests that are targeted to a specific Availability Zone. When a particular instance type experiences unexpected demand in an Availability Zone, our system must react by shifting capacity from one instance type to another. This can result in short periods of insufficient capacity. We incorporate this data into our capacity planning and try to manage all zones to have adequate capacity at all times. The following steps will ensure that you will have the best experience launching Amazon EC2 instances when an initial insufficient capacity message is received:

1. Don’t specify an Availability Zone in your request unless necessary. By targeting a specific Availability Zone you eliminate our ability to satisfy that request by using our other available Availability Zones. Please note that a single RunInstances call will allocate all instances within a single Availability Zone.

2. If you require a large number of instances for a particular job, please request them in batches. The best practice to follow here is to request 25% of your total cluster size at a time. For example, if you want to launch 200 instances, launching 50 instances at a time will result in a better experience.

3. Try using a different instance type. As capacity varies across instance types, attempting to launch difference instance types provides spill over capacity should your primary instance type be temporarily unavailable.

Unfortunately, these techniques require that you be willing to accept higher bandwidth costs for cross-availability-zone traffic.

And, none of these tips help if you’re using Auto Scaling. A single Auto Scaling Group must be in a specific availability zone, so #1 won’t help. You can try using smaller numbers of instances when a trigger is reached by choosing a smaller LowerBreachScaleIncrement or UpperBreachScaleIncrement (which control by how many instances or by what percent to scale in each direction), as per #2, but this is only helpful if you’ve planned in advance. And #3 is only possible if you’ve already noticed an auto scaling activity failure and changed the Launch Configuration – which defeats the purpose of Auto Scaling.

Auto Scaling’s error reporting and recovery is very limited currently. Are you listening, AWS?

Update 18 October 2009: AWS is listening. The following post by John@AWS appears in this thread:

AutoScaling currently reports […]InsufficientInstanceCapacity […] as a generic Internal Error. This is unintentional, and will be remedied in our next release.

Cool!

Update 19 October 2009: Auto Scaling Groups can now be configured to support more than one Availability Zone. Here is the salient quote from the updated documentation:

Instance Distribution and Balance across Multiple Zones

Auto Scaling attempts to distribute instances evenly between the Availability Zones that are enabled for your Auto Scaling group. Auto Scaling attempts to launch new instances in the Availability Zone with the fewest instances. If the attempt fails, however, Auto Scaling will attempt to launch in other zones until it succeeds.

Certain operations and conditions can cause your Auto Scaling group to become unbalanced. Auto Scaling compensates by creating a rebalancing activity under any of the following conditions:

You issue a request to change the Availability Zones for your group.

You call TerminateInstanceInAutoScalingGroup, which causes the group to become unbalanced.

An Availability Zone that previously had insufficient capacity recovers and has additional capacity available.

Auto Scaling always launches new instances before attempting to terminate old ones, so a rebalancing activity will not compromise the performance or availability of your application.

Multi-Zone Instance Counts when Approaching Capacity

Because Auto Scaling always attempts to launch new instances before terminating old ones, being at or near the specified maximum capacity could impede or completely halt rebalancing activities. To avoid this problem, the system can temporarily exceed the specified maximum capacity of a group by a 10 percent margin during a rebalancing activity (or by a 1-instance margin, whichever is greater). The margin is extended only if the group is at or near maximum capacity and needs rebalancing, either as a result of user-requested rezoning or to compensate for zone availability issues. The extension lasts only as long as needed to rebalance the group—typically a few minutes.