
Play “Chicken” with Spot Instances

AWS Spot Instances have an interesting economic characteristic that makes it possible to game the system a little. As with all EC2 instances, when you terminate a Spot Instance yourself you are charged for the entire hour, even if you’ve used less than a full hour. But when AWS terminates the instance because the spot price has risen above your bid price, you do not pay for the current hour.

What if your Spot Instance could wait, after finishing its work, to see if AWS will terminate it involuntarily in this hour and avoid the hour’s cost? In the worst case, your instance can kill itself in the last few minutes of the hour and you will not have incurred any extra unplanned cost. In the best case, the spot price will rise above the instance’s bid price before the hour is up, AWS will terminate the instance involuntarily, and you will not be charged for that entire hour. Wouldn’t this technique reduce costs, especially when performed at large scale?

I call this technique Playing Chicken, after the game of that name, because it shares two characteristics with the game:

  • Whoever “swerves” (terminates) first, loses (pays for the hour)
  • If nobody “swerves” (terminates), then an undesirable situation occurs (the instance remains running)

How to Play Chicken

Playing Chicken is really as simple as running a script on the instance when you’re done with the work. Here’s such a script:

#! /bin/bash
t=/tmp/ec2.running.seconds.$$
# The technique relies on wget giving the downloaded file the meta-data
# server's Last-Modified time, which is taken here as the instance launch time.
if wget -q -O "$t" http://169.254.169.254/latest/meta-data/local-ipv4 ; then
	# add 60 seconds artificially as a safety margin
	runningSecs=$(( $(date +%s) - $(date -r "$t" +%s) + 60 ))
	rm -f "$t"
	runningSecsThisHour=$(( runningSecs % 3600 ))
	runningMinsThisHour=$(( runningSecsThisHour / 60 ))
	leftMins=$(( 60 - runningMinsThisHour ))
	# start shutdown one minute earlier than actually required
	shutdownDelayMins=$(( leftMins - 1 ))
	# numeric comparison; inside [[ ]] the < and > operators compare strings,
	# so a delay of 7, 8 or 9 minutes would wrongly fail the "< 60" test
	if (( shutdownDelayMins > 1 && shutdownDelayMins < 60 )); then
		echo "Shutting down in $shutdownDelayMins mins."
		# TODO: Notify off-instance listener that the game of chicken has begun
		sudo shutdown -h +"$shutdownDelayMins"
	else
		echo "Shutting down now."
		sudo shutdown -h now
	fi
	exit 0
fi
echo "Failed to determine remaining minutes in this billable hour. Terminating now."
sudo shutdown -h now
exit 1

This script uses the technique published by Dmitriy Samovskiy to determine the launch time of the current instance from the instance meta-data, without using the EC2 API. It includes a safety margin of two minutes: it conservatively adds one minute to the measured running time, and it begins the shutdown sequence one minute before the hour would otherwise end.
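To see how the two one-minute margins combine, here is a minimal sketch of the same delay arithmetic as a pure function that takes the elapsed running time as an argument instead of reading it from the meta-data. The name shutdown_delay_mins is a hypothetical helper for illustration and is not part of the script above.

```shell
#!/bin/bash
# Sketch only: shutdown_delay_mins is a hypothetical helper name.
# Given the seconds an instance has been running, print the number of
# minutes to wait before starting the shutdown.
shutdown_delay_mins() {
	local elapsedSecs=$1
	# first margin: treat the instance as 60 seconds older than measured
	local runningSecs=$(( elapsedSecs + 60 ))
	local runningMinsThisHour=$(( (runningSecs % 3600) / 60 ))
	local leftMins=$(( 60 - runningMinsThisHour ))
	# second margin: begin the shutdown one minute before the hour ends
	echo $(( leftMins - 1 ))
}

shutdown_delay_mins 0      # just launched: prints 58
shutdown_delay_mins 3000   # 50 minutes into the hour: prints 8
shutdown_delay_mins 5400   # 30 minutes into the second hour: prints 28
```

Because the function has no side effects, you can try it on any machine before trusting the real script on an instance.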

You would run this script on the instance when the Spot Instance is done with its work, instead of terminating the instance immediately. You can also add a hook at the indicated place to notify an off-instance listener that the game of chicken has begun, which allows you to track the savings delivered by this technique.
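One possible shape for that hook is sketched below. Everything about the listener is an assumption: the URL, the LISTENER_URL variable, the function names and the form-encoded event format are all hypothetical, so substitute whatever your own tracking service expects. The only real endpoint used is the instance-id path of the instance meta-data.

```shell
#!/bin/bash
# Hypothetical listener endpoint (an assumption); replace with your own.
LISTENER_URL=${LISTENER_URL:-http://listener.example.com/chicken}

# Build a one-line, form-encoded event describing the start of the game.
chicken_event() {
	local instanceId=$1 delayMins=$2
	echo "event=chicken_started&instance=${instanceId}&delay_mins=${delayMins}"
}

# Best-effort notification: a failed report must never block the shutdown.
notify_listener() {
	local delayMins=$1 instanceId
	instanceId=$(wget -q -T 5 -t 1 -O - http://169.254.169.254/latest/meta-data/instance-id) \
		|| instanceId=unknown
	wget -q -T 5 -t 1 -O /dev/null \
		--post-data="$(chicken_event "$instanceId" "$delayMins")" "$LISTENER_URL" \
		|| echo "Could not reach listener; continuing." >&2
}
```

With these functions defined near the top of the script, the TODO line would become notify_listener "$shutdownDelayMins". The short timeout and single attempt keep a slow listener from eating into the safety margin.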

Warning: Make sure you really understand what this script does before you use it. If you mistakenly schedule an instance to be shut down, you can cancel the pending shutdown with this command, run on the instance: sudo shutdown -c
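One way to get comfortable with the script before it can actually halt anything is to route the shutdown calls through a dry-run wrapper. This is only a sketch: run_shutdown and the DRY_RUN variable are hypothetical names, not something the script above defines.

```shell
#!/bin/bash
# Sketch only: run_shutdown and DRY_RUN are hypothetical names.
# When DRY_RUN is set, print the shutdown command instead of running it.
run_shutdown() {
	if [[ -n "${DRY_RUN:-}" ]]; then
		echo "DRY RUN: sudo shutdown -h $*"
	else
		sudo shutdown -h "$@"
	fi
}

DRY_RUN=1 run_shutdown +42   # prints the command, shuts nothing down
```

To use it, you would replace each sudo shutdown -h call in the script with run_shutdown and start the script with DRY_RUN=1 until you are satisfied with the delays it reports.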

How Much is Saved by Playing Chicken?

The extent to which you can benefit from playing chicken depends on a number of factors:

  • The difference between the spot price and your instance’s bid price. The further away the spot price is from your bid, the less likely it is that the spot price will hit the bid and save you money.
  • The volatility of the spot price. The more volatile the spot price, the more likely it will hit the bid and save you money.
  • The number of Spot Instances you terminate in a given period of time. If you normally don’t terminate any Spot Instances then you won’t save anything; if you terminate many then you can potentially save an hour’s worth of cost for each of them.
  • The EC2 Region and instance type. The actual spot price varies by region and instance type, so the potential savings depend on these factors as well.
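As a very rough back-of-the-envelope illustration of how these factors combine, the sketch below multiplies the number of instances you finish with, the hourly price you would otherwise pay, and the probability that AWS terminates a waiting instance before the hour is up. This is a deliberately naive assumption (a single fixed probability, independent across instances), the function name expected_savings is hypothetical, and the numbers are made up for the example rather than measured.

```shell
#!/bin/bash
# Sketch only: a naive estimate, not a real pricing model.
# usage: expected_savings COUNT HOURLY_PRICE PROBABILITY
#   COUNT         instances that finish work and play chicken in the period
#   HOURLY_PRICE  what you would otherwise pay for the final hour
#   PROBABILITY   assumed chance AWS terminates the instance while it waits
expected_savings() {
	awk -v n="$1" -v price="$2" -v p="$3" \
		'BEGIN { printf "%.2f\n", n * price * p }'
}

# Made-up example: 100 instances, $0.10 per hour, 5% chance each
expected_savings 100 0.10 0.05   # prints 0.50
```

The hard part, which this sketch does not attempt, is estimating the probability from the spot price history and your bid.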

I’m looking for help to work out a model that can describe the potential savings. If you are interested and able to help with the financial math, please get in touch.

Update: Hat tip to Simon Wardley, who pointed out CloudExchange, a site that shows great visualizations of the spot prices by region and instance type. This may help you formulate a bidding strategy.

5 replies on “Play “Chicken” with Spot Instances”

If the price is only going up a little, then Amazon could easily let your instance finish out that hour before terminating it so that you are charged for the hour.

If the price suddenly jumps up a lot, then Amazon might be able to make more money for the remaining portion of the hour by terminating your instance, not charging you for the initial part of the hour, and charging the higher price sooner for another spot instance on that same hardware.

It isn’t clear to me exactly when Amazon is obligated to terminate your instance after the spot price goes up. I think they would be within their rights to wait 5-50 minutes before terminating running instances and starting slightly more expensive instances in their place.

You could calculate the break-even point: the percentage increase in the spot price below which it makes sense for them to leave your instance running for the rest of the instance-hour, and above which it makes sense to terminate it and bring in the higher-priced instance for the remainder of that hour.

It would be an interesting test to run lots of spot instances right at the current spot instance price and then report at what minute of the instance-hour Amazon terminated each instance when the spot price went up. The distribution could either be a smooth spread or be loaded towards the beginning of the instance-hour if Amazon works as I propose above. The test would need to make sure that instances are started at many different wall clock times, to avoid any skew that might be introduced if Amazon changes spot instance prices and terminates instances at regular wall clock intervals. This test should probably be piggybacked on somebody’s already running spot instance jobs, as it could otherwise get expensive to collect enough data for reliable results.

@Oded,

uptime shows the time since the last reboot, which is not the same thing as the time that you began paying for the instance. Reboots will cause these figures to differ.
