
Play “Chicken” with Spot Instances

AWS Spot Instances have an interesting economic characteristic that makes it possible to game the system a little. As with all EC2 instances, when you initiate termination of a Spot Instance you incur a charge for the entire hour, even if you’ve used less than a full hour. But when AWS terminates the instance because the spot price exceeded your bid price, you do not pay for the current hour.

What if your Spot Instance could wait, after finishing its work, to see if AWS will terminate it involuntarily in this hour and avoid the hour’s cost? In the worst case, your instance can kill itself in the last few minutes of the hour and you will not have incurred any extra unplanned cost. In the best case, the spot price will rise above the instance’s bid price before the hour is up, AWS will terminate the instance involuntarily, and you will not be charged for that entire hour. Wouldn’t this technique reduce costs, especially when performed at large scale?

I call this technique Playing Chicken, based on the game of that name, because it shares similar characteristics to the game:

  • Whoever “swerves” (terminates) first, loses (pays for the hour)
  • If nobody “swerves” (terminates), then an undesirable situation occurs (the instance remains running)

How to Play Chicken

Playing Chicken is really as simple as running a script on the instance when you’re done with the work. Here’s such a script:

#! /bin/bash
t=/tmp/ec2.running.seconds.$$
if wget -q -O $t http://169.254.169.254/latest/meta-data/local-ipv4 ; then
	# add 60 seconds artificially as a safety margin
	let runningSecs=$(( `date +%s` - `date -r $t +%s` ))+60
	rm -f $t
	let runningSecsThisHour=$runningSecs%3600
	let runningMinsThisHour=$runningSecsThisHour/60
	let leftMins=60-$runningMinsThisHour
	# start shutdown one minute earlier than actually required
	let shutdownDelayMins=$leftMins-1
	if [[ $shutdownDelayMins -gt 1 && $shutdownDelayMins -lt 60 ]]; then   # compare numerically, not as strings
		echo "Shutting down in $shutdownDelayMins mins."
		# TODO: Notify off-instance listener that the game of chicken has begun
		sudo shutdown -h +$shutdownDelayMins
	else
		echo "Shutting down now."
		sudo shutdown -h now
	fi
	exit 0
fi
echo "Failed to determine remaining minutes in this billable hour. Terminating now."
sudo shutdown -h now
exit 1

This script uses the technique published by Dmitriy Samovskiy to determine the launch time of the current instance without using the EC2 API, relying on the instance meta-data instead. The script builds in a safety margin of two minutes: it adds one minute when computing how long the instance has been running, and it begins the shutdown sequence one minute earlier than strictly required.

Instead of terminating the Spot Instance as soon as it finishes its work, run this script on the instance. You can also add a hook at the indicated place to notify an off-instance listener that the game of chicken has begun, which lets you track the savings this technique delivers.
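For example, the TODO line in the script could be replaced with something like the following. This is only a sketch: the tracker URL and the payload format are made up for illustration, and any off-instance collector (an HTTP endpoint, an SQS queue, and so on) would do. It relies on $shutdownDelayMins, so it belongs inside the script where that variable is in scope.

	# Hypothetical notification hook: tell an off-instance tracker that the game has begun
	instanceId=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)
	wget -q -O /dev/null --post-data "instance=$instanceId&shutdownDelayMins=$shutdownDelayMins" \
		http://tracker.example.com/chicken-started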

Warning: Make sure you really understand what this script does before you use it. If you mistakenly schedule an instance to be shut down you can cancel it with this command, run on the instance: sudo shutdown -c

How Much is Saved by Playing Chicken?

The extent to which you can benefit from playing chicken depends on a number of factors:

  • The difference between the spot price and your instance’s bid price. The further away the spot price is from your bid, the less likely it is that the spot price will hit the bid and save you money.
  • The volatility of the spot price. The more volatile the spot price, the more likely it will hit the bid and save you money.
  • The number of Spot Instances you terminate in a given period of time. If you normally don’t terminate any Spot Instances then you won’t save anything; if you terminate many then you can potentially save an hour’s worth of cost for each of them.
  • The EC2 Region and instance type. The actual spot price varies by region and instance type, so the potential savings depends on these factors as well.

I’m looking for help to work out a model that can describe the potential savings. If you are interested and able to help with the financial math, please get in touch.

Update: Hat tip to Simon Wardley who pointed out the site CloudExchange that shows great visualizations of the spot prices by region and instance type. This may help you formulate a bidding strategy.


CloudConnect 2011 Platforms and Ecosystems BOF

This March 7-10 2011 I will be in Santa Clara for the CloudConnect conference. There are many reasons you should go too, and I’d like to highlight one session that you should not miss: the Platforms and Ecosystems BOF, on Tuesday March 8 at 6:00 – 7:30 PM. Read on for a detailed description of this BOF session and why it promises to be worthwhile.

[Full disclosure: I’m the track chair for the Design Patterns track and I’m running the Platforms and Ecosystems BOF. The event organizers are sponsoring my hotel for the conference, and like all conference speakers my admission to the event is covered.]

CloudConnect 2011 promises to be a high-quality conference, as last year’s was. This year you will be able to learn all about design patterns for cloud applications in the Design Patterns track I’m leading. You’ll also be able to hear from an all-star lineup about many aspects of using cloud: cloud economics, cloud security, culture, risks, and governance, data and storage, devops and automation, performance and monitoring, and private clouds.

But I’m most looking forward to the Platforms and Ecosystems BOF because the format of the event promises to foster great discussions.

The BOF Format

The BOF will be conducted as a… well, it’s hard to describe in words only, so here is a picture:

BOF Overview

At three fixed points around the outside of the room will be three topics: Public IaaS, Public PaaS, and Private Cloud Stacks. There will also be three themes which rotate around the room at each interval; these themes are: Workload Portability, Monitoring & Control, and Avoiding Vendor Lock-in. At any one time there will be three discussions taking place, one in each corner, focusing on the particular combination of that theme and topic.

Here is an example of the first set of discussions:

BOF Session 1

And here is the second set of discussions, which will take place after one “turn” of the inner “wheel”:

BOF Session 2

And here is the final set of discussions, after the second turn of the inner wheel:

BOF Session 3

In all, nine discussions are conducted. Here is a single matrix to summarize:

BOF Summary Matrix

Anatomy of a Discussion

What makes a discussion worthwhile? Interesting questions, focus, and varied opinions. The discussions in the BOF will have all of these elements.

Interesting Questions

The questions above are just “seeder” questions. They are representative questions related to the intersection of each topic and theme, but they are by no means the only questions. Do you have questions you’d like to see discussed? Please, leave a comment below. Better yet, attend the BOF and raise them there, in the appropriate discussion.

Focus

Nothing sucks more than a pointless tangent or a single person monopolizing the floor. Each discussion will be shepherded by a capable moderator, who will keep things focused on the subject area and encourage everyone’s participation.

Varied Opinions

Topic experts and theme experts will participate in every discussion.

Vendors will be present as well – but the moderators will make sure they do not abuse the forum.

And interested parties – such as you! – will be there too. In unconference style, the audience will drive the discussion.

These three elements all together provide the necessary ingredients to make sure things stay interesting.

At the Center

What, you may ask, is at the center of the room? I’m glad you asked. There’ll be beer and refreshments at the center. This is also where you can conduct spin-off discussions if you like.

The Platforms and Ecosystems BOF at CloudConnect is the perfect place to bring your cloud insights, case studies, anecdotes, and questions. It’s a great forum to discuss the ideas you’ve picked up during the day at the conference, in a less formal atmosphere where deep discussion is on the agenda.

Here’s a discount code for 25% off registration: CNXFCC03. I hope to see you there.


Lessons Learned from Using Multiple Cloud APIs

Adrian Cole, author of the jClouds library, has an excellent writeup of the trajectory the library’s development followed as it added support for more cloud providers’ APIs.

[Update August 2011: Blogger.com has removed Adrian’s blog so that link no longer works.]

Some important takeaways for application developers:

  • Your unit tests are as valuable as your code. The tests ensure the code works to spec, and they should be used as frequently as possible during development.
  • Make your code easy to test, with sensible defaults that require no external dependencies: e.g. don’t require internet connectivity.
  • Cloud limitations, both general (such as eventual consistency) and cloud-specific (such as a limit on the number of buckets per S3 account), will require careful consideration in your code (and tests).
  • Some things can only really be tested against a live cloud service. As Adrian points out, the only way to test that an instance launched with the desired customizations is to ssh in to that instance and explore it from the inside. This is not testable using an offline stub emulator.

But the key lesson developers can learn is: Whenever possible, use an existing library to interface with your clouds. As Adrian’s post makes patently clear, a lot of effort goes into ensuring the library works properly with the various supported APIs, and you can only benefit by leveraging those accomplishments.

On the other side of the fence, API developers can also learn from Adrian’s article. As William Vambenepe recently commented:

Rather than spending hours obsessing about the finer points of your API, spend the time writing love letters to [boto author] Mitch and Adrian so they support you in their libraries.

In fact, Adrian’s blog can be viewed as a TODO list for API creators who want to encourage adoption.

API authors should also refer to Steve Loughran’s Cloud Tools Manifesto for more great ideas on how to make life easy for developers.


AWS Auto-Scaling and ELB with Reliable Root Domain Handling

Update May 2011: Now that AWS Route 53 can be used to allow an ELB to host a domain zone apex, the technique described here is no longer necessary. Cool, but not necessary.

Someone really has to implement this. I’ve had this draft sitting around ever since AWS announced support for improved CloudWatch alerts and AutoScaling policies (August 2010), but I haven’t yet turned it into a clear set of commands to follow. If you do, please comment.

Background

You want an auto-scaled, load-balanced pool of web servers to host your site at example.com. Unfortunately it’s not so simple, because AWS Elastic Load Balancer can’t be used to host a domain apex (AKA a root domain). One of the longest threads on the AWS Developer Forum discusses this limitation: because ELB utilizes DNS CNAMEs, which are not legal for root domain entries, ELB does not support root domains.

An often-suggested workaround is to use an instance with an Elastic IP address to host the root domain, via standard static DNS, with the web server redirecting all root domain requests to the subdomain (www) served by the ELB. There are four drawbacks to this approach:

  1. The instance with the Elastic IP address is liable to be terminated by auto-scaling, leaving requests to the root domain unanswered.
  2. The instance with the Elastic IP address might fail unnaturally, again leaving requests to the root domain unanswered.
  3. Even when traffic is very low, we need at least two instances running: the one handling the root domain outside the auto-scaled ELB group (due to issue #1) and the one inside the auto-scaled ELB group (to handle the actual traffic hitting the ELB-managed subdomain).
  4. The redirect adds additional latency to requests hitting the root domain.

While we can’t do anything about the fourth issue, what follows is a technique to handle the first three issues.

The Idea

The idea is built on these principles:

  • The instance with the Elastic IP is outside the auto-scaled group so it will not be terminated by auto-scaling.
  • The instance with the Elastic IP is managed using AWS tools to ensure the root domain service is automatically recovered if the instance dies unexpectedly.
  • The auto-scaling group can scale back to zero size, so only a single instance is required to serve low traffic volumes.

How do we put these together?

Here’s how:

  1. Create an AMI for your web server. The AMI will need some special boot-time hooks, which are described below in italics. The web server should be set up to redirect root domain traffic to the subdomain that you’ll want to associate with the ELB, and to serve the subdomain normally.
  2. Create an ELB for the site’s subdomain with a meaningful Health Check (e.g. a URL that exercises representative areas of the application).
  3. Create an AutoScaling group with min=1 and max=1 instances of that AMI. This AutoScaling group will benefit from the default health checks that such groups have, and if EC2 reports the instance is degraded it will be replaced. The LaunchConfiguration for this AutoScaling group should specify user-data that indicates this instance is the “root domain” instance. Upon booting, the instance will notice this flag in the user data, associate the Elastic IP address with itself, and add itself to the ELB.
    Note: At this point, we have a reliably-hosted single instance hosting the root domain and the subdomain.
  4. Create a second AutoScaling group (the “ELB AutoScaling group”) that uses the same AMI, with min=0 instances – the max can be whatever you want – and set it up to use the ELB’s Health Check. The LaunchConfiguration for this group should not contain the abovementioned special flag – these are not root domain instances.
  5. Create an Alarm that looks at the CPUUtilization across all instances of the AMI, and connect it to the “scale up” and “scale down” Policies for the ELB AutoScaling group.

That is the basic idea. The result will be:

  • The root domain is hosted on an instance that redirects to the ELB subdomain. This instance is managed by a standalone Auto Scaling group that will replace the instance if it becomes degraded. This instance is also a member of the ELB, so it serves the subdomain traffic as well.
  • A second AutoScaling group manages the “overflow” traffic, measured by the CPUUtilization of all the running instances of the AMI.

TODO

Here are the missing pieces:

  1. A script that can be run as a boot-time hook that checks the user-data for a special flag. When this flag is detected, the script associates the root domain’s Elastic IP address (which should be specified in the user-data) and adds the instance to the ELB (whose name is also specified in the user-data). This will likely require AWS Credentials to be placed on the instance – perhaps in the user-data itself (be sure you understand the security implications of this) as well as a library such as boto or the AWS SDK to perform the AWS API calls. A rough sketch of such a script appears below.
  2. The explicit step-by-step instructions for carrying out steps 1 through 5 above using the relevant AWS command-line tools.

Do you have these missing pieces? If so, please comment.
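In the meantime, here is a rough, untested sketch of the first missing piece, assuming the EC2 API tools and the ELB command-line tools are baked into the AMI along with credentials they can use (instead of boto or the AWS SDK). The user-data format – a ROOT_DOMAIN_INSTANCE flag followed by the Elastic IP and the ELB name – is my own convention for illustration, not anything AWS defines, and the exact tool flags may differ in your version of the tools.

#!/bin/bash
# Hypothetical boot-time hook: claim the Elastic IP and join the ELB if the user-data says so
userData=$(curl --silent http://169.254.169.254/latest/user-data)
instanceId=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id)
if [[ "$userData" == ROOT_DOMAIN_INSTANCE* ]]; then
	elasticIp=$(echo "$userData" | awk '{print $2}')
	elbName=$(echo "$userData" | awk '{print $3}')
	# claim the root domain's Elastic IP address
	ec2-associate-address -i "$instanceId" "$elasticIp"
	# join the ELB so this instance also serves the subdomain
	elb-register-instances-with-lb "$elbName" --instances "$instanceId"
fi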


Using Elastic Beanstalk via command-line on a Mac? Keep that OS X Install DVD handy

The title pretty much says it all.

Elastic Beanstalk is the new service from Amazon Web Services offering you easier deployment of Java WAR files. More languages and platforms are expected to be supported in the future.

Most people will use the service via the convenient web console, but if you want to automate things you’ll either end up using the command-line tools (CLI tools) or the API in the Java SDK (until their other SDKs add Beanstalk support).

But, if you’re running on a Mac, you’ll have a problem running the command-line tools:

Shlomos-MacBook-Pro:ec2 shlomo$ elastic-beanstalk-describe-applications
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- json (LoadError)
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/client/awsqueryhandler.rb:2
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/client/awsquery.rb:4:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/client/awsquery.rb:4
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/elasticbeanstalk.rb:19:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/../lib/aws/elasticbeanstalk.rb:19
from /Users/shlomo/ec2/elasticbeanstalk/bin/setup.rb:18:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/setup.rb:18
from /Users/shlomo/ec2/elasticbeanstalk/bin/elastic-beanstalk-describe-applications:18:in `require'
from /Users/shlomo/ec2/elasticbeanstalk/bin/elastic-beanstalk-describe-applications:18

If you remember from the README (which you read, of course 😉), there was some vague mention of this:

If you're using Ruby 1.8, you will have to install the JSON gem:
gem install json

OK, let’s try that:

Shlomos-MacBook-Pro:ec2 shlomo$ sudo gem install json
Building native extensions. This could take a while...
ERROR: Error installing json:
ERROR: Failed to build gem native extension.

/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby extconf.rb
mkmf.rb can't find header files for ruby at /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/ruby.h

Gem files will remain installed in /Library/Ruby/Gems/1.8/gems/json-1.4.6 for inspection.
Results logged to /Library/Ruby/Gems/1.8/gems/json-1.4.6/ext/json/ext/generator/gem_make.out

The first complaint is about the ruby tools not being in the path. Let’s fix that and try again:

Shlomos-MacBook-Pro:ec2 shlomo$ export PATH=$PATH:/Users/shlomo/.gem/ruby/1.8/bin
Shlomos-MacBook-Pro:ec2 shlomo$ sudo gem install json
Building native extensions. This could take a while...
ERROR: Error installing json:
ERROR: Failed to build gem native extension.

/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby extconf.rb
mkmf.rb can't find header files for ruby at /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/ruby.h

Gem files will remain installed in /Library/Ruby/Gems/1.8/gems/json-1.4.6 for inspection.
Results logged to /Library/Ruby/Gems/1.8/gems/json-1.4.6/ext/json/ext/generator/gem_make.out

Uh oh, no dice. What now? StackOverflow to the rescue:

The ruby headers don’t come installed with the base ruby install on Mac OS X. These can be found on Mac OS X Install Disc 2 by installing the XCode Tools.

If you’re like me and you don’t carry around the OS X Install DVD wherever you go, you’re stuck.
Any readers in Seoul with an OS X 10.6 Install DVD?

Update 20 Jan 2011: I’ve gotten some comments that made me realize the title really didn’t say it all. Some clarifications are in order.

Some people pointed out that I should just download the XCode .dmg DVD image – all 3.4 GB of it. Unfortunately that wasn’t applicable for me at the time: I was connected via 3G, tethered to my Android phone. I’ve never tried to download 3.4 GB on that connection, and I don’t plan to try today: it would be expensive.

See the great comment below by Beltran (who works for Bitnami) about a great solution.


Using AWS Route 53 to Keep Track of EC2 Instances

This article is a guest post by Guy Rosen, CEO of Onavo and author of the Jack of All Clouds blog. Guy was one of the first people to produce hard numbers on cloud adoption for site hosting, and he continues to publish regular updates to this research in his State of the Cloud series. These days he runs his startup Onavo which uses the cloud to offer smartphone users a way to slash overpriced data roaming costs.

In this article, Guy provides another technique to track changes to your dynamic cloud services automatically, possible now that AWS has released Route 53, its DNS service. Take it away, Guy.

While one of the greatest things about EC2 is the way you can spin up, stop and start instances to your heart’s desire, things get sticky when it comes to actually connecting to an instance. When an instance boots (or comes up after being in the Stopped state), Amazon assigns a pair of unique IPs (and DNS names) that you can use to connect: a private IP used when connecting from another machine in EC2, and a public IP used when connecting from the outside. The thing is, when you start and stop dozens of machines daily you lose track of these constantly changing IPs. How many of you have found, like me, that each time you want to connect to a machine (or hook up a pair of machines that need to communicate with each other, such as a web and database server) you find yourself going back to your EC2 console to copy and paste the IP?

This morning I got fed up with this, and since Amazon launched their new Route 53 service I figured the time was ripe to make things right. Here’s what I came up with: a (really) small script that takes your EC2 instance list and plugs it into DNS. You can then refer to your machines not by their IP but by their instance ID (which is preserved across stops and starts of EBS-backed instances) or by a user-readable tag you assign to a machine (such as “webserver”).

Here’s what you do:

  1. Sign up to Amazon Route 53.
  2. Download and install cli53 from https://github.com/barnybug/cli53 (follow the instructions to download the latest Boto and dnspython)
  3. Set up a domain/subdomain you want to use for the mapping (e.g., ec2farm.mycompany.com):
    1. Set it up on Route53 using cli53:
      ./cli53.py create ec2farm.mycompany.com
    2. Use your domain provider’s interface to set Amazon’s DNS servers (reported in the response to the create command)
    3. Run the following script (replace the credentials, paths, and domain names with your own):

      #!/bin/tcsh -f
      set root=`dirname $0`
      setenv EC2_HOME /usr/local/ec2-api-tools
      setenv EC2_CERT $root/ec2_x509_cert.pem
      setenv EC2_PRIVATE_KEY $root/ec2_x509_private.pem
      setenv AWS_ACCESS_KEY_ID myawsaccesskeyid
      setenv AWS_SECRET_ACCESS_KEY mysecretaccesskey

      $EC2_HOME/bin/ec2-describe-instances | \
      perl -ne '/^INSTANCE\s+(i-\S+).*?(\S+\.amazonaws\.com)/ \
      and do { $dns = $2; print "$1 $dns\n" }; /^TAG.+\sShortName\s+(\S+)/ \
      and print "$1 $dns\n"' | \
      perl -ane 'print "$F[0] CNAME $F[1] --replace\n"' | \
      xargs -n 4 $root/cli53/cli53.py \
      rrcreate -x 60 ec2farm.mycompany.com

Voila! You now have DNS names such as i-abcd1234.ec2farm.mycompany.com that point to your instances. To make things more helpful, if you add a tag called ShortName to your instances it will be picked up, letting you create names such as dbserver2.ec2farm.mycompany.com. The script creates CNAME records, which means that you will automatically get internal EC2 IPs when querying inside EC2 and public IPs from the outside.

Put this script somewhere, run it in a cron – and you’ll have an auto-updating DNS zone for your EC2 servers.
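For example, a crontab entry along these lines (the script path, log file, and five-minute interval are placeholders) keeps the zone fresh:

*/5 * * * * /path/to/update-ec2-dns.sh >> /var/log/ec2-dns.log 2>&1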

Short disclaimer: the script above is a horrendous one-liner that roughly works and uses many assumptions, it works for me but no guarantees.


S3 Reduced Redundancy Storage with Simple Notification Service: What, Why, and When

AWS recently added support for receiving Simple Notification Service notifications when S3 loses a Reduced Redundancy Storage S3 object. This raises a number of questions:

  • What the heck does that even mean?
  • Why would I want to do that?
  • Under what conditions does it make financial sense to do that?

Let’s take a look at these questions, and we’ll also do a bit of brainstorming (please participate!) to design a service that puts it all together.

What is S3 Reduced Redundancy Storage?

Standard objects stored in S3 have “eleven nines” of durability annually. This means 99.999999999% of your objects stored in S3 will still be there after one year. You would need to store 100,000,000,000 – that’s one hundred billion – objects in standard S3 storage before you should expect, on average, one of them to disappear over a year’s time. Pretty great.

Reduced Redundancy Storage (RRS) is a different class of S3 storage that, in effect, has a lower durability: 99.99% annually. On average, you will need to store only 10,000 objects in RRS S3 before you should expect one of them to disappear over a year’s time. Not quite as great, but still more than 400 times better than a traditional hard drive.

When an RRS object is lost S3 will return an HTTP 405 response code, and your application is supposed to be built to understand that and take the appropriate action: most likely regenerate the object from its source objects, which have been stored elsewhere more reliably – probably in standard eleven-nines S3. It’s less expensive for AWS to provide a lower durability class of service, and therefore RRS storage is priced accordingly: it’s about 2/3 the cost of standard S3 storage.

RRS is great for derived objects – for example, image thumbnails. The source object – the full-quality image or video – can be used to recreate the derived object – the thumbnail – without losing any information. All it costs to create the derived object is time and CPU power. And that’s most likely why you’re creating the derived objects and storing them in S3: to act as a cache so the app server does not need to spend time and CPU power recreating them for every request. Using S3 RRS as a cache will save you 1/3 of your storage costs for the derived objects, but you’ll need to occasionally recreate a derived object in your application.

How Do You Handle Objects Stored in RRS?

If you serve the derived objects to clients directly from S3 – as many web apps do with their images – your clients will occasionally get an HTTP 405 response code (about once a year for every 10,000 RRS objects stored). The more objects you store, the higher the likelihood of a client’s browser encountering an HTTP 405 error – and most browsers show ugly messages when they get a 405 error. So your application should do some checking.

To get your application to check for a lost object you can do the following: Send S3 an HTTP HEAD request for the object before giving the client its URL. If the object exists then the HEAD request will succeed. If the object is lost the HEAD request will return a 405 error. Once you’re sure the object is in S3 (either the HEAD request succeeded, or you recreated the derived object and stored it again in S3), give the object’s URL to the client.

All that HEAD checking is a lot of overhead: each S3 RRS URL needs to be checked every time it’s served. You can add a cache of the URL of objects you’ve checked recently and skip those. This will cut down on the overhead and reduce your S3 bill – remember that each HEAD request costs 1/10,000 of a cent – but it’s still a bunch of unnecessary work because most of the time you check its HEAD the object will still be there.
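If you want to see what such a check looks like outside your application code, here is a minimal sketch using curl; the URL is a placeholder, and 405 is the response code S3 returns for a lost RRS object, as described above:

# Hypothetical check: HEAD the object and regenerate it if S3 reports it lost
url="http://media.example.com.s3.amazonaws.com/thumbnails/12345.jpg"
status=$(curl --silent --head --output /dev/null --write-out '%{http_code}' "$url")
if [ "$status" = "405" ]; then
	echo "RRS object lost: regenerate it from the source object before serving"
fi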

Using Simple Notification Service with RRS

Wouldn’t it be great if you could be notified when S3 RRS loses an object?

You can. AWS’s announcement introduces a way to receive notification – via Simple Notification Service, SNS – when S3 RRS detects that an object has been lost. This means you no longer need your application to check for 405s before serving objects. Instead you can have your application listen for SNS notifications (either via HTTP or via email or via SQS) and proactively process them to restore lost objects.

Okay, it’s not really true that your application no longer needs to check for lost objects. The latency between the actual loss of an object and the time you recreate and replace it is still nonzero, and during that time you probably want your application to behave nicely.

[An aside: I do wonder what the expected latency is between the object’s loss and the SNS notification. I’ve asked on the Forums and in a comment to Jeff Barr’s blog post – I’ll update this article when I have an answer.]

When Does it Make Financial Sense to Use S3 RRS?

While you save on storage costs for using S3 RRS you still need to devote resources to recreating any lost objects. How can you decide when it makes sense to go with RRS despite the need to recreate lost objects?

There are a number of factors that influence the cost of recreating lost derived objects:

  • Bandwidth to get the source object from S3 and return the derived object to S3. If you perform the processing inside the same EC2 region as the S3 region you’re using then this cost is zero.
  • CPU to perform the transformation of the source object into the derived object.
  • S3 requests for GETting the source object and PUTting the derived object.

I’ve prepared a spreadsheet analyzing these costs for various different numbers of objects, sizes of objects, and CPU-hours required for each derived object.

For 100,000 source objects of average 5MB size stored in Standard S3, each of which creates 5 derived objects of average 500KB size stored in RRS and requiring 1 second of CPU time to recreate, the savings in choosing RRS is $12.50 per month. Accounting for the cost of recreating lost derived objects reduces that savings to $12.37.
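For the curious, here is the back-of-envelope arithmetic behind that $12.50 figure, assuming prices of roughly $0.15 per GB-month for standard S3 storage and $0.10 per GB-month for RRS (my assumption based on the price list at the time; check current pricing):

# 100,000 source objects x 5 derived objects each x 0.5 MB = 250 GB of derived data
# Savings = 250 GB x ($0.15 - $0.10) per GB-month = $12.50 per month
echo "100000 * 5 * 0.5 / 1000 * (0.15 - 0.10)" | bc -l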

For the same types of objects but requiring 15 minutes of CPU time to recreate each derived object the net savings overall is $12.28. Still very close to the entire savings generated by using RRS.

For up to about 500,000 source objects it doesn’t pay to launch a dedicated m1.small instance just for the sake of recreating lost RRS objects. An m1.small costs $61.20 per month, which is approximately the same as the net savings from 500,000 source objects of average 5MB size with 5 derived objects each of average size 500KB. At this level of usage, if you have spare capacity on an existing instance then it would make financial sense to run the recreating process there.

For larger objects the savings is also almost the entire amount saved by using RRS, and the amounts saved are larger than the cost of a single m1.small so it already pays to launch your own instance for the processing.

For larger numbers of objects the savings is also almost the entire amount saved by using RRS.

As far down as you go in the spreadsheet, and as much as you may play with the numbers, it makes financial sense to use RRS and have a mechanism to recreate derived objects.

Which leads us to the brainstorming.

Why Should I Worry About Lost Objects?

Let’s face it, nobody wants to operate a service that is not core to their business. Most likely, creating the derived objects from the source object is not your core business competency. Creating thumbnails and still frame video captures is commodity stuff.

So let’s imagine a service that does the transformation, storage in S3, and maintenance of RRS derived objects for you so you don’t have to.

You’d drop off your source object in your bucket in S3. Then you’d send an SQS message to the service containing the new source object’s key and a list of the transformations you want applied. As Jeff Barr suggests in his blog, the service would process the message and create derived objects (stored in RRS) whose keys (names) would be composed of the source object’s name and the name of the transformation applied. You’d know how to construct the name of every derived object, so you would know how to access them. The service would subscribe to the RRS SNS notifications and recreate the derived objects when they are lost.

This service would need a way for clients to discover the supported file types and the supported transformations for each file type.

As we pointed out above, there is a lot of potential financial savings in using RRS, so such a service has plenty of margin to price itself profitably, below the cost of standard S3 storage.

What else would such a service need? Please comment.

If you build such a service, please cut me in for 30% for giving you the idea. Or, at least acknowledge me in your blog.


Storing AWS Credentials on an EBS Snapshot Securely

Thanks to reader Ewout and his comment on my article How to Keep Your AWS Credentials on an EC2 Instance Securely for suggesting an additional method of transferring credentials: via a snapshot. It’s similar to burning credentials into an AMI, but easier to do and less prone to accidental inclusion in the application’s AMI.

Read on for a discussion of how to implement this technique.

How to Store AWS Credentials on an EBS Snapshot

This is how to store a secret on an EBS snapshot. You do this only once, or whenever you need to change the secret.

We’re going to automate as much as possible to make it easy to do. Here’s the command that launches an instance with a newly created 1GB EBS volume, formats it, mounts it, and sets up the root user to be accessible via ssh and scp. The new EBS volume created will not be deleted when the instance is terminated.

$ ec2-run-instances -b /dev/sdf=:1:false -t m1.small -k \
my-keypair -g default ami-6743ae0e -d '#! /bin/bash
yes | mkfs.ext3 /dev/sdf
mkdir -m 000 /secretVol
mount -t ext3 -o noatime /dev/sdf /secretVol
cp /home/ubuntu/.ssh/authorized_keys /root/.ssh/'

We have set up the root user to be accessible via ssh and scp so we can store the secrets on the EBS volume as the root user by directly copying them to the volume as root. Here’s how we do that:

$ ls -l
total 24
-r--r--r-- 1 shlomo  shlomo  916 Jun 20  2010 cert-NT63JNE4VSDEMH6VHLHBGHWV3DRFDECP.pem
-r--------  1 shlomo  shlomo   90 Jun  1  2010 creds
-r-------- 1 shlomo  shlomo  926 Jun 20  2010 pk-NT63JNE4VSDEMH6VHLHBGHWV3DRFDECP.pem
$ scp -i /path/to/id_rsa-my-keypair * root@174.129.83.237:/secretVol/

Our secret is now on the EBS volume, visible only to the root user.

We’re almost done. Of course you want to test that your application can access the secret appropriately. Once you’ve done that you can terminate the instance – don’t worry, the volume will not be deleted due to the “:false” specification in our launch command.

$ ec2-terminate-instances $instance
$ ec2-describe-volumes
VOLUME	vol-7ce48a15	1		us-east-1b	available	2010-07-18T17:34:01+0000
VOLUME	vol-7ee48a17	15	snap-5e4bec36	us-east-1b	deleting	2010-07-18T17:34:02+0000

Note that the root EBS volume is being deleted but the new 1GB volume we created and stored the secret on is intact.

Now we’re ready for the final two steps:
Snapshot the volume with the secret:

$ ec2-create-snapshot $secretVolume
SNAPSHOT	snap-2ec73045	vol-7ce48a15	pending	2010-07-18T18:05:39+0000		540528830757	1

And, once the snapshot completes, delete the volume:

$ ec2-describe-snapshots -o self
SNAPSHOT	snap-2ec73045	vol-7ce48a15	completed	2010-07-18T18:05:40+0000	100%	540528830757	1
$ ec2-delete-volume $secretVolume
VOLUME	vol-7ce48a15
# save the snapshot ID
$ secretSnapshot=snap-2ec73045

Now you have a snapshot $secretSnapshot with your credentials stored on it.

How to Use Credentials Stored on an EBS Snapshot

Of course you can create a new volume from the snapshot, attach the volume to your instance, mount the volume to the filesystem, and access the secrets via the root user. But here’s a way to do all that at instance launch time:

$ ec2-run-instances -b /dev/sdf=$secretSnapshot -t m1.small -k \
my-keypair -g default ami-6743ae0e -d '#! /bin/bash
mkdir -m 000 /secretVol
mount -t ext3 -o noatime /dev/sdf /secretVol
# make sure it gets remounted if we reboot
echo "/dev/sdf /secretVol ext3 noatime 0 0" > /etc/fstab'

This one-liner uses the -b option of ec2-run-instances to specify that a new volume be created from $secretSnapshot and attached to /dev/sdf; this volume will be automatically deleted when the instance terminates. The user-data script sets up the filesystem mount point and mounts the volume there, also ensuring that the volume will be remounted if the instance reboots.
Check it out, a new volume was created for /dev/sdf:

$ ec2-describe-instances
RESERVATION	r-e4f2608f	540528830757	default
INSTANCE	i-155b857f	ami-6743ae0e			pending	my-keypair	0		m1.small	2010-07-19T15:51:13+0000	us-east-1b	aki-5f15f636	ari-d5709dbc	monitoring-disabled					ebs
BLOCKDEVICE	/dev/sda1	vol-8a721be3	2010-07-19T15:51:22.000Z
BLOCKDEVICE	/dev/sdf	vol-88721be1	2010-07-19T15:51:22.000Z

Let’s make sure the files are there. SSHing into the instance (as the ubuntu user) we then see:

$ ls -la /secretVol
ls: cannot open directory /secretVol: Permission denied
$ sudo ls -l /secretVol
total 28
-r--------  1 root root   916 2010-07-18 17:52 cert-NT63JNE4VSDEMH6VHLHBGHWV3DRFDECP.pem
-r--------  1 root root    90 2010-07-18 17:52 creds
dr--------  2 root   root   16384 2010-07-18 17:42 lost+found
-r--------  1 root root   926 2010-07-18 17:52 pk-NT63JNE4VSDEMH6VHLHBGHWV3DRFDECP.pem

Your application running on the instance (you’ll install it by adding to the user-data script, right?) will need root privileges to access those secrets.
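For example, if your application uses the EC2 API tools, the user-data script could point them at the certificate and private key sitting on the secret volume. A minimal sketch, run as root, with file names matching the listing above:

# Point the EC2 API tools at the credentials on the secret volume (run as root)
export EC2_CERT=$(ls /secretVol/cert-*.pem)
export EC2_PRIVATE_KEY=$(ls /secretVol/pk-*.pem)
ec2-describe-instances   # any EC2 API tools command can now authenticate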


Track Changes to your Dynamic Cloud Services Automatically

Dynamic infrastructure can be a pain to accommodate in applications. How do you keep track of the set of web servers in your dynamically scaling web farm? How do your apps keep up with which server is currently running what service? How can applications be written so they don’t need to care if a service gets moved to a different machine? There are a number of techniques available, and I’m happy to share implementation code for one that I’ve found useful.

One thing common to all these techniques: they all allow the application code to refer to services by name instead of IP address. This makes sense because the whole point is not to care about the IP address running the service. Every one of these techniques offers a way to translate the name of the service into an IP address behind the scenes, without your application knowing about it. Where the techniques differ is in how they provide this indirection.

Note that there are four usage scenarios that we might want to support:

  1. Service inside the cloud, client inside the cloud
  2. Service inside the cloud, client outside the cloud
  3. Service outside the cloud, client inside the cloud
  4. Service outside the cloud, client outside the cloud

Let’s take a look at a few techniques to provide loose coupling between dynamically movable services and their IP addresses, and see how they can support these usage scenarios.

Dynamic DNS

Dynamic DNS is the classic way of handling dynamically assigned roles: DNS entries on a DNS server are updated via an API (usually HTTP/S) when a server claims a given role. The DNS entry is updated to point to the IP address of the server claiming that role. For example, your DNS may have a production-master-db.example.com record. When the production deployment’s master database starts up it can register itself with the DNS provider to claim the production-master-db.example.com DNS record, pointing that DNS entry to its own IP address. Any client of the database can use the host name production-master-db.example.com to refer to the master database, and as long as the machine that last claimed that DNS entry is still alive, it will work.

When running your service within EC2, Dynamic DNS servers running outside EC2 will see the source IP address for the Dynamic DNS registration request as the public IP address of the instance. So if your Dynamic DNS is hosted outside EC2 you can’t easily register the internal IP addresses. Often you want to register the internal IP address because from within the same EC2 region it costs less to use the private IP address than the public IP addresses. One way to use Dynamic DNS with private IPs is to build your own Dynamic DNS service within EC2 and set up all your application instances to use that DNS server for your domain’s DNS lookups. When instances register with that EC2-based DNS server, the Dynamic DNS service will detect the source of the registration request as being the internal IP address for the instance, and it will assign that internal IP address to the DNS record.

Another way to use Dynamic DNS with internal IP addresses is to use DNS services such as DNSMadeEasy whose API allows you to specify the IP address of the server in the registration request. You can use the EC2 instance metadata to discover your instance’s internal IP address via the URL http://169.254.169.254/latest/meta-data/local-ipv4 .

Here’s how Dynamic DNS fares in each of the above usage scenarios:

Scenario 1: Service in the cloud, client inside the cloud: Only if you run your own DNS inside EC2 or use a special DNS service that supports specifying the internal IP address.
Scenario 2: Service in the cloud, client outside the cloud: Can use public Dynamic DNS providers.
Scenario 3: Service outside the cloud, client inside the cloud: Can use public Dynamic DNS providers.
Scenario 4: Service outside the cloud, client outside the cloud: Can use public Dynamic DNS providers.

Update window: Changes are available immediately to all DNS servers that respect the zero TTL on the Dynamic DNS server (guaranteed only for Scenario 1). DNS propagation delay penalty may still apply because not all DNS servers between the client and your Dynamic DNS service necessarily respect TTLs properly.

Pros: For public IP addresses only, easy to integrate into existing scripts.

Cons: Running your own DNS (to support private IP addresses) is not trivial, and introduces a single point of failure.

Bottom line: Dynamic DNS is useful when both the service and the clients are in the cloud; and for other usage scenarios if a DNS propagation delay is acceptable.

Elastic IP Addresses

In AWS you can have an Elastic IP address: an IP address that can be associated with any instance within a given region. It’s very useful when you want to move your service to a different instance (perhaps because the old one died?) without changing DNS and waiting for those changes to propagate across the internet to your clients. You can put code into the startup sequence of your instances that associates the desired Elastic IP address, making this approach very scriptable. For added flexibility you can write those scripts to accept configurable input (via settings in the user-data or some data stored in S3 or SimpleDB) that specifies which Elastic IP address to associate with the instance.

A cool feature of Elastic IP addresses: if clients use the DNS name of the IP address (“ec2-1-2-3-4.compute-1.amazonaws.com”) instead of the numeric IP address you can have extra flexibility: clients within EC2 will get routed via the internal IP address to the service while clients outside EC2 will get routed via the public IP address. This seamlessly minimizes your bandwidth cost. To take advantage of this you can put a CNAME entry in your domain’s DNS records.

Summary of Elastic IP addresses:

Scenario 1: Service in the cloud, client inside the cloud: Trivial, client should use Elastic IP’s DNS name (or set up a CNAME).
Scenario 2: Service in the cloud, client outside the cloud: Trivial, client should use Elastic IP’s DNS name (or set up a CNAME).
Scenario 3: Service outside the cloud, client inside the cloud: Elastic IPs do not help here.
Scenario 4: Service outside the cloud, client outside the cloud: Elastic IPs do not help here.

Update window: Changes are available in under a minute.

Pros: Requires minimal setup, easy to script.

Cons: No support for running the service outside the cloud.

Bottom line: Elastic IPs are useful when the service is inside the cloud and an approximately one minute update window is acceptable.

Generating Hosts Files

Before the OS queries DNS for the IP address of a hostname it checks in the hosts file. If you control the OS of the client you can generate the hosts file with the entries you need. If you don’t control the OS of the client then this technique won’t help.

There are three important ingredients to get this to work:

  1. A central repository that stores the current name-to-IP address mappings.
  2. A method to update the repository when mappings are updated.
  3. A method to regenerate the hosts file on each client, running on a regular schedule.

The central repository can be S3 or SimpleDB, or a database, or security group tags. If you’re concerned about storing your AWS access credentials on each client (and if these clients are web servers then they may not need your AWS credentials at all) then the database is a natural fit (and web servers probably already talk to the database anyway).

If your service is inside the cloud and you want to support clients both inside and outside the cloud you’ll need to maintain two separate repository tables – one containing the internal IP addresses of the services (for use generating the hosts file of clients inside the cloud) and the other containing the public IP addresses of the services (for use generating the hosts file of clients outside the cloud).

Summary of Generating Hosts Files:

Scenario 1: Service in the cloud, client inside the cloud: Only if you control the client’s OS, and register the service’s internal IP address.
Scenario 2: Service in the cloud, client outside the cloud: Only if you control the client’s OS, and register the service’s public IP address.
Scenario 3: Service outside the cloud, client inside the cloud: Only if you control the client’s OS.
Scenario 4: Service outside the cloud, client outside the cloud: Only if you control the client’s OS.

Update Window: Controllable via the frequency with which you regenerate the hosts file. Can be as short as a few seconds.

Pros: Works on any client whose OS you control, whether inside or outside the cloud, and with services either inside or outside the cloud. And, assuming your application already uses a database, this technique adds no additional single points of failure.

Cons: Requires you to control the client’s OS.

Bottom line: Good for all scenarios where the client’s OS is under your control and you need refresh times of a few seconds.

A Closer Look at Generating Hosts Files

Here is an implementation of this technique using a database as the repository, a Java program (wrapped in a shell script) to regenerate the hosts file, and mysql one-liners to perform the updates. This implementation was inspired by the work of Edward M. Goldberg of myCloudWatcher.

Creating the Repository

Here is the command to create the necessary database (“Hosts”) and table (“hosts”):

mysql -h dbHostname -u dbUsername -pDBPassword -e \
'CREATE DATABASE IF NOT EXISTS Hosts; \
USE Hosts; \
DROP TABLE IF EXISTS \`hosts\`; \
CREATE TABLE \`hosts\` ( \
\`record\` TEXT \
) DEFAULT CHARSET=latin1; \
INSERT INTO \`hosts\` VALUES ("127.0.0.1   localhost   localhost.localdomain");'

Notice that we pre-populate the repository with an entry for “localhost”. This is necessary because the process that updates the hosts file will completely overwrite the old one, and that’s where the localhost entry is supposed to live. Removing the localhost entry could wreak havoc on networking services – so we preserve it by ensuring a localhost entry is in the repository.

Updating the Repository

To claim a certain role (identified by a hostname – in this example “webserver1” – at the IP address 1.2.3.4), register it in the repository. Here’s the one-liner:

mysql -h dbHostname -u dbUsername -pDBPassword -e \
'DELETE FROM Hosts.\`hosts\` WHERE record LIKE "% webserver1"; \
INSERT INTO Hosts.\`hosts\` (\`record\`) VALUES ("1.2.3.4   webserver1");'

The registration process can be performed on the client itself or by an outside agent. Make sure you substitute the real host name and the correct IP address.

On an EC2 instance you can get the private and public IP addresses of the instance via the instance metadata URLs. For example:

$ privateIp=$(curl --silent http://169.254.169.254/latest/meta-data/local-ipv4)
$ echo $privateIp
10.209.206.223
$ publicIp=$(curl --silent http://169.254.169.254/latest/meta-data/public-ipv4)
$ echo $publicIp
75.101.198.120
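Putting those two pieces together, an instance can register its own private IP address for a role in one shot. Here is a hedged sketch; the role name and database credentials are placeholders, just like in the examples above:

privateIp=$(curl --silent http://169.254.169.254/latest/meta-data/local-ipv4)
mysql -h dbHostname -u dbUsername -pDBPassword -e \
"DELETE FROM Hosts.\`hosts\` WHERE record LIKE '% webserver1'; \
INSERT INTO Hosts.\`hosts\` (\`record\`) VALUES ('$privateIp   webserver1');"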

Regenerating the Hosts File

The final piece is recreating the hosts file based on the contents of the database table. Notice how the table records are already in the correct format for a hosts file. It would be simple to dump the output of the entire table to the hosts file:

mysql -h dbHostname -u dbUsername -pDBPassword --silent --column-names=0 -e \
'SELECT \`record\` FROM Hosts.\`hosts\`' | uniq > /etc/hosts  # This is simple and wrong

But it would also be wrong to do that! Every so often the database connection might fail and you’d be left with a hosts file that was completely borked – and that would prevent the client from properly resolving the hostnames of your services. It’s safer to only overwrite the hosts file if the SQL query actually returns results. Here’s some Java code that does that:

Class.forName("com.mysql.jdbc.Driver").newInstance();
Connection conn = DriverManager.getConnection("jdbc:mysql://" + dbHostname + "/?user=" +
	dbUsername + "&password=" + dbPassword);
String outputFileName = "/etc/hosts";
Statement stmt = conn.createStatement();
ResultSet res = stmt.executeQuery("SELECT record FROM Hosts.hosts");
HashSet<String> uniqueMe = new HashSet<String>();
PrintStream out = System.out;
if (res.isBeforeFirst()) {
	out = new PrintStream(outputFileName);
}
while (res.next()) {
	String record = res.getString(1);
	if (uniqueMe.add(record)) {
		out.println(record);
	}
}
out.println();
out.close();
res.close();
stmt.close();

This code uses the MySQL Connector/J JDBC driver. It makes sure only to overwrite the hosts file if there were actual records returned from the database query.

Scheduling the Regeneration

Now that you have a script that regenerates that hosts file (you did wrap that Java program into a script, right?) you need to place that script on each client and schedule a cron job to run it regularly. Via cron you can run it as often as every minute if you want – it adds a negligible amount of load to the database server so feel free – but if you need more frequent updates you’ll need to write your own driver to call the regeneration script more frequently.
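If you want a starting point for that wrapper, it can be as small as the sketch below; the class name, jar file, and install directory are hypothetical placeholders, not real artifacts:

#!/bin/bash
# Hypothetical wrapper: run the hosts-file regenerator described above
cd /opt/hostsgen
java -cp .:mysql-connector-java-bin.jar RegenerateHostsFile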

If you find this technique helpful – or have any questions about it – I’d be happy to hear from you in the comments.

Update December 2010: Guy Rosen guest-authored this article on using AWS’s DNS service Route 53 to track instances.


How I Moved 5% of All Objects in S3 with Jets3t

This is a true story about a lot of data. The cast of characters is as follows:

The Protagonist: Me.

The Hero: Jets3t, a Java library for using Amazon S3.

The Villain: Decisions made long ago, for forgotten reasons.

Innocent Bystanders: My client.

Once Upon a Time…

Amazon S3 is a great place to store media files and allows these files to be served directly from S3, instead of from your web server, thereby saving your server’s network and CPU for more important tasks. There are some gotchas with serving files directly from S3, and it is these gotchas that had my client locked in to paying for bandwidth and CPU to serve media files directly from his web server.

You see, a few years ago when my client first created their S3 bucket, they named it media.example.com. Public objects in that bucket could be accessed via the URL http://s3.amazonaws.com/media.example.com/objectKey or via the Virtual Host style URL http://media.example.com.s3.amazonaws.com/objectKey. If you’re just serving images via HTTP then this can work for you. But you might have a good reason to convince the browser that all the media is being served from your domain media.example.com (for example, when using Flash, which requires an appropriately configured crossdomain.xml). Using a URL that lives at s3.amazonaws.com or a subdomain of that host will not suffice for these situations.

Luckily, S3 lets you set up your DNS in a special manner, convincing the world that the same object lives at the URL http://media.example.com/objectKey. All you need to do is to set up a DNS CNAME alias pointing media.example.com to media.example.com.s3.amazonaws.com. The request will be routed to S3, which will look at the HTTP Host header and discover the bucket name media.example.com.
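Once the CNAME is in place you can verify it from any machine; here is a quick check with dig (the host name is, of course, a placeholder):

dig +short media.example.com CNAME
# expected output: media.example.com.s3.amazonaws.com.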

So what’s the problem? That’s all great for a bucket with a name that works in DNS. But it won’t work for a bucket whose name is Bucket.example.com: DNS is case insensitive, while S3 bucket names are not, so there are limitations on the name of a bucket if you want to use the DNS alias. This is where we reveal a secret: the bucket was not really named media.example.com. For some reason nobody remembers, the bucket was named Media.example.com – with a capital letter, which is invalid in DNS entries. This makes all the difference in the world, because S3 cannot serve this bucket via the Virtual Host method – you get a NoSuchBucket error when you try to access http://Media.example.com.s3.amazonaws.com/objectKey (equivalent to http://Media.example.com/objectKey with the appropriate DNS CNAME in place).

As a workaround my client developed an application that dynamically loaded the media onto the server and served it directly from there. This server serviced media.example.com, and it would essentially do the following for each requested file:

  1. Do we already have this objectKey on our local filesystem? If yes, go to step 3.
  2. Fetch the object from S3 via http://s3.amazonaws.com/Media.example.com/objectKey and save it to the local filesystem.
  3. Serve the file from the local filesystem.

This workaround allowed the client to release URLs that looked correct, but required using a separate server for the job. It costs extra time (when there is a cache miss) and money (to operate the server).

The challenge? To remove the need for this caching server and allow the URLs to be served directly from S3 via media.example.com.

Just Move the Objects, Right?

It might seem obvious: Why not simply move the objects to a correctly-named bucket? Turns out that’s not quite so simple to do in this case.

Obviously, if I was dealing with a few hundred, thousand, or even tens of thousands of objects, I could use a GUI tool such as CloudBerry Explorer or the S3Fox Organizer Firefox Extension. But this client is a popular web site, and has been storing media in the bucket for a few years already. They had 5 billion objects in the bucket (which is 5% of the total number of objects in S3). These tools crashed upon viewing the bucket. So, no GUI for you.

S3 is a hosted object store system. Why not just use its MOVE command (via the API) to move the objects from the wrong bucket to the correctly-named bucket? Well, it turns out that S3 has no MOVE command.

Thankfully, S3 has a COPY command which allows you to copy an object on the server-side, without downloading the object’s contents and uploading them again to the new location. Using some creative programming you can put together a COPY and a DELETE (only if the COPY succeeded!) to simulate a MOVE. I tried using the boto Python library but it choked on manipulating any object in the bucket named Media.example.com – even though it’s a legal name, it’s just not recommended – so I couldn’t use this tool. The Java-based Jets3t library was able to handle this unfortunate bucket name just fine, and it also provides a convenience method to move objects via COPY and DELETE. The objects in this bucket are immutable, so we don’t need to worry about consistency.

So I’m all set with Jets3t.

Or so I thought.

First Attempt: Make a List

My first attempt was to:

  1. List all the objects in the bucket and put them in a database.
  2. Run many client programs that requested the “next” object key from the database and deleted the entry from the database when it was successfully moved to the correctly-named bucket.

This approach would provide a way to make sure all the objects were moved successfully.

Unfortunately, listing so many objects took too long – I allowed a process to list the bucket’s contents for a full 24 hours before killing it. I don’t know how far it got, but I didn’t feel like waiting around for it to finish dumping its output to a file, then waiting some more to import the list into a database.

Second Attempt: Make a Smaller List

I thought about the metadata I had: The objects in the bucket all had object keys with a particular structure:

/binNumber/oneObjectKey

binNumber was a number from 0 to 4.5 million, and each binNumber served as the prefix for approximately 1200 objects (which works out to roughly 5.4 billion objects total in the bucket). The names of these objects were essentially random letters and numbers after the binNumber/ component. S3’s API lets you list only the objects whose keys begin with a specific prefix – which is perfect for my needs, since each such listing returns a result of very manageable size.

So I coded up something quick in Java using Jets3t. Here’s the initial code snippet:

public class MoveObjects {
	private static final String AWS_ACCESS_KEY_ID = ....;
	private static final String AWS_SECRET_ACCESS_KEY = ....;
	private static final String SOURCE_BUCKET_NAME = "Media.example.com";
	private static final String DEST_BUCKET_NAME = "media.example.com";

	public static void main(String[] args) throws Exception {
		AWSCredentials awsCredentials = new AWSCredentials(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY);
		S3Service restService = new RestS3Service(awsCredentials);
		S3Bucket sourceBucket = restService.getBucket(SOURCE_BUCKET_NAME);
		final String delimiter = "/";
		String[] prefixes = new String[...];
		for (int i = 0; i < prefixes.length; ++i) {
			// fill the list of binNumbers from the command-line args (not shown)
			prefixes[i] = String.valueOf(...);
		}
		ExecutorService tPool = Executors.newFixedThreadPool(32);
		long delay = 50;
		for (String prefix : prefixes) {
			// list all the objects under this binNumber prefix
			S3Object[] sourceObjects = restService.listObjects(sourceBucket, prefix + delimiter, delimiter);
			if (sourceObjects != null && sourceObjects.length > 0) {
				System.out.println(" At key " + sourceObjects[0].getKey() + ", this many: " + sourceObjects.length);
				for (int i = 0; i < sourceObjects.length; ++i) {
					final S3Object sourceObject = sourceObjects[i];
					final String sourceObjectKey = sourceObject.getKey();
					sourceObject.setAcl(AccessControlList.REST_CANNED_PUBLIC_READ);
					Mover mover = new Mover(restService, sourceObject, sourceObjectKey);
					while (true) {
						try {
							tPool.execute(mover);
							delay = 50;
							break;
						} catch (RejectedExecutionException r) {
							System.out.println("Queue full: waiting " + delay + " ms");
							Thread.sleep(delay); // backoff and retry
							delay += 50;
						}
					}
				}
			}
		}
		tPool.shutdown();
		tPool.awaitTermination(360000, TimeUnit.SECONDS);
		System.out.println(" Completed!");
	}

	private static class Mover implements Runnable {
		final S3Service restService;
		final S3Object sourceObject;
		final String sourceObjectKey;

		Mover(final S3Service restService, final S3Object sourceObject, final String sourceObjectKey) {
			this.restService = restService;
			this.sourceObject = sourceObject;
			this.sourceObjectKey = sourceObjectKey;
		}

		public void run() {
			Map moveResult = null;
			try {
				// server-side COPY into the destination bucket, then DELETE of the source
				moveResult = restService.moveObject(SOURCE_BUCKET_NAME, sourceObjectKey, DEST_BUCKET_NAME, sourceObject, false);
				if (moveResult.containsKey("DeleteException")) {
					System.out.println("Error: " + sourceObjectKey);
				}
			} catch (S3ServiceException e) {
				System.out.println("Error: " + sourceObjectKey + " EXCEPTION: " + e.getMessage());
			}
		}
	}
}

The code uses an Executor to control a pool of threads, each of which is given a single object to move, encapsulated in a Mover. All objects with a given prefix (binNumber) are listed and then submitted to the Executor to be moved. The initial setup of Jets3t with the credentials, and the building of the array of prefixes, is not shown.

We need to be concerned that work might be submitted faster than the thread pool can handle it, so there is backoff-and-retry logic in that code. But notice that we don’t care if a particular object’s move operation fails. This is because we will run the same program a second time, after covering all the binNumber prefixes, to catch any objects that were left behind (and a third time, too – until no more objects are left in the source bucket).

I ran this code on an EC2 m1.xlarge instance in two simultaneous processes, each of which was given half of the binNumber prefixes to work with. I settled on 32 threads in the thread pool after a few experiments showed this number ran the fastest. I made sure to set the proper number of underlying HTTP connections for Jets3t to use, with these arguments: -Ds3service.max-thread-count=32 -Dhttpclient.max-connections=60 . Things were going well for a few hours.

Third Attempt: Make it More Robust

After a few hours I noticed that the rate of progress was slowing. I didn’t have exact numbers, but I saw that things were just taking longer in minute 350 than they had taken in minute 10. I could have taken on the challenge of debugging long-running, multithreaded code. Or I could hack in a workaround.

The workaround I chose is to force the program to terminate every hour, and to restart itself. I added the following code to the main method:

    // exit every hour
    Timer t = new Timer(true);
    TimerTask tt = new TimerTask() {
    	public void run() {
    		System.out.println("Killing myself!");
    		System.exit(42);
    	}
    };
    final long dieMillis = 3600 * 1000;
    t.schedule(tt, dieMillis);

And I wrapped the program in a “forever” wrapper script:

#! /bin/bash

while true; do
	DATE=`date`
	echo $DATE: $0: launching $*
	$* 2>&1
done

This script is invoked as follows:

ARGS=... ./forever.sh nohup java -Ds3service.max-thread-count=32 -Dhttpclient.max-connections=60 -classpath bin/:lib/jets3t-0.7.2.jar:lib/commons-logging-1.1.1.jar:lib/commons-httpclient-3.1.jar:lib/commons-codec-1.3.jar com.orchestratus.s3.MoveObjects $ARGS >> nohup.out 2>&1 &

Whenever the Java program terminates, the forever wrapper script re-launches it with the same arguments. This is safe to repeat because each pass only sees the objects still remaining in the source bucket – those that haven’t yet been moved and deleted. Eventually the job ran to completion, and the program would start, check all its binNumber prefixes, find nothing, exit, restart, find nothing, exit, restart, and so on.

The whole process took 5 days to completely move all objects to the new bucket. Then I gave my client the privilege of deleting the Media.example.com bucket.

Lessons Learned

Here are some important lessons I learned and reinforced through this project.

Use the metadata to your benefit

Sometimes the only thing you know about a problem is its shape, not its actual contents. In this case I knew the general structure of the object keys, and this was enough to go on even if I couldn’t discover every object key a priori. This is a key principle when working with large amounts of data: the metadata is your friend.

Robustness is a feature

It took a few iterations until I got to a point where things were running consistently fast. And it took some advance planning to design a scheme that would gracefully tolerate failure to move some objects. But without these features I would have had to intervene manually whenever problems arose. Don’t let intermittent failure delay a long-running process.

Sometimes it doesn’t pay to debug

I used an ugly workaround – forcing the process to restart every hour – instead of debugging the underlying problem that caused it to gradually slow down. For one-off code written for my specific circumstances, I decided this was more effective than getting bogged down making it truly correct. Brute force got the job done fast enough, so the real bug never needed to be fixed.

Repeatability

I’ve been thinking about how someone would repeat my experiments and discover improvements to the techniques I employed. We could probably get by without actually copying and deleting the objects; instead we could perform two successive read-only calls per object – perhaps requests for different metadata headers – to generate a comparable request pattern. We’d need some public S3 bucket with many millions of objects in it to make a comparable test case. And we’d need an S3 account whose owner is willing to let users play in it.
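For example, a non-destructive stand-in for the COPY + DELETE pair might be two successive metadata-only requests per object, along these lines. This is only a sketch of the idea, assuming Jets3t’s getObjectDetails() HEAD-style call; the class and method names are mine.

import org.jets3t.service.S3Service;
import org.jets3t.service.S3ServiceException;
import org.jets3t.service.model.S3Bucket;
import org.jets3t.service.model.S3Object;

public class FakeMove {
	static void fakeMove(S3Service restService, S3Bucket bucket, String objectKey) throws S3ServiceException {
		S3Object first = restService.getObjectDetails(bucket, objectKey);	// stands in for the COPY
		S3Object second = restService.getObjectDetails(bucket, objectKey);	// stands in for the DELETE
		System.out.println(objectKey + ": " + first.getContentLength() + " bytes, ETag " + second.getETag());
	}
}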

Any takers?